Stochastic Variance Reduced Primal–Dual Hybrid Gradient Methods for Saddle-Point Problems
| Main Authors: | , , , |
|---|---|
| Format: | Article |
| Language: | English |
| Published: | MDPI AG, 2025-05-01 |
| Series: | Mathematics |
| Subjects: | |
| Online Access: | https://www.mdpi.com/2227-7390/13/10/1687 |
| Summary: | Recently, many stochastic Alternating Direction Methods of Multipliers (ADMMs) have been proposed to solve large-scale machine learning problems. However, for large-scale saddle-point problems, state-of-the-art (SOTA) stochastic ADMMs still have high per-iteration costs. On the other hand, the stochastic primal–dual hybrid gradient (SPDHG) method has a low per-iteration cost but only a suboptimal convergence rate of $\mathcal{O}(1/\sqrt{S})$. Thus, a gap remains between the convergence rates of SPDHG and SOTA ADMMs. Motivated by these two issues, we propose (accelerated) stochastic variance reduced primal–dual hybrid gradient ((A)SVR-PDHG) methods. We design a linear extrapolation step to improve the convergence rate and a new adaptive epoch length strategy to remove the extra boundedness assumption. Our algorithms have a simpler structure and lower per-iteration complexity than SOTA ADMMs. As a by-product, we present asynchronous parallel variants of our algorithms. In theory, we rigorously prove that our methods converge linearly for strongly convex problems and improve the convergence rate to $\mathcal{O}(1/S^2)$ for non-strongly convex problems, as opposed to the existing $\mathcal{O}(1/S)$ rate. Compared with SOTA algorithms, experimental results on a variety of problems demonstrate that ASVR-PDHG achieves an average speedup of $2\times$ to $5\times$. |
| ISSN: | 2227-7390 |
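
To give a concrete feel for the idea the abstract describes, the following is a minimal Python sketch of an SVRG-style variance-reduced gradient estimator plugged into a primal–dual hybrid gradient (Chambolle–Pock) loop with a linear extrapolation step. It is an illustration under simplifying assumptions, not the authors' exact (A)SVR-PDHG method: the problem form, the quadratic placeholder for `g*`, the fixed epoch length (the paper uses an adaptive epoch length strategy), and the step sizes `tau`, `sigma`, `theta` are all assumptions made for the sketch.

```python
import numpy as np

# Illustrative sketch: variance-reduced PDHG for a saddle-point problem of the form
#     min_x max_y  (1/n) * sum_i f_i(x) + <K x, y> - g*(y).
# This is NOT the published (A)SVR-PDHG algorithm; it only shows how an SVRG-style
# gradient estimate and a primal extrapolation step can be combined in a PDHG loop.

def prox_gstar(y, sigma):
    """Proximal operator of sigma * g*; g* = (1/2)||.||^2 is a placeholder choice."""
    return y / (1.0 + sigma)

def svr_pdhg_sketch(grad_fi, n, K, x0, y0, tau, sigma, theta=1.0,
                    epochs=20, epoch_len=None, seed=None):
    """grad_fi(i, x) returns the gradient of the i-th component f_i at x."""
    rng = np.random.default_rng(seed)
    m = epoch_len or n                      # fixed epoch length, for simplicity
    x, y, x_bar = x0.copy(), y0.copy(), x0.copy()
    for _ in range(epochs):
        x_snap = x.copy()                   # snapshot point for this epoch
        full_grad = np.mean([grad_fi(i, x_snap) for i in range(n)], axis=0)
        for _ in range(m):
            i = rng.integers(n)
            # SVRG-style variance-reduced estimate of the gradient of (1/n) sum_i f_i
            v = grad_fi(i, x) - grad_fi(i, x_snap) + full_grad
            # dual ascent step evaluated at the extrapolated primal point
            y = prox_gstar(y + sigma * (K @ x_bar), sigma)
            # primal descent step using the variance-reduced gradient
            x_new = x - tau * (v + K.T @ y)
            # linear extrapolation of the primal iterate
            x_bar = x_new + theta * (x_new - x)
            x = x_new
    return x, y
```

In this sketch, the full gradient computed once per epoch keeps the per-iteration cost at a single component gradient plus two matrix–vector products, which is the low per-iteration cost the abstract contrasts with stochastic ADMMs; the extrapolation parameter `theta` plays the role of the linear extrapolation step mentioned there.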