Stochastic Variance Reduced Primal–Dual Hybrid Gradient Methods for Saddle-Point Problems

Bibliographic Details
Main Authors: Weixin An, Yuanyuan Liu, Fanhua Shang, Hongying Liu
Format: Article
Language: English
Published: MDPI AG 2025-05-01
Series: Mathematics
Subjects:
Online Access: https://www.mdpi.com/2227-7390/13/10/1687
Description
Summary: Recently, many stochastic Alternating Direction Methods of Multipliers (ADMMs) have been proposed to solve large-scale machine learning problems. However, for large-scale saddle-point problems, the state-of-the-art (SOTA) stochastic ADMMs still have high per-iteration costs. On the other hand, the stochastic primal–dual hybrid gradient (SPDHG) method has a low per-iteration cost but only a suboptimal convergence rate of 𝒪(1/√S). Thus, a gap remains between the convergence rate of SPDHG and that of SOTA ADMMs. Motivated by these two issues, we propose (accelerated) stochastic variance reduced primal–dual hybrid gradient ((A)SVR-PDHG) methods. We design a linear extrapolation step to improve the convergence rate and a new adaptive epoch length strategy to remove the extra boundedness assumption. Our algorithms have a simpler structure and lower per-iteration complexity than SOTA ADMMs. As a by-product, we present asynchronous parallel variants of our algorithms. In theory, we rigorously prove that our methods converge linearly for strongly convex problems and improve the convergence rate to 𝒪(1/S²) for non-strongly convex problems, as opposed to the existing 𝒪(1/S) rate. Compared with SOTA algorithms, various experimental results demonstrate that ASVR-PDHG achieves an average speedup of 2× to 5×.
ISSN: 2227-7390
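
To make the kind of update the abstract refers to more concrete, the following is a minimal NumPy sketch of an SVRG-flavoured primal–dual hybrid gradient loop with a linear extrapolation step, applied to a graph-guided logistic saddle-point problem of the form min_x max_{||y||_∞ ≤ λ} (1/n) Σ_i log(1 + exp(-b_i a_i^T x)) + ⟨Ax, y⟩. It is a sketch under stated assumptions, not the authors' published (A)SVR-PDHG algorithm: the problem instance, the fixed epoch length (the paper uses an adaptive one), and all step-size and extrapolation constants are illustrative choices.

# Illustrative sketch only: an SVRG-style variance-reduced PDHG loop with linear
# extrapolation. Not the authors' exact (A)SVR-PDHG method; the problem instance,
# step sizes tau/sigma, extrapolation theta, and the fixed epoch length are assumptions.
import numpy as np

def svr_pdhg_sketch(feats, labels, A, lam=0.1, tau=0.05, sigma=0.05, theta=1.0,
                    epochs=20, seed=0):
    """min_x max_{||y||_inf <= lam} (1/n) sum_i log(1+exp(-labels[i]*feats[i]@x)) + <A x, y>."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    x = np.zeros(d)
    y = np.zeros(A.shape[0])
    x_bar = x.copy()                            # extrapolated primal point

    def grad_i(w, i):
        # gradient of the i-th logistic loss at w
        z = labels[i] * (feats[i] @ w)
        return -labels[i] / (1.0 + np.exp(z)) * feats[i]

    def full_grad(w):
        # full gradient (1/n) * sum_i grad_i(w, i)
        z = labels * (feats @ w)
        return feats.T @ (-labels / (1.0 + np.exp(z))) / n

    for _ in range(epochs):
        x_snap = x.copy()                       # snapshot point for variance reduction
        mu = full_grad(x_snap)                  # full gradient at the snapshot
        for _ in range(n):                      # fixed epoch length (the paper adapts this)
            # dual ascent + prox: projection onto {||y||_inf <= lam}
            y = np.clip(y + sigma * (A @ x_bar), -lam, lam)
            # SVRG-style variance-reduced estimate of the smooth loss gradient
            i = rng.integers(n)
            v = grad_i(x, i) - grad_i(x_snap, i) + mu
            # primal descent step (identity prox: no nonsmooth primal term here)
            x_new = x - tau * (A.T @ y + v)
            # linear extrapolation step highlighted in the abstract
            x_bar = x_new + theta * (x_new - x)
            x = x_new
    return x, y

# Toy usage with synthetic data and a chain-difference operator A (hypothetical example).
rng = np.random.default_rng(1)
feats = rng.standard_normal((200, 50))
labels = np.sign(rng.standard_normal(200))
A = np.eye(50) - np.eye(50, k=1)
x_hat, y_hat = svr_pdhg_sketch(feats, labels, A)

In this toy call, A is a simple chain-difference operator; in a graph-guided regularizer it would instead encode the edge structure of the graph.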