AccSGD
Implements AccSGD, an accelerated stochastic gradient method.
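AccSGD couples a short, plain SGD step with a long, momentum-like step and blends the two iterates at each update. With \(\eta\) the learning rate, \(\kappa\) the long-to-short step ratio, \(\xi\) the statistical advantage parameter, and a constant \(0 < c \le 1\), the derived coefficients, as computed in the reference implementation, are
\[
\delta = \frac{\eta \kappa}{c}, \qquad \alpha = 1 - \frac{c^2 \xi}{\kappa}, \qquad \beta = 1 - \alpha, \qquad \zeta = \frac{c}{c + \beta},
\]
and each step with stochastic gradient \(g_t\) evaluated at \(\theta_t\) updates
\[
\tilde{w}_{t+1} = \alpha \tilde{w}_t + \beta \left(\theta_t - \delta g_t\right), \qquad \theta_{t+1} = \zeta \left(\theta_t - \eta g_t\right) + (1 - \zeta)\, \tilde{w}_{t+1},
\]
where \(\tilde{w}_t\) is the accelerated running iterate, initialized to \(\tilde{w}_0 = \theta_0\).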
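The coupling of the short and long steps can be illustrated with a minimal NumPy sketch. It is not this library's implementation: the function names `accsgd_coefficients` and `accsgd_step` are hypothetical, the coefficient formulas follow what the reference implementation linked below appears to compute, and the default values (`kappa=1000.0`, `xi=10.0`, `small_const=0.7`) are assumptions borrowed from that implementation. Weight decay is omitted.

```python
import numpy as np


def accsgd_coefficients(lr, kappa=1000.0, xi=10.0, small_const=0.7):
    """Derive the step sizes and blending weights (hypothetical helper)."""
    large_lr = lr * kappa / small_const              # long step size
    alpha = 1.0 - (small_const ** 2) * xi / kappa    # weight kept on the running iterate
    beta = 1.0 - alpha                               # weight given to the long step
    zeta = small_const / (small_const + beta)        # weight given to the short step
    return large_lr, alpha, beta, zeta


def accsgd_step(theta, w_tilde, grad, lr, kappa=1000.0, xi=10.0, small_const=0.7):
    """One AccSGD-style update; returns the new (theta, w_tilde)."""
    large_lr, alpha, beta, zeta = accsgd_coefficients(lr, kappa, xi, small_const)
    # Running iterate: average the old value with a long gradient step from theta.
    w_tilde = alpha * w_tilde + beta * (theta - large_lr * grad)
    # Parameters: blend a short SGD step with the updated running iterate.
    theta = zeta * (theta - lr * grad) + (1.0 - zeta) * w_tilde
    return theta, w_tilde


if __name__ == "__main__":
    # Toy problem: f(theta) = 0.5 * ||theta||^2, whose gradient is theta itself.
    theta = np.array([1.0, -2.0, 3.0])
    w_tilde = theta.copy()  # running iterate starts at theta_0
    for _ in range(2000):
        theta, w_tilde = accsgd_step(theta, w_tilde, grad=theta, lr=0.01)
    print(np.linalg.norm(theta))
```

Note that the running iterate is updated first, from the current parameters, and the parameter update then uses the new running iterate. Swapping that order would give a different recursion.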
Reference: Prateek Jain, Sham M. Kakade, Rahul Kidambi, Praneeth Netrapalli, Aaron Sidford, "Accelerating Stochastic Gradient Descent For Least Squares Regression", COLT 2018. https://arxiv.org/abs/1704.08227
Companion analysis: Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham M. Kakade, "On the insufficiency of existing momentum schemes for Stochastic Optimization", ICLR 2018. https://arxiv.org/abs/1803.05591
Reference implementation: https://github.com/rahulkidambi/AccSGD