pbSGD
Implements pbSGD, stochastic gradient descent with the Powerball transform applied elementwise to the gradient.
pbSGD raises the magnitude of each gradient component to a fixed power \(\gamma \in (0, 1]\) while preserving its sign, a nonlinear reshaping called the Powerball function. For \(\gamma < 1\), components with magnitude below one are amplified and components with magnitude above one are compressed, which the reference reports speeds up early training and mitigates vanishing gradients; at \(\gamma = 1\) the method reduces to ordinary SGD. The momentum variant pbSGDM accumulates the transformed gradient in a velocity buffer before the step.
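One update rule consistent with the description above and with the symbol definitions that follow can be written as below. This is a reconstruction rather than a quotation: the placement of the learning rate outside the momentum buffer is an assumption, and the reference may scale the buffer differently.

\[
m_t = \beta\, m_{t-1} + \mathrm{sign}(g_t)\,\lvert g_t \rvert^{\gamma}, \qquad \theta_{t+1} = \theta_t - \eta\, m_t
\]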
Here \(\theta\) denotes the parameters, \(\eta\) the learning rate, \(g_t\) the gradient at step \(t\), \(\gamma \in (0,1]\) the power exponent, \(\beta\) the momentum factor (\(\beta = 0\) recovers plain pbSGD, \(\beta > 0\) gives pbSGDM), and \(m_t\) the momentum buffer; \(\mathrm{sign}\) and \(\lvert \cdot \rvert\) act elementwise.
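As a rough illustration of the update described above, here is a minimal pure-Python sketch on flat lists of floats. The function names `powerball` and `pbsgd_step`, the default hyperparameter values, and the choice to apply the learning rate outside the momentum buffer are assumptions made for this example. They are not this library's API, and the reference implementation may differ.

```python
import math


def powerball(grad, gamma):
    """Elementwise Powerball transform: sign(g) * |g|**gamma."""
    return [math.copysign(abs(g) ** gamma, g) for g in grad]


def pbsgd_step(theta, grad, buf, lr=0.1, gamma=0.5, beta=0.0):
    """One pbSGD (beta == 0) or pbSGDM (beta > 0) step.

    Assumed form: buf <- beta * buf + powerball(grad); theta <- theta - lr * buf.
    Returns the new parameters and the new momentum buffer.
    """
    transformed = powerball(grad, gamma)
    new_buf = [beta * m + p for m, p in zip(buf, transformed)]
    new_theta = [t - lr * m for t, m in zip(theta, new_buf)]
    return new_theta, new_buf


# Example: with gamma = 0.5 a gradient of 4.0 is compressed to 2.0,
# while a gradient of -0.25 is amplified to -0.5.
theta, buf = [1.0, 1.0], [0.0, 0.0]
theta, buf = pbsgd_step(theta, [4.0, -0.25], buf, lr=0.1, gamma=0.5, beta=0.9)
```

With `gamma=1.0` the transform is the identity, so the step reduces to ordinary SGD (with momentum when `beta > 0`), matching the limiting case stated above.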
Reference: Beitong Zhou, Jun Liu, Weigao Sun, Ruijuan Chen, Claire Tomlin, Ye Yuan, "pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization", IJCAI 2020. https://www.ijcai.org/proceedings/2020/451