Gravity
Implements Gravity, a kinematic optimizer.
Gravity treats each parameter as a point mass rolling down an inclined plane whose slope is the gradient, and integrates a constant-acceleration kinematic step. The per-coordinate step is largest for moderate gradients and shrinks again as the gradient grows, so each velocity increment stays bounded. The velocity buffer is seeded from a normal distribution and smoothed by a running average whose decay anneals from \(\frac{1}{2}\) toward \(\beta\) as training proceeds.
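As a hedged sketch, the update described above can be written as follows. The form is an assumption based on a reading of the referenced paper, not something stated on this page: \(\theta_t\) (the parameters) and \(\hat{\beta}_t\) (the annealed decay) are names introduced here for illustration, \(t\) is counted from 0 so that the first decay equals \(\frac{1}{2}\), and the maximum is taken over the entries of \(g_t\).

\[
\begin{aligned}
V_0 &\sim \mathcal{N}\!\left(0,\ (\alpha / \eta)^2\right) \\
\hat{\beta}_t &= \frac{\beta t + 1}{t + 2} \\
m_t &= \frac{1}{\max |g_t|} \\
\zeta_t &= \frac{g_t}{1 + (g_t / m_t)^2} \\
V_t &= \hat{\beta}_t V_{t-1} + (1 - \hat{\beta}_t)\, \zeta_t \\
\theta_t &= \theta_{t-1} - \eta V_t
\end{aligned}
\]

Under this form, \(\hat{\beta}_0 = \frac{1}{2}\) and \(\hat{\beta}_t \to \beta\) as \(t \to \infty\), and each entry of \(\zeta_t\) peaks at magnitude \(m_t / 2\), reached where \(|g_t| = m_t\).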
In the update rule, \(g_t\) is the gradient, \(m_t\) the reciprocal of the largest absolute gradient entry, \(\zeta_t\) the bounded gravity step, \(V_t\) the velocity buffer, \(\eta\) the learning rate, \(\alpha\) the velocity initialization scale, and \(\beta\) the asymptotic running-average decay.
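The symbols above can be tied together in a minimal pure-Python sketch of one update step. It is an illustration under stated assumptions, not this library's implementation: the function names (`gravity_decay`, `gravity_init`, `gravity_step`), the default values, the step counter starting at 0, and the specific form \(\zeta_t = g_t / (1 + (g_t / m_t)^2)\) with an initial standard deviation of \(\alpha / \eta\) are all assumed from a reading of the referenced paper.

```python
import random


def gravity_decay(t, beta=0.9):
    """Annealed decay (assumed form): 1/2 at t = 0, approaching beta as t grows."""
    return (beta * t + 1.0) / (t + 2.0)


def gravity_init(n, lr=0.1, alpha=0.01, seed=0):
    """Seed the velocity buffer from a normal with std alpha / lr (assumed scale)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, alpha / lr) for _ in range(n)]


def gravity_step(w, g, v, t, lr=0.1, beta=0.9):
    """Apply one hypothetical Gravity update to flat lists of floats.

    w: parameters, g: gradient, v: velocity buffer, t: step counter from 0.
    Returns the new parameters and the new velocity buffer.
    """
    beta_hat = gravity_decay(t, beta)
    g_max = max(abs(x) for x in g)
    # m_t = 1 / g_max, so g / m_t == g * g_max; multiplying instead of
    # dividing keeps an all-zero gradient from causing a division by zero.
    zeta = [x / (1.0 + (x * g_max) ** 2) for x in g]
    # Running average of the bounded step, then a plain descent move.
    v_new = [beta_hat * vi + (1.0 - beta_hat) * zi for vi, zi in zip(v, zeta)]
    w_new = [wi - lr * vi for wi, vi in zip(w, v_new)]
    return w_new, v_new
```

A typical loop under these assumptions would call `gravity_init` once per parameter vector, then call `gravity_step` with `t = 0, 1, 2, ...`, feeding each returned velocity buffer into the next call.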
Reference: Dariush Bahrami, Sadegh Pouriyan Zadeh, "Gravity Optimizer: a Kinematic Approach on Optimization in Deep Learning", arXiv 2021. https://arxiv.org/abs/2101.09192