AdamMC
Implements AdamMC, Adam with moment centralization.
AdamMC augments Adam with a centralization step on the first-order moment: before bias correction, the mean of the accumulated momentum is subtracted from it. Computed per layer over the momentum tensor, this enforces a zero-mean constraint on the momentum, which the authors find improves generalization for convolutional networks. The rest of the update is identical to Adam.
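Following the description above, one update at step \(t\) can be written as:

\[
\begin{aligned}
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2 \\
\hat{m}_t &= \frac{m_t - \mathrm{mean}(m_t)}{1 - \beta_1^t} \\
\hat{v}_t &= \frac{v_t}{1 - \beta_2^t} \\
\theta_t &= \theta_{t-1} - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
\]

where \(\theta\) are the parameters, \(\eta\) the learning rate, \(g_t\) the gradient, \(m_t\) and \(v_t\) the first- and second-order moments, \(\hat{m}_t\) and \(\hat{v}_t\) their bias-corrected counterparts, \(\beta_1,\beta_2\) the decay rates, \(\mathrm{mean}(m_t)\) the mean of the momentum tensor (taken per layer), and \(\epsilon\) a stability constant.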
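As an illustration, the step described above can be sketched in plain NumPy for a single layer's parameter tensor. This is a minimal sketch under stated assumptions, not this library's implementation: the function name `adammc_step`, its signature, and the default hyperparameters (the usual Adam defaults) are hypothetical, and the mean is taken over the whole tensor.

```python
import numpy as np


def adammc_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdamMC-style update for one layer (hypothetical helper, t starts at 1)."""
    # Standard Adam moment accumulation.
    m = beta1 * m + (1.0 - beta1) * grad
    v = beta2 * v + (1.0 - beta2) * grad**2
    # Moment centralization: subtract the mean of this layer's momentum tensor.
    # Assumption: the centred momentum is stored back into the optimizer state.
    m = m - m.mean()
    # Bias correction and parameter update, as in Adam.
    m_hat = m / (1.0 - beta1**t)
    v_hat = v / (1.0 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v


# Usage: a few steps on random gradients for a 4x3 weight tensor.
rng = np.random.default_rng(0)
theta = rng.normal(size=(4, 3))
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 4):
    grad = rng.normal(size=theta.shape)
    theta, m, v = adammc_step(theta, grad, m, v, t)
```

Because the mean is a linear operation and the moments start at zero, storing the centred momentum or centring a copy at each step yields the same update up to floating-point error, so that choice does not change the behavior of the sketch.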
Reference: Sumanth Sadu, Shiv Ram Dubey, S. R. Sreeja, "Moment Centralization based Gradient Descent Optimizers for Convolutional Neural Networks", arXiv 2022. https://arxiv.org/abs/2207.09066