AdamNX
Implements AdamNX, an Adam variant with a novel time-dependent exponential decay rate for the second-moment estimate.
AdamNX folds bias correction directly into the moment-update coefficients rather than applying a separate \(\hat{m}_t, \hat{v}_t\) step. For the second moment it replaces Adam's fixed \(\beta_2\) with an effective decay \(\hat{\beta}_{2,t}\) that increases toward \(1\) as training progresses, gradually weakening the per-coordinate step-size correction so the optimizer behaves more like momentum SGD in late training. Weight decay is decoupled and applied directly to the parameters.
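To illustrate what folding bias correction into the update coefficients means, the sketch below uses the standard Adam first moment, not AdamNX's exact coefficients (which may differ). It checks that a moving average with time-dependent coefficients reproduces Adam's separately corrected \(\hat{m}_t\). The function name `adam_first_moment_two_ways` is hypothetical and exists only for this example.

```python
def adam_first_moment_two_ways(grads, beta1=0.9):
    """Return (separate, folded) bias-corrected first-moment sequences.

    Illustration for standard Adam only; hypothetical helper, not a library API.
    """
    m = 0.0      # raw EMA, bias-corrected in a separate step (classic Adam)
    m_hat = 0.0  # EMA whose coefficients already include the correction
    separate, folded = [], []
    for t, g in enumerate(grads, start=1):
        # Classic Adam: update the raw EMA, then divide by (1 - beta1^t).
        m = beta1 * m + (1.0 - beta1) * g
        separate.append(m / (1.0 - beta1 ** t))

        # Folded form: time-dependent coefficients, no separate correction.
        denom = 1.0 - beta1 ** t
        decay = beta1 * (1.0 - beta1 ** (t - 1)) / denom  # 0 at t = 1
        gain = (1.0 - beta1) / denom                      # decay + gain == 1
        m_hat = decay * m_hat + gain * g
        folded.append(m_hat)
    return separate, folded


separate, folded = adam_first_moment_two_ways([1.0, -2.0, 0.5, 3.0])
print(all(abs(a - b) < 1e-12 for a, b in zip(separate, folded)))
```

The folded decay starts at \(0\) on the first step and rises toward \(\beta_1\), so the first estimate equals the first gradient and no division by \(1 - \beta_1^t\) is needed afterwards.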
Notation: \(\theta\) are the parameters, \(\eta\) is the learning rate, \(g_t\) is the gradient, \(m_t\) and \(v_t\) are the first- and second-moment estimates, \(\beta_1\) and \(\beta_2\) are the base decay rates, \(\hat{\beta}_{2,t}\) is the effective time-varying second-moment decay, \(\lambda\) is the decoupled weight decay, and \(\epsilon\) is a stability constant. The defaults are \(\beta_1 = 0.9\), \(\beta_2 = 0.99\), and \(\epsilon = 10^{-8}\).
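As a rough sketch of how these symbols could fit together in one step, the code below implements a generic scalar Adam-style update with a folded first-moment coefficient, a caller-supplied effective second-moment decay, and AdamW-style decoupled weight decay. It is not the paper's algorithm: the function names are hypothetical, the first-moment coefficient is the standard Adam one, and `placeholder_beta2_hat` is a stand-in schedule (\(1 - 1/t\), a running mean of \(g_t^2\)) chosen only because it rises toward \(1\). Substitute the paper's \(\hat{\beta}_{2,t}\) for real use.

```python
import math


def placeholder_beta2_hat(t):
    """Stand-in schedule that rises toward 1; NOT the AdamNX formula."""
    return 1.0 - 1.0 / t


def adam_style_step(theta, g, m, v, t, beta2_hat,
                    lr=1e-3, beta1=0.9, eps=1e-8, weight_decay=0.0):
    """One scalar step; `beta2_hat` is the effective second-moment decay at step t."""
    # First moment with bias correction folded into the coefficient
    # (standard Adam identity, assumed here for illustration).
    b1_hat = beta1 * (1.0 - beta1 ** (t - 1)) / (1.0 - beta1 ** t)
    m = b1_hat * m + (1.0 - b1_hat) * g
    # Second moment with the time-varying effective decay.
    v = beta2_hat * v + (1.0 - beta2_hat) * g * g
    # Decoupled weight decay acts on the parameter, not on the gradient.
    theta = theta - lr * weight_decay * theta
    # Adaptive step.
    theta = theta - lr * m / (math.sqrt(v) + eps)
    return theta, m, v


theta, m, v = 1.0, 0.0, 0.0
for t, g in enumerate([2.0, 1.5, -0.5], start=1):
    theta, m, v = adam_style_step(theta, g, m, v, t,
                                  placeholder_beta2_hat(t), lr=0.1)
print(theta, m, v)
```

As the effective decay approaches \(1\), \(v_t\) changes less from step to step, so the denominator becomes nearly constant and the update is driven mainly by the momentum term \(m_t\).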
Reference: Meng Zhu, Quan Xiao, Weidong Min, "AdamNX: An Adam improvement algorithm based on a novel exponential decay mechanism for the second-order moment estimate", arXiv 2025. https://arxiv.org/abs/2511.13465