IAdaPID-ADG
Implements IAdaPID-ADG, an improved adaptive PID optimizer combining AMSGrad and DiffGrad.
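IAdaPID-ADG casts optimization as PID control: the integral term \(I_t\) accumulates past gradients, the derivative term \(D_t\) tracks changes between consecutive gradients, and there is no explicit proportional term. The "ADG" component (AMSDiffGrad) combines two stabilizing mechanisms. From AMSGrad it takes running maxima of the second-moment estimates, which addresses the non-convergence of Adam-style methods. From DiffGrad it takes a sigmoid friction factor \(\mu_t\) of the gradient change: in DiffGrad this factor is close to 0.5 when consecutive gradients barely differ, damping the step where the gradient is nearly constant (for example, close to an optimum), and it approaches 1 when they differ sharply.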
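The two ADG ingredients can be illustrated for a single scalar parameter with the minimal sketch below. It follows the original DiffGrad friction coefficient \(\sigma(|g_t - g_{t-1}|)\) and the AMSGrad running maximum; whether IAdaPID-ADG defines \(\mu_t\) in exactly this form is an assumption, and the function names `diffgrad_factor` and `amsgrad_max` are hypothetical.

```python
import math


def diffgrad_factor(g_t, g_prev):
    """DiffGrad friction coefficient for one parameter (original DiffGrad form).

    sigmoid(|g_t - g_prev|) lies in [0.5, 1): exactly 0.5 when the gradient
    is unchanged, approaching 1 as the change grows.
    """
    # -abs(...) is never positive, so math.exp cannot overflow here.
    return 1.0 / (1.0 + math.exp(-abs(g_t - g_prev)))


def amsgrad_max(v_max, v_t):
    """AMSGrad: keep the running maximum of a second-moment estimate, so the
    denominator sqrt(v_max) never shrinks."""
    return max(v_max, v_t)


# Gradient barely changed: factor is close to 0.5, so the step is damped.
small = diffgrad_factor(0.50, 0.49)
# Gradient changed sharply: factor is close to 1, so the step is nearly full.
large = diffgrad_factor(3.0, -2.0)

# The running maximum never decreases, even when v_t does.
v_max, history = 0.0, []
for v_t in [0.4, 0.9, 0.2, 0.5]:
    v_max = amsgrad_max(v_max, v_t)
    history.append(v_max)
print(history)  # [0.4, 0.9, 0.9, 0.9]
```

Because the factor never drops below 0.5, DiffGrad at most halves a step; it modulates the step size rather than reversing or zeroing it.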
The update rule uses the following notation: \(\theta\) denotes the parameters, \(\eta\) the learning rate, \(g_t\) the gradient at step \(t\), \(\Delta g_t\) the difference between consecutive gradients, \(\mu_t\) the DiffGrad modulation factor, \(I_t\) and \(D_t\) the integral and derivative terms, \(v_t\) and \(d_t\) the second-moment estimates of \(g_t\) and \(\Delta g_t\), \(v_t^{\max}\) and \(d_t^{\max}\) their running maxima, \(\gamma\) and \(\beta\) decay rates, \(K_i\) and \(K_d\) the integral and derivative gains, and \(\epsilon\) a small constant for numerical stability.
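Using these symbols, one update step can be sketched in Python. This is a hypothetical reconstruction, not the paper's algorithm: it assumes \(I_t\) and \(D_t\) are exponential moving averages of \(g_t\) and \(\Delta g_t\) with rate \(\gamma\), that \(v_t\) and \(d_t\) are moving averages of their squares with rate \(\beta\), that \(\mu_t\) is the original DiffGrad factor \(\sigma(|\Delta g_t|)\), and that bias correction is omitted. The class name `IAdaPIDADGSketch` and all default hyperparameters are illustrative.

```python
import math


class IAdaPIDADGSketch:
    """Hypothetical IAdaPID-ADG-style optimizer over a flat list of floats.

    ASSUMED forms (not taken from the paper): I_t and D_t are EMAs of g_t and
    delta g_t with rate gamma; v_t and d_t are EMAs of their squares with rate
    beta; mu_t is the original DiffGrad factor sigmoid(|delta g_t|).
    Bias correction is omitted.
    """

    def __init__(self, n, lr=1e-3, gamma=0.9, beta=0.999,
                 k_i=1.0, k_d=0.1, eps=1e-8):
        self.lr, self.gamma, self.beta = lr, gamma, beta
        self.k_i, self.k_d, self.eps = k_i, k_d, eps
        self.g_prev = [0.0] * n   # g_{t-1}
        self.I = [0.0] * n        # integral term I_t
        self.D = [0.0] * n        # derivative term D_t
        self.v = [0.0] * n        # second moment of g_t
        self.d = [0.0] * n        # second moment of delta g_t
        self.v_max = [0.0] * n    # running max of v_t (AMSGrad)
        self.d_max = [0.0] * n    # running max of d_t (AMSGrad)

    def step(self, theta, grad):
        """Return updated parameters; theta and grad are equal-length lists."""
        new_theta = []
        for j, (p, g) in enumerate(zip(theta, grad)):
            dg = g - self.g_prev[j]                      # delta g_t
            mu = 1.0 / (1.0 + math.exp(-abs(dg)))        # DiffGrad factor in [0.5, 1)
            self.I[j] = self.gamma * self.I[j] + (1 - self.gamma) * g
            self.D[j] = self.gamma * self.D[j] + (1 - self.gamma) * dg
            self.v[j] = self.beta * self.v[j] + (1 - self.beta) * g * g
            self.d[j] = self.beta * self.d[j] + (1 - self.beta) * dg * dg
            self.v_max[j] = max(self.v_max[j], self.v[j])
            self.d_max[j] = max(self.d_max[j], self.d[j])
            # PID-style direction: integral + derivative terms, each normalized
            # by its own running-max second moment; no proportional term.
            direction = (self.k_i * self.I[j] / (math.sqrt(self.v_max[j]) + self.eps)
                         + self.k_d * self.D[j] / (math.sqrt(self.d_max[j]) + self.eps))
            new_theta.append(p - self.lr * mu * direction)
            self.g_prev[j] = g
        return new_theta
```

Each call consumes one gradient and returns new parameters, e.g. `theta = opt.step(theta, grad_fn(theta))` in a training loop, where `grad_fn` is a hypothetical placeholder for whatever computes the gradient. The per-element loop keeps the sketch readable; a practical implementation would vectorize these updates over tensors.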
Reference: Saurabh Saini, Kapil Ahuja, Thomas Wick, Saurav Kumar, "An Improved Adaptive PID Optimizer with Enhanced Convergence and Stability for Deep Learning", arXiv 2026. https://arxiv.org/abs/2605.21968