AdaGL
Implements AdaGL, an Adam-style optimizer that replaces the gradient with a Grünwald–Letnikov fractional-order approximated gradient and modulates the step size by short-term gradient change.
The fractional-order gradient is a truncated Grünwald–Letnikov series over the last ten gradients, which injects long-term memory and global curvature information into the moment estimates. A step size control coefficient \(C_t\), built from a scaled-and-shifted softsign of the instantaneous gradient change, adapts the effective learning rate in real time. When the gradient barely changes (likely a flat minimum), \(C_t\) falls toward its lower bound and the step shrinks, so the optimizer explores the flat region in finer steps. When the gradient changes sharply (likely a sharp minimum or a saddle), \(C_t\) rises toward its upper bound and the step stays large enough to escape. The two pieces are combined in an Adam-style bias-corrected update.
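The update described here can be written out roughly as follows. This is a reconstruction from the description on this page, not a transcription of the paper. The Grünwald–Letnikov weights use the standard binomial form, and the moment and bias-correction lines follow Adam. The form of \(C_t\) is an assumption: the offset 0.6, the scale 0.5, and the element-wise absolute gradient difference as the softsign argument were chosen only to match the stated range \([0.6, 1.1)\). The paper's exact sign and scaling conventions may differ.

```latex
\[
\begin{aligned}
\nabla^{\alpha} L(\theta_t) &\approx \sum_{k=0}^{9} (-1)^k \,
    \frac{\Gamma(\alpha + 1)}{\Gamma(k + 1)\,\Gamma(\alpha - k + 1)} \; g_{t-k} \\
m_t &= \beta_1 m_{t-1} + (1 - \beta_1)\, \nabla^{\alpha} L(\theta_t) \\
v_t &= \beta_2 v_{t-1} + (1 - \beta_2)\, \bigl(\nabla^{\alpha} L(\theta_t)\bigr)^2 \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}}, \qquad
\hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \\
C_t &= 0.6 + 0.5 \, \frac{|g_t - g_{t-1}|}{1 + |g_t - g_{t-1}|}
    \qquad \text{(assumed form)} \\
\theta_{t+1} &= \theta_t - \eta \, C_t \, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}
\end{aligned}
\]
```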
In the update rule, \(\nabla^{\alpha} L\) is the order-\(\alpha\) Grünwald–Letnikov approximated gradient truncated to 10 terms, \(\Gamma\) is the gamma function, \(\alpha\) is the fractional order, and \(g_t\) is the current gradient. \(m_t\) and \(v_t\) are the first and second moments of the fractional gradient with decay rates \(\beta_1\) and \(\beta_2\), and \(\hat{m}_t\) and \(\hat{v}_t\) are their bias-corrected values. \(C_t \in [0.6, 1.1)\) is the step size control coefficient, \(\eta\) is the learning rate, and \(\epsilon\) is a numerical stability constant.
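A minimal NumPy sketch of one update under these definitions may help make the pieces concrete. Everything named here is illustrative rather than the paper's or any library's API: the functions `gl_weights`, `step_control`, `init_state` and `adagl_step`, the state dictionary, and the default hyperparameters (Adam's conventional values, with `alpha=0.5` picked arbitrarily). The exact form of \(C_t\) is also an assumption, chosen so that its values fall in \([0.6, 1.1)\).

```python
from collections import deque

import numpy as np


def gl_weights(alpha, terms=10):
    """Standard G-L weights w_k = (-1)^k * binom(alpha, k) for k = 0..terms-1.

    Uses the recurrence w_k = w_{k-1} * (1 - (alpha + 1) / k), which matches
    Gamma(alpha+1) / (Gamma(k+1) * Gamma(alpha-k+1)) up to the sign (-1)^k
    and avoids evaluating the gamma function at its poles.
    """
    w = [1.0]
    for k in range(1, terms):
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return np.array(w)


def step_control(grad, grad_prev):
    """Assumed form of C_t: 0.6 + 0.5 * softsign(|g_t - g_{t-1}|), element-wise."""
    d = np.abs(grad - grad_prev)
    return 0.6 + 0.5 * d / (1.0 + d)


def init_state(theta, alpha=0.5, terms=10):
    """Hypothetical optimizer state: step count, moments, gradient history."""
    return {
        "t": 0,
        "m": np.zeros_like(theta),
        "v": np.zeros_like(theta),
        "history": deque(maxlen=terms),  # newest gradient first
        "weights": gl_weights(alpha, terms),
    }


def adagl_step(theta, grad, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-style step driven by the truncated G-L fractional gradient."""
    history = state["history"]
    # On the very first step there is no previous gradient, so the change is 0.
    grad_prev = history[0] if history else grad
    history.appendleft(grad.copy())

    # Truncated G-L series over the stored gradients (at most `terms` of them).
    frac_grad = sum(w * g for w, g in zip(state["weights"], history))

    state["t"] += 1
    t = state["t"]
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * frac_grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * frac_grad**2
    m_hat = state["m"] / (1.0 - beta1**t)
    v_hat = state["v"] / (1.0 - beta2**t)

    c_t = step_control(grad, grad_prev)
    return theta - lr * c_t * m_hat / (np.sqrt(v_hat) + eps)


# Usage: a few steps on f(theta) = sum(theta**2), whose gradient is 2 * theta.
theta = np.array([1.0, -2.0])
state = init_state(theta, alpha=0.5)
for _ in range(3):
    theta = adagl_step(theta, 2.0 * theta, state, lr=0.1)
```

Keeping the gradient history in a bounded `deque` means the truncation to ten terms happens automatically, and the sketch applies \(C_t\) per parameter. A norm-based scalar \(C_t\) would be an equally plausible reading of the description.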
Reference: Shuang Chen, Changlun Zhang, Haibing Mu, "An Adaptive Learning Rate Deep Learning Optimizer Using Long and Short-Term Gradients Based on G–L Fractional-Order Derivative", Neural Processing Letters 2024. https://doi.org/10.1007/s11063-024-11571-7