FOAdam

Implements FOAdam, a fractional-order Adam that replaces the integer-order gradient with a Caputo fractional gradient.

The method generalizes Adam by computing the loss gradient through the Caputo fractional derivative of order \(\alpha \in (0,1]\), so that each step carries a tunable memory of the optimization trajectory. The fractional gradient \(g_t^{(\alpha)}\) is then fed into the usual Adam machinery: exponential first and second moment estimates with bias correction. A fractional-order scheduler (built on a connections cloud model) adapts \(\alpha\) during training to trade convergence speed against precision. At \(\alpha = 1\), understood as the limit \(\alpha \to 1^-\) of the integral form, the Caputo gradient reduces to the ordinary gradient and FOAdam recovers Adam.

\[ \begin{aligned} g_t^{(\alpha)} &= \frac{1}{\Gamma(1-\alpha)} \int_{c}^{\theta_{t}} (\theta_t - \tau)^{-\alpha}\, \nabla f(\tau)\, d\tau \\ m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t^{(\alpha)} \\ v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, \left(g_t^{(\alpha)}\right)^2 \\ \hat{m}_t &= \frac{m_t}{1-\beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1-\beta_2^{t}} \\ \theta_{t+1} &= \theta_t - \eta\, \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} \end{aligned} \]

where \(\theta\) are the parameters, \(\eta\) the learning rate, \(\nabla f\) the ordinary loss gradient, \(g_t^{(\alpha)}\) the Caputo fractional gradient of order \(\alpha\) with lower terminal \(c\), \(\Gamma\) the gamma function, \(m_t, v_t\) the first and second moment estimates with decay rates \(\beta_1, \beta_2\), and \(\epsilon\) a stability constant.
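The update can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the reference implementation from the paper. The Caputo integral is not evaluated exactly: the sketch sets the lower terminal to the previous iterate, \(c = \theta_{t-1}\), and freezes \(\nabla f\) at its value at \(\theta_t\) over the integration interval, which gives \(g_t^{(\alpha)} \approx \nabla f(\theta_t)\, |\theta_t - \theta_{t-1}|^{1-\alpha} / \Gamma(2-\alpha)\) elementwise. The names `fractional_grad`, `foadam_step`, and the offset `delta` are hypothetical, and \(\alpha\) is held fixed rather than driven by the fractional-order scheduler.

```python
import math
import numpy as np


def fractional_grad(grad, theta, theta_prev, alpha, delta=1e-8):
    """Approximate Caputo gradient (assumption of this sketch, not the paper's scheme).

    Lower terminal c = theta_prev, with grad treated as constant on [c, theta]:
        g ~= grad * (|theta - theta_prev| + delta)**(1 - alpha) / Gamma(2 - alpha)
    delta keeps the factor nonzero when the iterate has not moved.
    With alpha = 1 the factor is exactly 1, so the ordinary gradient is returned.
    """
    step = np.abs(theta - theta_prev) + delta
    return grad * step ** (1.0 - alpha) / math.gamma(2.0 - alpha)


def foadam_step(theta, theta_prev, grad, m, v, t, alpha,
                lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One update following the equations above; t starts at 1."""
    g = fractional_grad(grad, theta, theta_prev, alpha)

    # Exponential moving averages of the fractional gradient and its square.
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g ** 2

    # Bias correction for the zero-initialized moments.
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)

    theta_next = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta_next, m, v


def minimize_quadratic(alpha, steps=200, lr=0.05):
    """Toy usage: minimize f(theta) = 0.5 * ||theta||^2, whose gradient is theta."""
    theta = np.array([1.0, -2.0])
    theta_prev = theta.copy()
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    for t in range(1, steps + 1):
        grad = theta
        theta_next, m, v = foadam_step(theta, theta_prev, grad, m, v, t, alpha, lr=lr)
        theta_prev, theta = theta, theta_next
    return theta
```

Calling `minimize_quadratic(1.0)` runs plain Adam on the toy problem, since the fractional factor is then identically 1. Values of `alpha` below 1 rescale each gradient coordinate by the size of the previous step, which is how this particular approximation injects memory of the trajectory.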

Reference: Guangyao Chen, Yangze Liang, Sihao Li, Zhao Xu, "A novel gradient descent optimizer based on fractional order scheduler and its application in deep neural networks", Applied Mathematical Modelling 2024. https://doi.org/10.1016/j.apm.2023.12.018
