Fractional Order Gradient Descent with Momentum (FOGDM)

Implements FOGDM, a gradient descent method with momentum whose search direction is a Caputo fractional-order gradient.

The Caputo fractional gradient of the quadratic energy is truncated to its leading term, which scales the ordinary gradient by \((|\theta_t - \theta_{t-1}| + \epsilon)^{1-\alpha}/\Gamma(2-\alpha)\). The lower terminal of the fractional derivative is taken as the previous iterate, so that the iteration can converge to the real extreme point as \(\theta_t \to \theta_{t-1}\). A classical momentum term carrying the previous step is added to damp the oscillation of plain fractional gradient descent and to speed convergence, and an adaptive learning rate adjusts \(\eta\) during training. As \(\alpha \to 1\) the fractional factor tends to one and the rule reduces to gradient descent with momentum.

\[ \begin{aligned} g^{\alpha}_t &= \frac{\nabla_\theta E(\theta_t)}{\Gamma(2-\alpha)} \left( |\theta_t - \theta_{t-1}| + \epsilon \right)^{1-\alpha}, \\ v_t &= \mu \, v_{t-1} - \eta \, g^{\alpha}_t, \\ \theta_{t+1} &= \theta_t + v_t. \end{aligned} \]

where \(\theta\) are the network weights, \(E\) the quadratic error, \(\eta > 0\) the (adaptive) learning rate, \(\mu \in [0,1)\) the momentum coefficient, \(\alpha \in (0,1)\) the fractional order, \(v_t\) the velocity, \(\epsilon \ge 0\) a small constant guarding the singularity at \(\theta_t = \theta_{t-1}\), and \(\Gamma(\cdot)\) the Gamma function.
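The three update equations can be sketched in NumPy as follows. This is a minimal illustration, not the reference implementation: the function names `fogdm_step` and `run_demo`, the argument names, and the default values are hypothetical. The sketch also assumes that the absolute value is applied element-wise to the weights and that \(\eta\) is held fixed, since the adaptive learning-rate schedule is not specified here.

```python
import math

import numpy as np


def fogdm_step(theta, theta_prev, velocity, grad,
               lr=0.01, momentum=0.9, alpha=0.9, eps=1e-8):
    """One FOGDM update (sketch); returns (theta_next, velocity_next)."""
    # Fractional factor (|theta_t - theta_{t-1}| + eps)^(1 - alpha) / Gamma(2 - alpha),
    # applied element-wise (an assumption of this sketch).
    scale = (np.abs(theta - theta_prev) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    frac_grad = grad * scale                        # g^alpha_t
    velocity = momentum * velocity - lr * frac_grad  # v_t
    return theta + velocity, velocity                # theta_{t+1}, v_t


def run_demo(target, steps=200, **kwargs):
    """Iterate on the toy quadratic E(theta) = 0.5 * ||theta - target||^2."""
    theta_prev = np.zeros_like(target, dtype=float)
    theta = theta_prev + 0.1  # distinct first two iterates (arbitrary choice)
    velocity = np.zeros_like(theta)
    for _ in range(steps):
        grad = theta - target  # gradient of the toy quadratic
        theta_next, velocity = fogdm_step(theta, theta_prev, velocity, grad, **kwargs)
        theta_prev, theta = theta, theta_next
    return theta
```

Calling `fogdm_step` with `alpha=1.0` makes the factor exactly one, so the step coincides with ordinary gradient descent with momentum, which matches the limiting behaviour described above.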

Reference: Han Xue, Zheping Shao, Hongbo Sun, "Data classification based on fractional order gradient descent with momentum for RBF neural network", Network: Computation in Neural Systems 31(1-4), 2020. https://doi.org/10.1080/0954898X.2020.1849842
