Fractional Order Gradient Descent with Momentum (FOGDM)
Implements Fractional Order Gradient Descent with Momentum (FOGDM): gradient descent with momentum in which the search direction is a Caputo fractional-order gradient.
The Caputo fractional gradient of the quadratic error \(E\) is truncated to its leading term, which scales the ordinary gradient by \(|\theta_t - \theta_{t-1} + \epsilon|^{1-\alpha}/\Gamma(2-\alpha)\). The lower terminal is taken as the previous iterate, so the method tracks the real extreme point as \(\theta_t \to \theta_{t-1}\). A classical momentum term carrying the previous step is added to damp the oscillation of plain fractional gradient descent and to speed up convergence, and an adaptive learning rate adjusts \(\eta\) during training. As \(\alpha \to 1\), the exponent \(1-\alpha\) tends to zero and \(\Gamma(2-\alpha) \to \Gamma(1) = 1\), so the fractional factor tends to one and the rule reduces to gradient descent with momentum.
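Written out, the description above suggests an update of the following form. This is a reconstruction from the definitions in this section, assuming the common convention in which the velocity accumulates the negative scaled gradient; the sign and indexing conventions of the reference may differ.

\[
v_{t+1} = \mu\, v_t - \eta\, \frac{|\theta_t - \theta_{t-1} + \epsilon|^{1-\alpha}}{\Gamma(2-\alpha)}\, \nabla E(\theta_t),
\qquad
\theta_{t+1} = \theta_t + v_{t+1}
\]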
Here \(\theta\) denotes the network weights, \(E\) the quadratic error, \(\eta > 0\) the (adaptive) learning rate, \(\mu \in [0,1)\) the momentum coefficient, \(\alpha \in (0,1)\) the fractional order, \(v_t\) the velocity, \(\epsilon \ge 0\) a small constant guarding the degenerate case \(\theta_t = \theta_{t-1}\) (where the fractional factor would otherwise vanish), and \(\Gamma(\cdot)\) the Gamma function.
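A single update under these definitions can be sketched in NumPy as follows. The function name `fogdm_step`, its argument names, and the default values are illustrative assumptions rather than this library's API. The learning rate is held fixed because the adaptive schedule is not specified here, and the sign convention (the velocity accumulates the negative scaled gradient) is likewise assumed.

```python
import math

import numpy as np


def fogdm_step(theta, theta_prev, velocity, grad,
               lr=0.1, momentum=0.9, alpha=0.9, eps=1e-8):
    """One FOGDM-style update (hypothetical helper, not the library API).

    theta, theta_prev : current and previous weights (arrays of equal shape)
    velocity          : velocity carried over from the previous step
    grad              : ordinary gradient of E evaluated at theta
    """
    # Leading-term Caputo factor, applied elementwise:
    # |theta_t - theta_{t-1} + eps|^(1 - alpha) / Gamma(2 - alpha)
    scale = np.abs(theta - theta_prev + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)

    # Classical momentum on top of the fractionally scaled gradient.
    velocity_new = momentum * velocity - lr * scale * grad
    theta_new = theta + velocity_new
    return theta_new, velocity_new


if __name__ == "__main__":
    # Toy problem (assumed for illustration): E(theta) = 0.5 * ||theta||^2,
    # whose ordinary gradient is simply theta.
    theta = np.array([1.0, -2.0])
    theta_prev = theta.copy()
    velocity = np.zeros_like(theta)
    for _ in range(200):
        grad = theta
        theta_next, velocity = fogdm_step(theta, theta_prev, velocity, grad,
                                          lr=0.1, momentum=0.5, alpha=0.9)
        theta_prev, theta = theta, theta_next
    print("final error:", 0.5 * float(theta @ theta))
```

The caller keeps `theta_prev` because the fractional factor depends on the difference between consecutive iterates. Passing `alpha=1.0` makes the factor exactly one, which recovers ordinary gradient descent with momentum.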
Reference: Han Xue, Zheping Shao, Hongbo Sun, "Data classification based on fractional order gradient descent with momentum for RBF neural network", Network: Computation in Neural Systems 31(1-4), 2020. https://doi.org/10.1080/0954898X.2020.1849842