Variable Order Fractional Gradient Descent

Implements Variable Order Fractional Gradient Descent, a fractional-order descent in which the order of the Caputo derivative changes across iterations.

Fractional gradient descent replaces the integer-order gradient with a Caputo fractional derivative of order \(\alpha \in (0,1)\), taken with the previous iterate as the lower terminal. For a fixed terminal \(\theta_{t-1}\), the leading term of the Caputo derivative of the loss (exact when the gradient is constant between \(\theta_{t-1}\) and \(\theta_t\)) is \(\frac{1}{\Gamma(2-\alpha)}\,|\theta_t-\theta_{t-1}|^{1-\alpha}\,g_t\), so the effective step is the ordinary gradient \(g_t\) rescaled by a power of the distance moved on the previous step. The factor \(|\theta_t-\theta_{t-1}|^{1-\alpha}\) injects a memory of past motion into each update, and at \(\alpha=1\) the rescaling reduces to \(1\), recovering plain gradient descent.
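
The scaling factor above can be checked in one dimension. The following is a sketch under two assumptions not stated in the text: \(\theta_t > \theta_{t-1}\), and the derivative of the loss \(f\) is held at its current value \(g_t\) over the whole interval of integration.

\[ \begin{aligned} {}^{C}\!D^{\alpha}_{\theta_{t-1}} f(\theta_t) &= \frac{1}{\Gamma(1-\alpha)} \int_{\theta_{t-1}}^{\theta_t} \frac{f'(\tau)}{(\theta_t-\tau)^{\alpha}}\,d\tau \\ &\approx \frac{g_t}{\Gamma(1-\alpha)} \int_{\theta_{t-1}}^{\theta_t} (\theta_t-\tau)^{-\alpha}\,d\tau \\ &= \frac{g_t}{\Gamma(1-\alpha)}\cdot\frac{(\theta_t-\theta_{t-1})^{1-\alpha}}{1-\alpha} \\ &= \frac{1}{\Gamma(2-\alpha)}\,(\theta_t-\theta_{t-1})^{1-\alpha}\,g_t. \end{aligned} \]

The last line uses \((1-\alpha)\,\Gamma(1-\alpha)=\Gamma(2-\alpha)\). The approximation in the second line is exact when \(f'\) is constant on the interval.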

The paper's contribution is to let the order vary rather than fix it. A fixed small order speeds early progress but stalls near the optimum, so \(\alpha\) is scheduled toward \(1\) as training proceeds, giving a step-dependent order \(\alpha_t\) and the update

\[ \begin{aligned} \theta_{t+1} &= \theta_t - \frac{\eta}{\Gamma(2-\alpha_t)}\,\bigl(|\theta_t-\theta_{t-1}|+\delta\bigr)^{1-\alpha_t}\,g_t, \\ \alpha_t &\to 1 \quad\text{as } t \text{ increases.} \end{aligned} \]

where \(\theta\) are the parameters, \(\eta\) the learning rate, \(g_t\) the gradient, \(\alpha_t \in (0,1]\) the variable fractional order at step \(t\), \(\delta>0\) a small constant that keeps the distance term well defined when consecutive iterates coincide, and \(\Gamma\) the gamma function. The lower terminal of the Caputo derivative is taken to be the previous iterate \(\theta_{t-1}\).
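
The update rule above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation. The names `linear_order_schedule`, `vofgd_step`, `minimize`, and `alpha_start` are hypothetical. The linear ramp of \(\alpha_t\) toward \(1\) is an assumption, since the text only requires that \(\alpha_t \to 1\). Applying the distance term elementwise is also an assumption.

```python
import math


def linear_order_schedule(step, total_steps, alpha_start=0.7):
    """Hypothetical schedule: ramp alpha linearly from alpha_start up to 1.

    The source only states that alpha_t tends to 1; the linear shape and
    the default starting order are assumptions made for illustration.
    """
    frac = min(step / max(total_steps - 1, 1), 1.0)
    return min(alpha_start + (1.0 - alpha_start) * frac, 1.0)


def vofgd_step(theta, theta_prev, grad, lr, alpha, delta=1e-8):
    """One update of the displayed rule, applied elementwise (an assumption).

    theta, theta_prev, grad are equal-length lists of floats.
    delta keeps the distance term defined when consecutive iterates coincide.
    """
    scale = lr / math.gamma(2.0 - alpha)
    return [
        th - scale * (abs(th - th_prev) + delta) ** (1.0 - alpha) * g
        for th, th_prev, g in zip(theta, theta_prev, grad)
    ]


def minimize(grad_fn, theta0, lr=0.1, total_steps=200, alpha_start=0.7, delta=1e-8):
    """Run the variable-order update for total_steps iterations."""
    # With no earlier iterate available, start with theta_prev equal to theta,
    # so the first step relies on delta alone for its distance term.
    theta_prev = list(theta0)
    theta = list(theta0)
    for step in range(total_steps):
        alpha = linear_order_schedule(step, total_steps, alpha_start)
        new_theta = vofgd_step(theta, theta_prev, grad_fn(theta), lr, alpha, delta)
        theta_prev, theta = theta, new_theta
    return theta


# Example: f(x, y) = 0.5 * (x**2 + y**2), whose gradient is (x, y).
theta = minimize(lambda th: list(th), [3.0, -2.0])
print(theta)
```

Setting `alpha=1.0` in `vofgd_step` makes the exponent zero and \(\Gamma(1)=1\), so the step reduces to ordinary gradient descent, which matches the limiting case described earlier.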

Reference: Weipu Lou, Wei Gao, Xianwei Han, Yimin Zhang, "Variable Order Fractional Gradient Descent Method and Its Application in Neural Networks Optimization", 2022 34th Chinese Control and Decision Conference (CCDC), 2022. https://doi.org/10.1109/CCDC55256.2022.10033456
