The Improved Stochastic Fractional Order Gradient Descent Algorithm

Implements the Improved Stochastic Fractional Order Gradient Descent algorithm: a variant of SGD whose step is derived from a Caputo fractional-order gradient.

The ordinary gradient is replaced by a fractional-order gradient that injects memory of the trajectory. The Caputo derivative of the loss is discretized using the previous iterate, so the effective step size depends on the displacement \(|\theta_{t+1}-\theta_t|\) raised to the power \(1-\alpha\). A small offset \(\delta>0\) is added to the displacement to avoid a singularity when two consecutive iterates coincide. The fractional order \(\alpha\) tunes the memory and affects the convergence and monotonicity of the update. The paper proves a sublinear regret bound for the resulting online algorithm and also gives adaptive-gradient and momentum variants of the same fractional step.

\[ \theta_{t+2} = \theta_{t+1} - \frac{\mu_t}{\Gamma(2-\alpha)\,\bigl(|\theta_{t+1}-\theta_t|+\delta\bigr)^{1-\alpha}}\, g_{t+1} \]

where \(\theta\) are the parameters, \(\mu_t\) is the learning rate (decayed as \(\mu_t=\mu_0/t^{p}\)), \(g_{t+1}=\nabla f(\theta_{t+1})\) is the ordinary gradient, \(\alpha\in(0,2)\) is the fractional order, \(\delta>0\) is a stability offset, and \(\Gamma\) is the gamma function.
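The update rule above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's or any library's reference implementation: the function names `fractional_sgd_step` and `minimize`, the default values, the coordinate-wise treatment of the displacement, and the choice to start with two identical iterates are all hypothetical choices made for the example.

```python
import math


def fractional_sgd_step(theta_prev, theta_curr, grad, lr, alpha=0.9, delta=1e-8):
    """Compute theta_{t+2} from theta_t, theta_{t+1} and g_{t+1}.

    Assumption: the displacement |theta_{t+1} - theta_t| is taken
    coordinate-wise, so each parameter gets its own scaling factor.
    """
    scale = lr / math.gamma(2.0 - alpha)  # mu_t / Gamma(2 - alpha)
    return [
        cur - scale * g / (abs(cur - prev) + delta) ** (1.0 - alpha)
        for prev, cur, g in zip(theta_prev, theta_curr, grad)
    ]


def minimize(grad_fn, theta0, mu0=0.1, p=0.5, alpha=0.9, delta=1e-8, steps=200):
    """Run the update with the decayed learning rate mu_t = mu0 / t**p.

    Assumption: both initial iterates equal theta0, so the first step
    relies entirely on delta to keep the denominator non-zero.
    """
    theta_prev = list(theta0)
    theta_curr = list(theta0)
    for t in range(1, steps + 1):
        lr = mu0 / t ** p
        grad = grad_fn(theta_curr)  # ordinary gradient at theta_{t+1}
        theta_next = fractional_sgd_step(
            theta_prev, theta_curr, grad, lr, alpha, delta
        )
        theta_prev, theta_curr = theta_curr, theta_next
    return theta_curr


# Hypothetical usage: minimize f(theta) = 0.5 * theta**2, whose gradient is theta.
result = minimize(lambda th: [x for x in th], [1.0])
```

One property that follows directly from the formula: with \(\alpha=1\), \(\Gamma(2-\alpha)=\Gamma(1)=1\) and the exponent \(1-\alpha\) is zero, so the step reduces to the ordinary SGD update \(\theta_{t+1}-\mu_t\,g_{t+1}\).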

Reference: Yang Yang, Lipo Mo, Yusen Hu, Fei Long, "The Improved Stochastic Fractional Order Gradient Descent Algorithm", Fractal and Fractional 2023. https://doi.org/10.3390/fractalfract7080631
