The Improved Stochastic Fractional Order Gradient Descent algorithm
Implements the Improved Stochastic Fractional Order Gradient Descent algorithm: stochastic gradient descent (SGD) whose step direction is given by a Caputo fractional-order gradient.
The ordinary gradient is replaced by a fractional-order gradient that injects memory of the trajectory: the Caputo derivative of the loss is discretized using the previous iterate, so the effective step depends on the displacement \(|\theta_{t+1}-\theta_t|\) raised to the power \(1-\alpha\). A small offset \(\delta>0\) is added inside the displacement to avoid a singularity when two consecutive iterates coincide. The fractional order \(\alpha\) tunes the strength of this memory and thereby affects the convergence and monotonicity of the update. The paper proves a sublinear regret bound for the resulting online algorithm and also gives adaptive-gradient and momentum variants of the same fractional step.
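Written out, the update described above takes roughly the following form. This is a reconstruction from the description, not a quotation of the paper: the iterate indexing follows the displacement \(|\theta_{t+1}-\theta_t|\) used here, and the subscript on the learning rate is an assumption.

\[
\theta_{t+2} = \theta_{t+1} - \mu_t \, \frac{g_{t+1}}{\Gamma(2-\alpha)} \, \bigl(|\theta_{t+1}-\theta_t| + \delta\bigr)^{1-\alpha}
\]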
Here \(\theta\) denotes the parameters, \(\mu_t\) is the learning rate (decayed as \(\mu_t=\mu_0/t^{p}\)), \(g_{t+1}=\nabla f(\theta_{t+1})\) is the ordinary gradient, \(\alpha\in(0,2)\) is the fractional order, \(\delta>0\) is a stability offset, and \(\Gamma\) is the gamma function.
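A minimal NumPy sketch of one such step may help make the roles of these symbols concrete. It assumes the step scales the ordinary gradient elementwise by \((|\theta_{t+1}-\theta_t|+\delta)^{1-\alpha}/\Gamma(2-\alpha)\). The function name `fractional_sgd_step`, its default values, and the quadratic toy loss are all hypothetical and are not this implementation's API.

```python
import math
import numpy as np


def fractional_sgd_step(theta, theta_prev, grad, t,
                        mu0=0.1, p=0.5, alpha=0.9, delta=1e-8):
    """One fractional-order step (hypothetical helper, not a library API)."""
    mu_t = mu0 / t ** p  # decayed learning rate mu_0 / t^p
    # Offset displacement raised to the power 1 - alpha, elementwise.
    memory = (np.abs(theta - theta_prev) + delta) ** (1.0 - alpha)
    return theta - mu_t * grad * memory / math.gamma(2.0 - alpha)


# Toy problem (assumption): f(theta) = 0.5 * ||theta||^2, so grad f = theta.
theta_prev = np.array([1.0, -2.0])
theta = np.array([0.9, -1.8])
start_norm = np.linalg.norm(theta)
for t in range(1, 201):
    # Right-hand side is evaluated first, so the old theta becomes theta_prev.
    theta, theta_prev = fractional_sgd_step(theta, theta_prev, theta, t), theta
final_norm = np.linalg.norm(theta)
```

In this sketch, setting `alpha=1.0` makes the exponent zero and \(\Gamma(1)=1\), so the step reduces to ordinary SGD with the decayed learning rate, which is a convenient sanity check.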
Reference: Yang Yang, Lipo Mo, Yusen Hu, Fei Long, "The Improved Stochastic Fractional Order Gradient Descent Algorithm", Fractal and Fractional 2023. https://doi.org/10.3390/fractalfract7080631