Fractional Order Gradient Descent with variable initial value

Fractional Order Gradient Descent with variable initial value is a Riemann-Liouville fractional gradient method designed so that its iterates converge to the true extremum.

The method starts from classical first-order gradient descent and replaces the integer-order derivative with a Riemann-Liouville fractional derivative, exploiting its long-memory behavior to speed up convergence. Done naively, this biases the stationary point away from the real extremum, because the fractional gradient depends on the lower terminal \(c\) (the initial value) and on the order \(\alpha\). The method therefore keeps only the dominant first-order term of the fractional derivative and, critically, treats the lower terminal as a variable initial value \(c_t\), selected per problem (here via random-weight particle swarm optimization), so that the iterates converge to the actual minimum rather than to a point shifted by the fractional derivative.
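Where the \(\Gamma(2-\alpha)\) factor comes from can be sketched with the standard power-series form of the Riemann-Liouville derivative, assuming a scalar \(f\) that is analytic around \(c\) and \(\theta > c\). This is a textbook expansion, not necessarily the reference's own derivation:

\[ {}_{c}D_{\theta}^{\alpha} f(\theta) = \frac{1}{\Gamma(1-\alpha)}\,\frac{d}{d\theta}\int_{c}^{\theta} \frac{f(\tau)}{(\theta-\tau)^{\alpha}}\,d\tau = \sum_{i=0}^{\infty} \frac{f^{(i)}(c)}{\Gamma(i+1-\alpha)}\,(\theta-c)^{i-\alpha} \]

The \(i=1\) term, \(\frac{f'(c)}{\Gamma(2-\alpha)}(\theta-c)^{1-\alpha}\), has the form of the first-order term that the update rule retains. The rule then evaluates the gradient at the current iterate (\(g_t\) in place of \(f'(c)\)) and uses \(\lvert\theta_t-c_t\rvert+\epsilon\) in place of \(\theta-c\).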

\[ \begin{aligned} \theta_{t+1} &= \theta_t - \frac{\eta}{\Gamma(2-\alpha)}\,\big(\lvert \theta_t - c_t\rvert + \epsilon\big)^{1-\alpha}\, g_t \end{aligned} \]

where \(g_t = \nabla f(\theta_t)\) is the (integer-order) gradient, \(\eta>0\) is the learning rate, \(0<\alpha<1\) is the fractional order, \(\Gamma(\cdot)\) is the gamma function, \(c_t\) is the variable initial value (lower integral terminal) chosen so that the iterates reach the real extremum, and \(\epsilon\) is a small positive constant that keeps the step from vanishing when \(\theta_t = c_t\) (because \(1-\alpha>0\), the factor \(\lvert\theta_t-c_t\rvert^{1-\alpha}\) would otherwise be zero there). As \(\alpha \to 1\), \(\Gamma(2-\alpha)\to 1\) and the exponent \(1-\alpha\to 0\), so the rule reduces to ordinary gradient descent, \(\theta_{t+1} = \theta_t - \eta\, g_t\).
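A minimal Python sketch of the update rule, assuming \(\theta\) is a list of floats updated element-wise and that \(c_t\) is supplied by the caller. The names `fogd_step`, `fogd_minimize`, and the `c_fn` callback are hypothetical. The fixed \(c\) in the usage example only stands in for the random-weight PSO selection, which is not implemented here.

```python
import math
from typing import Callable, List, Sequence


def fogd_step(theta: Sequence[float], grad: Sequence[float], c: Sequence[float],
              lr: float, alpha: float, eps: float = 1e-8) -> List[float]:
    """One element-wise update (element-wise form is an assumption for vector theta):
    theta <- theta - lr / Gamma(2 - alpha) * (|theta - c| + eps)**(1 - alpha) * grad
    """
    # alpha = 1 is accepted so the ordinary gradient-descent limit can be checked directly.
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    scale = lr / math.gamma(2.0 - alpha)
    return [th - scale * (abs(th - ci) + eps) ** (1.0 - alpha) * g
            for th, g, ci in zip(theta, grad, c)]


def fogd_minimize(grad_fn: Callable[[List[float]], List[float]],
                  theta0: Sequence[float],
                  c_fn: Callable[[int, List[float]], List[float]],
                  lr: float, alpha: float, steps: int, eps: float = 1e-8) -> List[float]:
    """Run `steps` updates; c_fn(t, theta) supplies the variable initial value c_t.
    Hypothetical hook: the reference selects c_t with random-weight PSO, not shown here."""
    theta = list(theta0)
    for t in range(steps):
        theta = fogd_step(theta, grad_fn(theta), c_fn(t, theta), lr, alpha, eps)
    return theta


if __name__ == "__main__":
    target = [3.0, -2.0]

    def quad_grad(th: List[float]) -> List[float]:
        # Gradient of f(theta) = 0.5 * ||theta - target||^2, minimised at target.
        return [a - b for a, b in zip(th, target)]

    def fixed_c(t: int, th: List[float]) -> List[float]:
        # Hypothetical constant c_t standing in for the PSO-selected value.
        return [-1.0, 1.0]

    theta = fogd_minimize(quad_grad, [0.0, 0.0], fixed_c, lr=0.1, alpha=0.5, steps=200)
    print(theta)  # values very close to [3.0, -2.0]
```

Passing `alpha=1.0` makes `math.gamma(1.0)` equal 1 and the exponent zero, so `fogd_step` performs a plain gradient step, matching the \(\alpha \to 1\) limit. Calling it with `eps=0.0` at \(\theta_t = c_t\) returns \(\theta_t\) unchanged, which is the stalled step that \(\epsilon\) is there to avoid.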

Reference: Yong Wang, Yuli He, Zhiguang Zhu, "Study on fast speed fractional order gradient descent method and its application in neural networks", Neurocomputing 2022. https://doi.org/10.1016/j.neucom.2022.02.034
