Fractional Gradient Descent (FGD)

Implements Fractional Gradient Descent (FGD), gradient descent in which the integer-order gradient is replaced by a fractional-order derivative of the objective.

This entry surveys the FGD family rather than describing a single, specific optimizer. The idea shared by the surveyed methods is to replace the classical gradient \(\nabla f\) with a fractional derivative of order \(\alpha\) (Caputo, Riemann–Liouville, or related definitions), so that the update inherits the non-locality and long memory of fractional calculus. The reviewed variants differ in how they tame this memory: modified fractional-order gradients that avoid singularities and converge to the true extremum, truncation to suppress the oscillatory tail, adaptive step sizes, and variable fractional-order schedules. The canonical FGD step the survey builds on is:

\[ \theta_{t+1} = \theta_t - \eta_t \, D^{\alpha}_{c}\, f(\theta_t) \]

where \(\theta_t\) are the parameters at step \(t\), \(\eta_t\) is the (possibly adaptive) step size, \(f\) is the objective, \(\alpha\) is the fractional order, and \(D^{\alpha}_{c}\) denotes the fractional derivative of order \(\alpha\) taken with respect to \(\theta\) about a base point \(c\) (recovering ordinary gradient descent as \(\alpha \to 1\)).
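As a rough illustration of the update above, the following Python sketch applies it to a scalar objective. It rests on several assumptions that are not taken from the survey: the fractional derivative is approximated by the first term of the Caputo series, \(D^{\alpha}_{c} f(\theta) \approx f'(\theta)\,|\theta - c|^{1-\alpha} / \Gamma(2-\alpha)\); the base point \(c\) is set to the previous iterate \(\theta_{t-1}\); the step size \(\eta\) is held constant; and a small `eps` is added for numerical safety. The names `caputo_first_order` and `fgd` are hypothetical and do not refer to any library API.

```python
import math


def caputo_first_order(grad, theta, c, alpha, eps=1e-8):
    """First-term truncation of the Caputo derivative about base point c.

    Assumed approximation: f'(theta) * |theta - c|**(1 - alpha) / Gamma(2 - alpha).
    eps keeps the power term away from exactly zero when theta == c.
    """
    return grad(theta) * (abs(theta - c) + eps) ** (1.0 - alpha) / math.gamma(2.0 - alpha)


def fgd(grad, theta0, alpha=0.9, eta=0.1, steps=200):
    """Scalar fractional gradient descent with a moving base point.

    grad   -- callable returning the ordinary derivative f'(theta)
    theta0 -- starting point
    alpha  -- fractional order, assumed to lie in (0, 1]
    eta    -- constant step size (the survey also covers adaptive choices)
    """
    # Bootstrap with one ordinary gradient step so a previous iterate exists.
    prev, theta = theta0, theta0 - eta * grad(theta0)
    for _ in range(steps):
        step = eta * caputo_first_order(grad, theta, prev, alpha)
        # The current iterate becomes the base point c for the next step.
        prev, theta = theta, theta - step
    return theta


if __name__ == "__main__":
    # Quadratic test objective f(theta) = (theta - 3)^2 with minimum at theta = 3.
    grad = lambda th: 2.0 * (th - 3.0)
    print(fgd(grad, theta0=0.0, alpha=0.9))
    print(fgd(grad, theta0=0.0, alpha=1.0))
```

With `alpha=1.0` the factor \(|\theta - c|^{1-\alpha} / \Gamma(2-\alpha)\) equals 1, so the loop reduces to ordinary gradient descent, matching the \(\alpha \to 1\) limit noted above. For \(\alpha < 1\) the factor shrinks as consecutive iterates approach each other, which in this sketch damps the steps near the minimizer.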

Reference: Sroor M. Elnady, Mohamed El-Beltagy, Ahmed G. Radwan, Mohammed E. Fouda, "A comprehensive survey of fractional gradient descent methods and their convergence analysis", Chaos, Solitons & Fractals 2025. https://doi.org/10.1016/j.chaos.2025.116154
