Fractional Order Stochastic Gradient Descent (FOSGD)

Implements Fractional Order Stochastic Gradient Descent (FOSGD), an SGD variant that scales each step by a fractional-order memory term.

FOSGD replaces the integer-order gradient with a Caputo fractional derivative of order \(\alpha \in (0, 1]\). Approximating that derivative by its leading Taylor term yields an update that multiplies the stochastic gradient by a power of the magnitude of the previous parameter change, \(|\theta_t - \theta_{t-1}|^{1-\alpha}\), so the optimizer carries history-dependent memory of the trajectory. A stabilizer \(\delta\) keeps this factor positive when consecutive iterates coincide. The case \(\alpha = 1\) recovers ordinary SGD, because the exponent \(1 - \alpha\) vanishes and \(\Gamma(1) = 1\). The paper studies how the optimal \(\alpha\) shifts with the tail index of long-tailed data.

\[ \begin{aligned} \theta_{t+1} &= \theta_t - \eta\, \frac{g_t}{\Gamma(2-\alpha)}\, \left( |\theta_t - \theta_{t-1}| + \delta \right)^{1-\alpha}, \qquad 0 < \alpha \le 1. \end{aligned} \]

where \(\theta\) are the parameters, \(\eta\) the learning rate, \(g_t = \nabla F(\theta_t, \xi_t)\) the stochastic minibatch gradient, \(\alpha\) the fractional order, \(\delta > 0\) a small stabilizer (typically \(10^{-6}\)), \(\Gamma(\cdot)\) the Gamma function, and \(|\theta_t - \theta_{t-1}|\) the elementwise magnitude of the previous step.
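The update rule can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function name `fosgd_step`, its argument names, and the default values for `lr` and `alpha` are hypothetical, and parameters are represented as flat lists of floats. The source does not say how \(\theta_{t-1}\) is initialized, so the caller supplies it here; passing the current parameters on the first step is one possible choice, in which case \(\delta\) alone sets the scaling factor.

```python
import math


def fosgd_step(theta, theta_prev, grad, lr=0.01, alpha=0.9, delta=1e-6):
    """Apply one FOSGD update elementwise (hypothetical helper, not a library API).

    theta      -- current parameters, theta_t
    theta_prev -- previous parameters, theta_{t-1} (supplied by the caller)
    grad       -- stochastic gradient g_t evaluated at theta_t
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")

    # Constant factor eta / Gamma(2 - alpha) shared by every coordinate.
    scale = lr / math.gamma(2.0 - alpha)

    # Each coordinate is scaled by (|theta_t - theta_{t-1}| + delta)^(1 - alpha).
    return [
        t - scale * g * (abs(t - tp) + delta) ** (1.0 - alpha)
        for t, tp, g in zip(theta, theta_prev, grad)
    ]


# Usage: one step on F(theta) = 0.5 * theta^2, whose gradient is theta itself.
theta_prev = [1.0, -2.0]
theta = [0.9, -1.8]
grad = list(theta)
theta_next = fosgd_step(theta, theta_prev, grad, lr=0.1, alpha=0.5)
```

With `alpha=1.0` the memory factor becomes 1 and the same call reduces to the plain SGD step `theta - lr * grad`, which gives a quick sanity check for any implementation.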

Reference: Mohammad Partohaghighi, Roummel Marcia, YangQuan Chen, "Tail-Index-Awareness in Fractional Order Stochastic Gradient Descent", ASME IDETC-CIE 2025. https://doi.org/10.1115/DETC2025-169054
