Fractional-Order Momentum (FCM)
Implements Fractional-Order Momentum (FCM), stochastic classical momentum with the integer-order difference of the velocity replaced by a Grünwald–Letnikov fractional-order difference.
Classical momentum updates the velocity through the first-order difference \(v_t - v_{t-1}\). FCM generalizes this step by applying the Grünwald–Letnikov fractional-order difference of order \(\alpha \in (0,1)\), which mixes a short memory window of past velocities into the current update and gives the optimizer extra flexibility to escape sharp local minima. The fractional order is scheduled (linearly or nonlinearly) during training, and the same construction applied to Adam yields a fractional-order adaptive variant.
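One way to write the resulting step, using only the symbols defined on this page, is sketched below. It takes classical momentum in difference form, \(v_t - v_{t-1} = (\beta_1 - 1)\,v_{t-1} - \eta\,g_t\), and replaces the left-hand side with the truncated Grünwald–Letnikov sum. The sign convention and the parameter update \(\theta_{t+1} = \theta_t + v_t\) are assumptions made for illustration and may differ from the form used in the reference.

\[
\sum_{j=0}^{K} \psi(\alpha, j)\, v_{t-j} = (\beta_1 - 1)\, v_{t-1} - \eta\, g_t,
\qquad
\psi(\alpha, j) = \frac{\Gamma(j - \alpha)}{\Gamma(-\alpha)\,\Gamma(j + 1)},
\]

\[
v_t = (\beta_1 - 1)\, v_{t-1} - \eta\, g_t - \sum_{j=1}^{K} \psi(\alpha, j)\, v_{t-j},
\qquad
\theta_{t+1} = \theta_t + v_t,
\]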
Here \(g_t = \nabla_\theta f(\theta_t)\) is the stochastic gradient, \(v_t\) is the velocity, \(\eta\) is the learning rate, \(\beta_1\) is the momentum coefficient, \(\alpha \in (0,1)\) is the fractional order, \(\psi(\alpha, j) = (-1)^j \binom{\alpha}{j} = \frac{\Gamma(j - \alpha)}{\Gamma(-\alpha)\,\Gamma(j + 1)}\) are the Grünwald–Letnikov coefficients, \(\Gamma\) is the Gamma function, and \(K\) is the short-memory truncation length (about 10 terms). In the limiting case \(\alpha = 1\), \(\psi(1, j) = 0\) for every \(j \ge 2\), so the memory terms vanish and the method reduces to stochastic classical momentum.
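The coefficients and a single update can be sketched in a few lines of plain Python. This is a minimal illustration, not the library's implementation: the names `gl_coefficients` and `fcm_step` are hypothetical, the parameter is a scalar for brevity, and the update assumes that the difference \(v_t - v_{t-1}\) in classical momentum, \(v_t - v_{t-1} = (\beta_1 - 1)\,v_{t-1} - \eta\,g_t\), is replaced by the truncated sum \(\sum_{j=0}^{K} \psi(\alpha, j)\,v_{t-j}\), with \(\theta_{t+1} = \theta_t + v_t\). The reference may use a different sign convention.

```python
from collections import deque


def gl_coefficients(alpha, K):
    """Return [psi(alpha, 0), ..., psi(alpha, K)], psi(alpha, j) = (-1)^j * C(alpha, j)."""
    psi = [1.0]
    for j in range(1, K + 1):
        # Recurrence equivalent to the binomial (or Gamma-function) definition.
        psi.append(psi[-1] * (1.0 - (alpha + 1.0) / j))
    return psi


def fcm_step(theta, grad, history, alpha, lr=0.01, beta1=0.9, K=10):
    """One FCM step on a scalar parameter, under the assumed update form.

    `history` holds past velocities, most recent first (v_{t-1}, v_{t-2}, ...).
    """
    psi = gl_coefficients(alpha, K)
    v_prev = history[0] if history else 0.0
    # Memory term: sum_{j>=1} psi(alpha, j) * v_{t-j}, over the velocities stored so far.
    memory = sum(p * v for p, v in zip(psi[1:], history))
    # Assumed update: sum_{j=0}^{K} psi_j * v_{t-j} = (beta1 - 1) * v_{t-1} - lr * grad
    v = (beta1 - 1.0) * v_prev - lr * grad - memory
    history.appendleft(v)  # with deque(maxlen=K) the oldest velocity is dropped
    return theta + v, v


# Illustrative run on f(theta) = theta^2 (hypothetical toy problem).
history = deque(maxlen=10)
theta = 5.0
for _ in range(200):
    grad = 2.0 * theta
    theta, _ = fcm_step(theta, grad, history, alpha=0.9)
```

Using the recurrence avoids evaluating \(\Gamma\) at negative arguments, and keeping the velocities in a `deque` with `maxlen=K` implements the short-memory truncation directly. With `alpha=1.0` the coefficients beyond \(j = 1\) are zero, so the step coincides with classical momentum.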
Reference: Tao Kan, Zhe Gao, Chuang Yang, Jing Jian, "Convolutional neural networks based on fractional-order momentum for parameter training", Neurocomputing 449 (2021) 85–99. https://doi.org/10.1016/j.neucom.2021.03.075