Fractional-order SGD (FSGD)
Implements Fractional-order SGD (FSGD), gradient descent driven by a Riemann-Liouville fractional-order gradient of each layer.
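The paper derives a closed-form fractional-order derivative of an affine layer \(y = xw + b\) and, through a fractional-order autograd, uses it to produce a fractional-order weight gradient. Replacing the ordinary gradient in gradient descent (or in a variant such as Adam) with this fractional-order gradient yields the corresponding fractional-order optimizers, FSGD and FAdam. For a layer, the per-weight fractional gradient and the resulting update are

\[
D_w^{\alpha} y = \frac{x\, w^{1-\alpha}}{\Gamma(2-\alpha)} + \frac{b\, w^{-\alpha}}{\Gamma(1-\alpha)}, \qquad
g_t^{(\alpha)} = D_w^{\alpha} y \bullet \mathbf{G}_t, \qquad
\theta_{t+1} = \theta_t - \eta\, g_t^{(\alpha)},
\]

where:

- \(\alpha \in (0,1]\) is the fractional order and \(\Gamma\) is the gamma function;
- \(x\) and \(b\) are the layer input and bias, and \(w\) is a weight;
- \(D_w^{\alpha} y\) is the fractional derivative of the layer output with respect to \(w\);
- \(\mathbf{G}_t\) is the back-propagated upstream matrix;
- \(\bullet\) is the elementwise-then-contracted product of the fractional-derivative matrix with \(\mathbf{G}_t\);
- \(g_t^{(\alpha)}\) is the resulting fractional-order weight gradient;
- \(\eta\) is the learning rate and \(\theta\) denotes the parameters.

The closed form shown here follows from the Riemann-Liouville power rule with lower terminal 0 and \(w > 0\); see the paper for the exact expression and its treatment of non-positive weights. At \(\alpha = 1\) the bias term vanishes because \(1/\Gamma(0) = 0\), so \(D_w^{1} y = x\) and the rule reduces to ordinary SGD.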
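As a minimal sketch of the idea for a single scalar layer \(y = xw + b\), the following assumes the Riemann-Liouville power rule with lower terminal 0, which gives \(x\,w^{1-\alpha}/\Gamma(2-\alpha) + b\,w^{-\alpha}/\Gamma(1-\alpha)\) for \(w > 0\). The function names `rl_affine_derivative` and `fsgd_step`, the `eps` clamp, and the use of \(|w|\) for non-positive weights are illustrative assumptions, not the paper's or the library's implementation.

```python
import math


def rl_affine_derivative(x, w, b, alpha, eps=1e-8):
    """Riemann-Liouville derivative of y = x*w + b with respect to w.

    Assumes lower terminal 0. Using |w| (clamped at eps) for non-positive
    weights is an assumption made only to keep this sketch well defined.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if alpha == 1.0:
        # 1/Gamma(0) = 0, so the bias term vanishes and dy/dw = x.
        return x
    mag = max(abs(w), eps)
    # Power rule applied to the linear term x*w.
    linear_term = x * mag ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    # Power rule applied to the constant term b (nonzero for alpha < 1).
    bias_term = b * mag ** (-alpha) / math.gamma(1.0 - alpha)
    return linear_term + bias_term


def fsgd_step(w, x, b, upstream, alpha, lr):
    """One descent step using the fractional-order gradient.

    `upstream` plays the role of the back-propagated dL/dy.
    """
    g_alpha = rl_affine_derivative(x, w, b, alpha) * upstream
    return w - lr * g_alpha


# Squared-error loss L = 0.5 * (y - target)**2, so dL/dy = y - target.
w, x, b, target = 4.0, 2.0, 1.0, 5.0
upstream = (x * w + b) - target
w_frac = fsgd_step(w, x, b, upstream, alpha=0.5, lr=0.01)
w_sgd = fsgd_step(w, x, b, upstream, alpha=1.0, lr=0.01)
```

With `alpha=1.0` the step coincides with plain SGD on the same loss. With `alpha=0.5` the bias contributes to the weight gradient and the step size depends on the magnitude of `w`.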
Reference: Xiaojun Zhou, Chunna Zhao, Yaqun Huang, Chengli Zhou, Junjie Ye, Kemeng Xiang, "Fractional-order Jacobian Matrix Differentiation and Its Application in Artificial Neural Networks", arXiv 2025. https://arxiv.org/abs/2506.07408