FO-Elman

Implements FO-Elman, a Grünwald–Letnikov fractional-order gradient descent rule for training Elman recurrent neural networks.

Conventional integer-order gradient descent for Elman networks is prone to getting trapped in local minima and to slow convergence. This method replaces the integer-order derivative in backpropagation with a fractional derivative of order \(\alpha\), defined through the Grünwald–Letnikov scheme, which turns the update direction into a weighted sum of the current gradient and past gradients. The fractional weights give each step a memory of earlier gradients, which is intended to smooth the descent direction and improve convergence in the presence of the recurrent context units of the Elman architecture.

\[ \begin{aligned} \nabla^{\alpha} L(\theta_{t}) &= \sum_{j=0}^{N} (-1)^{j}\, \frac{\Gamma(\alpha+1)}{\Gamma(j+1)\,\Gamma(\alpha-j+1)}\, g_{t-j} \\ \theta_{t+1} &= \theta_{t} - \eta\, \nabla^{\alpha} L(\theta_{t}) \end{aligned} \]

where \(\theta_t\) are the network weights (input, recurrent, and output layers), \(g_{t-j} = \nabla L(\theta_{t-j})\) are the integer-order gradients at past iterations, \(\nabla^{\alpha} L\) is the order-\(\alpha\) Grünwald–Letnikov fractional gradient truncated to \(N\) historical terms, \(\Gamma(\cdot)\) is the gamma function, \(\alpha \in (0,1)\) is the fractional order, and \(\eta\) is the learning rate.
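
The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the reference implementation from the paper: the function names `gl_coefficients` and `fo_step` are hypothetical, the weights are flattened into a single vector, and the first iterations (when fewer than \(N+1\) gradients exist) simply use whatever history is available. The coefficients are computed with the recurrence \(c_0 = 1,\ c_j = c_{j-1}\,(1 - (\alpha+1)/j)\), which is algebraically equal to the gamma-function form and avoids evaluating \(\Gamma\) at large arguments.

```python
from collections import deque

import numpy as np


def gl_coefficients(alpha: float, n_terms: int) -> np.ndarray:
    """Return c_j = (-1)^j * Gamma(alpha+1) / (Gamma(j+1) * Gamma(alpha-j+1)), j = 0..n_terms."""
    c = np.empty(n_terms + 1)
    c[0] = 1.0
    for j in range(1, n_terms + 1):
        # Recurrence equivalent to the gamma-function expression.
        c[j] = c[j - 1] * (1.0 - (alpha + 1.0) / j)
    return c


def fo_step(theta, grad_history, alpha, lr, n_terms):
    """One fractional-order update.

    grad_history[0] is the current gradient g_t and grad_history[j] is g_{t-j}.
    Assumption: if fewer than n_terms + 1 gradients are stored, the sum is
    truncated to the gradients that exist.
    """
    c = gl_coefficients(alpha, n_terms)
    frac_grad = np.zeros_like(theta)
    for c_j, g in zip(c, grad_history):
        frac_grad += c_j * g
    return theta - lr * frac_grad


if __name__ == "__main__":
    # Stand-in loss L(theta) = 0.5 * ||theta||^2 (gradient is theta itself);
    # a real use would supply backpropagated Elman-network gradients instead.
    alpha, lr, n_terms = 0.5, 0.1, 3
    theta = np.array([1.0, -2.0])
    history = deque(maxlen=n_terms + 1)  # keeps only the N+1 most recent gradients
    for _ in range(5):
        history.appendleft(theta.copy())  # newest gradient goes to index 0
        theta = fo_step(theta, history, alpha, lr, n_terms)
    print(theta)
```

For \(\alpha = 0.5\) the first four coefficients are \(1, -0.5, -0.125, -0.0625\): the current gradient enters with weight one and earlier gradients enter with negative weights whose magnitude shrinks with age. A bounded `deque` is used here only as a convenient way to hold the \(N+1\) most recent gradients.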

Reference: He Li, Shanze Wang, Yangquan Chen, Yanmei Liu, "Fractional-order gradient descent learning for Elman neural networks", Neural Networks 2026. https://doi.org/10.1016/j.neunet.2026.108880
