Caputo Fractional-order Gradient Descent for Ridge Polynomial Neural Networks

Implements Caputo Fractional-order Gradient Descent for Ridge Polynomial Neural Networks, training the network with a Caputo fractional gradient instead of the integer-order one.

A Ridge Polynomial Neural Network is trained by replacing the ordinary gradient in the weight update with a Caputo fractional-order gradient of order \(\alpha \in (0,1)\). Taking the truncated Caputo derivative of the loss with the previous iterate as the lower terminal, the fractional gradient of the within-layer weights reduces to the integer-order gradient scaled by a power of the most recent step, \(\|\theta_t - \theta_{t-1}\|^{1-\alpha}/\Gamma(2-\alpha)\). The exponent \(1-\alpha\) injects a tunable memory of the optimization trajectory, which the authors report improves accuracy and generalization over integer-order gradient descent on Ridge Polynomial networks.
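
The reduction described above can be checked in the scalar case. The following is a sketch under one common simplification, namely treating \(f'\) as constant at its value at \(x\) over the short interval \([c, x]\); it is not necessarily the exact derivation used in the paper. The symbols \(f\), \(x\), and \(c\) are introduced here for illustration only, with \(c\) playing the role of \(\theta_{t-1}\) and \(x\) the role of \(\theta_t\), and \(x > c\) assumed.

\[ \begin{aligned} {}^{C}_{c}D^{\alpha}_{x} f(x) &= \frac{1}{\Gamma(1-\alpha)} \int_{c}^{x} \frac{f'(\tau)}{(x-\tau)^{\alpha}} \, d\tau \\ &\approx \frac{f'(x)}{\Gamma(1-\alpha)} \int_{c}^{x} (x-\tau)^{-\alpha} \, d\tau \\ &= \frac{f'(x)}{\Gamma(1-\alpha)} \cdot \frac{(x-c)^{1-\alpha}}{1-\alpha} \\ &= \frac{f'(x) \, (x-c)^{1-\alpha}}{\Gamma(2-\alpha)}, \end{aligned} \]

where the last line uses \((1-\alpha)\,\Gamma(1-\alpha) = \Gamma(2-\alpha)\). Replacing \(x - c\) with its magnitude, and then with the vector analogue \(\|\theta_t - \theta_{t-1}\|\), gives the scaling factor quoted above.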

\[ \begin{aligned} g_t &= \nabla_\theta L(\theta_t), \\ \theta_{t+1} &= \theta_t - \frac{\eta}{\Gamma(2-\alpha)} \, \big(\|\theta_t - \theta_{t-1}\| + \delta\big)^{1-\alpha} \, g_t. \end{aligned} \]

where \(\theta\) are the network weights, \(\eta\) is the learning rate, \(g_t\) is the loss gradient, \(\alpha \in (0,1)\) is the fractional order, \(\Gamma\) is the gamma function, \(\delta\) is a small positive constant guarding the memory term, and \(\|\theta_t - \theta_{t-1}\|\) is the size of the previous step (the Caputo memory term); \(\alpha \to 1\) recovers ordinary gradient descent.
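
The update rule above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation: the names `caputo_step`, `quadratic_grad`, and `run_quadratic_demo` are hypothetical, the quadratic loss \(L(\theta) = \tfrac{1}{2}\|\theta\|^2\) is a stand-in for a Ridge Polynomial network's loss, and the first iterate is bootstrapped with one ordinary gradient step (an assumption, since the rule needs two iterates to form the memory term). The sketch also accepts \(\alpha = 1\) as the limiting case.

```python
import math


def caputo_step(theta, theta_prev, grad, lr=0.1, alpha=0.9, delta=1e-8):
    """Apply one update of the rule above to a flat list of weights.

    Hypothetical helper: theta, theta_prev and grad are equal-length lists
    of floats. alpha = 1 is allowed as the ordinary-gradient-descent limit.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Size of the previous step, ||theta_t - theta_{t-1}||.
    step_norm = math.sqrt(sum((a - b) ** 2 for a, b in zip(theta, theta_prev)))
    # Memory term (||theta_t - theta_{t-1}|| + delta)^(1 - alpha).
    memory = (step_norm + delta) ** (1.0 - alpha)
    # Learning rate divided by Gamma(2 - alpha).
    scale = lr / math.gamma(2.0 - alpha)
    return [w - scale * memory * g for w, g in zip(theta, grad)]


def quadratic_grad(theta):
    """Gradient of the stand-in loss 0.5 * ||theta||^2, which is theta itself."""
    return list(theta)


def run_quadratic_demo(alpha=0.9, lr=0.1, steps=50):
    """Run the update on the stand-in quadratic loss and return the final weights."""
    theta_prev = [1.0, -2.0]
    # Assumption: bootstrap the second iterate with one ordinary gradient step.
    theta = [w - lr * g for w, g in zip(theta_prev, quadratic_grad(theta_prev))]
    for _ in range(steps):
        theta_next = caputo_step(
            theta, theta_prev, quadratic_grad(theta), lr=lr, alpha=alpha
        )
        theta_prev, theta = theta, theta_next
    return theta


if __name__ == "__main__":
    print(run_quadratic_demo(alpha=0.9))
```

Because the memory term shrinks as consecutive iterates get closer together, the effective step size in this sketch decays near a stationary point; the constant `delta` keeps the factor strictly positive when two iterates coincide.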

Reference: Zeyong Wu, Yan Lv, Yan Liu, "A Novel Method for Ridge Polynomial Neural Network-based Caputo Fractional-order Gradient Descent Algorithm", 2025 7th International Conference on Electronics and Communication, Network and Computer Technology (ECNCT), 2025. https://doi.org/10.1109/ECNCT66493.2025.11172593
