Caputo Fractional Gradient Descent

Implements Caputo Fractional Gradient Descent, a fractional-order optimizer that replaces the integer-order gradient with a Caputo fractional derivative of the loss.

Standard gradient descent updates each weight using the first-order (integer-order) derivative of the loss, which is a purely local quantity. This method instead uses the Caputo fractional derivative of order \(\alpha\), taken relative to a fixed base point \(c\), so each update carries a non-local, memory-dependent term that integrates the loss derivative over the interval between \(c\) and the current weight. Applied to product-unit neural networks, whose neurons multiply inputs raised to learnable powers, the fractional gradient is reported to give more stable convergence and greater robustness to noise in function approximation. The order \(\alpha \in (0,1]\) interpolates between fractional behavior and ordinary gradient descent, which is recovered at \(\alpha = 1\).

\[ \begin{aligned} {}^{C}_{c}D^{\alpha}_{\theta}\, L(\theta) &= \frac{1}{\Gamma(1-\alpha)} \int_{c}^{\theta} \frac{L'(\tau)}{(\theta - \tau)^{\alpha}}\, d\tau, \qquad 0 < \alpha < 1, \\ \theta_{t+1} &= \theta_t - \eta \, {}^{C}_{c}D^{\alpha}_{\theta}\, L(\theta_t). \end{aligned} \]

where \(\theta\) denotes the weights, \(\eta\) the learning rate, \(L\) the loss, \({}^{C}_{c}D^{\alpha}_{\theta}\) the Caputo fractional derivative of order \(\alpha\) with respect to \(\theta\) about base point \(c\), \(L'\) the ordinary first derivative of \(L\), and \(\Gamma\) the gamma function. The integral form holds for \(0 < \alpha < 1\); in the limit \(\alpha \to 1\) the operator reduces to the ordinary derivative \(L'(\theta)\), and the rule becomes standard gradient descent.
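
As a rough illustration of the update rule, the following Python sketch approximates the Caputo derivative for a single scalar weight by numerical quadrature and applies one descent step. The function names `caputo_derivative` and `caputo_gd_step`, the use of Gauss-Legendre quadrature, the restriction to \(\theta > c\), and the quadratic example loss are all assumptions made for illustration; none of them is taken from the referenced paper. The sketch relies on the substitution \(u = (\theta - \tau)^{1-\alpha}\), which removes the endpoint singularity of the kernel and gives \({}^{C}_{c}D^{\alpha}_{\theta} L(\theta) = \frac{1}{\Gamma(2-\alpha)} \int_{0}^{(\theta - c)^{1-\alpha}} L'\big(\theta - u^{1/(1-\alpha)}\big)\, du\).

```python
import math

import numpy as np


def caputo_derivative(dloss, theta, c, alpha, n_nodes=64):
    """Approximate the Caputo derivative of order alpha at theta, base point c.

    dloss is the ordinary derivative L' and must accept a NumPy array.
    Illustrative assumptions: scalar weight, 0 < alpha < 1, and theta > c.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    if theta <= c:
        raise ValueError("this sketch only handles theta > c")

    # Upper limit after substituting u = (theta - tau)^(1 - alpha).
    upper = (theta - c) ** (1.0 - alpha)

    # Gauss-Legendre nodes and weights on [-1, 1], mapped to [0, upper].
    x, w = np.polynomial.legendre.leggauss(n_nodes)
    u = 0.5 * upper * (x + 1.0)

    # Map each node back to the original integration variable tau.
    tau = theta - u ** (1.0 / (1.0 - alpha))
    integral = 0.5 * upper * np.sum(w * dloss(tau))

    # 1 / (Gamma(1 - alpha) * (1 - alpha)) equals 1 / Gamma(2 - alpha).
    return float(integral / math.gamma(2.0 - alpha))


def caputo_gd_step(dloss, theta, c, alpha, lr):
    """One update: theta - lr * (Caputo derivative of the loss at theta)."""
    return theta - lr * caputo_derivative(dloss, theta, c, alpha)


# Illustrative loss L(theta) = (theta - 2)^2, so L'(theta) = 2 * (theta - 2).
dloss = lambda t: 2.0 * (t - 2.0)
grad = caputo_derivative(dloss, theta=1.0, c=0.0, alpha=0.5)
theta_next = caputo_gd_step(dloss, theta=1.0, c=0.0, alpha=0.5, lr=0.1)
```

For this quadratic loss with \(c = 0\), the quadrature result can be checked against the closed form \(\frac{2\,\theta^{2-\alpha}}{\Gamma(3-\alpha)} - \frac{4\,\theta^{1-\alpha}}{\Gamma(2-\alpha)}\), which follows from applying the definition to \(L'(\tau) = 2\tau - 4\). A full optimizer would also need a rule for weights at or below the base point, which this sketch deliberately leaves out.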

Reference: "Fractional Order Gradient Descent with Caputo Derivatives for Product-Unit Neural Networks", ICAACE 2025. https://doi.org/10.1109/ICAACE65325.2025.11020545
