Conformable Fractional Gradient Descent

Implements Conformable Fractional Gradient Descent, gradient descent built on the conformable fractional derivative of the loss.

The conformable fractional derivative of order \(\alpha\) is local: for a differentiable \(f\) and \(t > 0\) it rescales the ordinary derivative by \(t^{1-\alpha}\), i.e. \(D_\alpha f(t) = t^{1-\alpha} f'(t)\). Applied to backpropagation, this replaces each classical gradient with its conformable fractional counterpart, so the partial derivative of the loss with respect to a weight is multiplied by that weight raised to the power \(1-\alpha\). Because the rule is local, it avoids the history-dependent cost of Caputo or Riemann-Liouville fractional gradients, while the weight-dependent factor reshapes the effective step size per parameter. An optional \(L_2\) term contributes a \(\theta^{2-\alpha}\) decay under the same conformable rescaling.
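
As a short worked derivation, the rescaling rule and the \(\theta^{2-\alpha}\) term can be recovered from the usual limit definition of the conformable derivative. Both the limit definition and the penalty form \(\tfrac{\lambda}{2}\theta^2\) are assumptions here, since this page states neither explicitly. Substituting \(h = \varepsilon\, t^{1-\alpha}\) for \(t > 0\):

\[ \begin{aligned} D_\alpha f(t) &= \lim_{\varepsilon \to 0} \frac{f\!\left(t + \varepsilon\, t^{1-\alpha}\right) - f(t)}{\varepsilon} = t^{1-\alpha} \lim_{h \to 0} \frac{f(t + h) - f(t)}{h} = t^{1-\alpha} f'(t) \\ D_\alpha\!\left[ \tfrac{\lambda}{2}\,\theta^2 \right] &= \theta^{1-\alpha} \cdot \lambda\,\theta = \lambda\, \theta^{\,2-\alpha} \end{aligned} \]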

\[ \begin{aligned} g_t &= \frac{\partial E}{\partial \theta_t} \\ \theta_{t+1} &= \theta_t - \eta \, \theta_t^{\,1-\alpha} \, g_t \\ \theta_{t+1} &= \theta_t - \eta \left( \theta_t^{\,1-\alpha} \, g_t + \lambda \, \theta_t^{\,2-\alpha} \right) \quad (\text{with } L_2) \end{aligned} \]

where \(\theta\) is a weight, \(\eta\) the learning rate, \(g_t\) the ordinary gradient of the loss \(E\), \(\alpha \in (0,1]\) the conformable fractional order, and \(\lambda\) the \(L_2\) regularization coefficient. Setting \(\alpha = 1\) recovers standard gradient descent; \(\alpha = 0.5\) gives the paper's fractional Newton-type step \(\theta_{t+1} = \theta_t - \eta \sqrt{\theta_t}\, g_t\).
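
The unregularized update above can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the names `conformable_step` and `descend` are hypothetical, and using \(|\theta|^{1-\alpha}\) for negative weights is an assumption made here so that the factor stays real (the formulas above are written for \(\theta > 0\)).

```python
from typing import Callable, List


def conformable_step(theta: List[float], grad: List[float],
                     lr: float = 0.1, alpha: float = 0.5) -> List[float]:
    """One step of theta <- theta - lr * theta^(1 - alpha) * grad, per weight."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if len(theta) != len(grad):
        raise ValueError("theta and grad must have the same length")
    updated = []
    for w, g in zip(theta, grad):
        # Assumption: use |w| so the power is real for negative weights.
        # With alpha = 1 the factor is 1 and this is plain gradient descent.
        scale = abs(w) ** (1.0 - alpha)
        updated.append(w - lr * scale * g)
    return updated


def descend(theta: List[float],
            grad_fn: Callable[[List[float]], List[float]],
            steps: int, lr: float = 0.1, alpha: float = 0.5) -> List[float]:
    """Apply conformable_step repeatedly, recomputing the gradient each time."""
    for _ in range(steps):
        theta = conformable_step(theta, grad_fn(theta), lr=lr, alpha=alpha)
    return theta


def quadratic_grad(theta: List[float]) -> List[float]:
    """Gradient of the toy loss E(theta) = sum((theta_i - 3)^2)."""
    return [2.0 * (w - 3.0) for w in theta]


if __name__ == "__main__":
    # Toy run: minimize (theta - 3)^2 starting from theta = 1.
    print(descend([1.0], quadratic_grad, steps=200, lr=0.1, alpha=0.5))
```

Note that under this sketch a weight that is exactly zero receives a zero factor for \(\alpha < 1\) and therefore does not move, so nonzero initialization matters.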

Reference: Mohammad Rushdi Saleh, Basem Ajarmah, "Fractional Gradient Descent Learning of Backpropagation Artificial Neural Networks with Conformable Fractional Calculus", Frontiers in Artificial Intelligence and Applications, Vol. 358 (FSDM 2022). https://doi.org/10.3233/FAIA220372
