Caputo fractional-order gradient descent

Implements Caputo fractional-order gradient descent, which trains backpropagation (BP) neural networks by replacing the integer-order error gradient with its Caputo fractional derivative.

The quadratic error is the usual \(E = \tfrac{1}{2}\sum_j \lVert A_j - O_j\rVert^2\). Instead of following the ordinary first derivative, each weight is updated along the Caputo fractional derivative of \(E\) of order \(\alpha \in (0,1)\). A fixed lower terminal would make the method converge to a point where the true gradient is nonzero, so the terminal is taken at the previous iterate \(\theta_{t-1}\). The Caputo derivative of the linear power-law term \((\theta - \theta_{t-1})\) then contributes a factor \(\lvert \theta_t - \theta_{t-1}\rvert^{1-\alpha}/\Gamma(2-\alpha)\), which multiplies the ordinary gradient computed by backpropagation. As \(\alpha \to 1\) the factor tends to \(1\) and the rule reduces to standard gradient descent.
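
The factor can be checked directly from the definition of the Caputo derivative for \(0 < \alpha < 1\). The sketch below assumes a scalar weight with \(\theta > c\), writes \(c = \theta_{t-1}\) for the terminal, and uses \(\tau\) as a dummy integration variable; this notation is introduced here for illustration and is not taken from the reference.

\[ \begin{aligned} {}^{C}_{c}D^{\alpha}_{\theta} f(\theta) &= \frac{1}{\Gamma(1-\alpha)} \int_{c}^{\theta} \frac{f'(\tau)}{(\theta-\tau)^{\alpha}}\, d\tau \\ {}^{C}_{c}D^{\alpha}_{\theta}\, (\theta - c) &= \frac{1}{\Gamma(1-\alpha)} \int_{c}^{\theta} (\theta-\tau)^{-\alpha}\, d\tau = \frac{(\theta-c)^{1-\alpha}}{(1-\alpha)\,\Gamma(1-\alpha)} = \frac{(\theta-c)^{1-\alpha}}{\Gamma(2-\alpha)} \end{aligned} \]

The last equality uses \(\Gamma(2-\alpha) = (1-\alpha)\,\Gamma(1-\alpha)\). The absolute value in the update rule extends the expression to the case \(\theta_t < \theta_{t-1}\).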

\[ \begin{aligned} g_t &= \nabla_\theta E(\theta_t) \\ {}^{C}\!D^{\alpha}_{\theta} E &= \frac{\lvert \theta_t - \theta_{t-1}\rvert^{1-\alpha}}{\Gamma(2-\alpha)}\, g_t \\ \theta_{t+1} &= \theta_t - \eta\, {}^{C}\!D^{\alpha}_{\theta} E \end{aligned} \]

where \(\theta\) are the weights, \(\eta\) the learning rate, \(g_t\) the ordinary gradient of the quadratic error \(E\) from backpropagation, \(\alpha \in (0,1)\) the fractional order, \(\theta_{t-1}\) the previous iterate serving as the Caputo terminal, and \(\Gamma\) the gamma function.
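
The update rule above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: the function names `caputo_step` and `minimize_quadratic`, the small constant `eps`, and the choice to bootstrap the first iterate with one ordinary gradient step are all assumptions made for this example. The factor is applied elementwise, one value per weight.

```python
import math
import numpy as np


def caputo_step(theta, theta_prev, grad, lr=0.1, alpha=0.9, eps=1e-8):
    """One Caputo fractional-order update, applied elementwise to the weights."""
    # Factor |theta_t - theta_{t-1}|^(1 - alpha) / Gamma(2 - alpha).
    # eps is an assumption of this sketch: it keeps the factor from
    # vanishing when a weight did not move on the previous step.
    step_size = np.abs(theta - theta_prev) + eps
    factor = step_size ** (1.0 - alpha) / math.gamma(2.0 - alpha)
    return theta - lr * factor * grad


def minimize_quadratic(target, theta0, steps=500, lr=0.1, alpha=0.9):
    """Toy driver: minimize E = 0.5 * ||theta - target||^2."""
    theta_prev = theta0.copy()
    # Assumption: one ordinary gradient step so that a previous iterate exists.
    theta = theta0 - lr * (theta0 - target)
    for _ in range(steps):
        grad = theta - target  # ordinary gradient of the toy quadratic error
        theta, theta_prev = caputo_step(theta, theta_prev, grad, lr, alpha), theta
    return theta


if __name__ == "__main__":
    target = np.array([1.0, -2.0])
    print(minimize_quadratic(target, np.zeros(2)))
```

In a real network `grad` would come from backpropagation rather than from the closed-form toy gradient used here. Because the factor shrinks as successive iterates get closer, progress slows near the minimum; the `eps` floor is one simple way to keep the step from collapsing to zero.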

Reference: Jian Wang, Yanqing Wen, Yida Gou, Zhenyun Ye, Hua Chen, "Fractional-order gradient descent learning of BP neural networks with Caputo derivative", Neural Networks 2017. https://doi.org/10.1016/j.neunet.2017.02.007
