Caputo fractional-order gradient descent
Implements Caputo fractional-order gradient descent, training BP neural networks by replacing the integer-order error gradient with its Caputo fractional derivative.
The quadratic error is the usual \(E = \tfrac{1}{2}\sum_j \lVert A_j - O_j\rVert^2\). Instead of the ordinary first derivative, each weight is updated along the Caputo fractional derivative of \(E\) of order \(\alpha \in (0,1)\). A fixed lower terminal would make the method converge to a point where the true gradient is nonzero, so the terminal is taken at the previous iterate \(\theta_{t-1}\). The Caputo derivative of the linear term \((\theta - \theta_{t-1})\), evaluated at \(\theta = \theta_t\), then contributes a factor \(\lvert \theta_t - \theta_{t-1}\rvert^{1-\alpha}/\Gamma(2-\alpha)\), which multiplies the ordinary gradient computed by backpropagation. As \(\alpha \to 1\) the factor tends to \(1\) and the rule reduces to standard gradient descent.
\[\theta_{t+1} = \theta_t - \eta \, g_t \, \frac{\lvert \theta_t - \theta_{t-1}\rvert^{1-\alpha}}{\Gamma(2-\alpha)}\]
where \(\theta\) are the weights, \(\eta\) the learning rate, \(g_t\) the ordinary gradient of the quadratic error \(E\) from backpropagation, \(\alpha \in (0,1)\) the fractional order, \(\theta_{t-1}\) the previous iterate serving as the Caputo terminal, and \(\Gamma\) the gamma function.
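The update described above can be sketched in plain Python. This is a minimal illustration rather than this module's actual implementation: the function names `caputo_factor` and `caputo_step` are hypothetical, weights are held in flat lists, and the factor is applied elementwise per weight. Accepting \(\alpha = 1\) as the limiting case is also an assumption made here so that the reduction to standard gradient descent can be checked directly.

```python
import math


def caputo_factor(theta, theta_prev, alpha):
    """Per-weight factor |theta - theta_prev|^(1 - alpha) / Gamma(2 - alpha).

    Hypothetical helper: alpha = 1 is accepted as the limiting case, where
    the factor is exactly 1 for every weight.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    scale = 1.0 / math.gamma(2.0 - alpha)
    return [scale * abs(t - p) ** (1.0 - alpha) for t, p in zip(theta, theta_prev)]


def caputo_step(theta, theta_prev, grad, lr, alpha):
    """One update: theta_next = theta - lr * grad * factor, elementwise.

    `grad` is the ordinary gradient at `theta` (for example, from
    backpropagation); `theta_prev` is the previous iterate, used as the
    Caputo terminal.
    """
    factor = caputo_factor(theta, theta_prev, alpha)
    return [t - lr * g * f for t, g, f in zip(theta, grad, factor)]


# Usage on the toy error E(theta) = 0.5 * (theta - 3)^2, whose gradient is theta - 3.
# The two starting iterates must differ: if theta == theta_prev and alpha < 1,
# the factor is zero and the weight never moves.
theta_prev, theta = [0.0], [0.1]
for _ in range(200):
    grad = [theta[0] - 3.0]
    theta, theta_prev = caputo_step(theta, theta_prev, grad, lr=0.1, alpha=0.9), theta
```

Because the factor shrinks with the distance between consecutive iterates, the effective step size decays as training settles. In this sketch a weight whose last step was exactly zero stays frozen for \(\alpha < 1\), which a practical implementation would likely need to guard against.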
Reference: Jian Wang, Yanqing Wen, Yida Gou, Zhenyun Ye, Hua Chen, "Fractional-order gradient descent learning of BP neural networks with Caputo derivative", Neural Networks 2017. https://doi.org/10.1016/j.neunet.2017.02.007