Fractional Order Gradient Method
Implements the Fractional Order Gradient Method, which substitutes a Caputo fractional derivative for the gradient in CNN training.
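The method replaces the integer-order gradient in gradient descent with an approximation of the Caputo fractional derivative of order \(\alpha \in (0, 2)\). The approximation takes the previous iterate \(\theta_{t-1}\) as the lower terminal of the derivative. It evaluates the ordinary first-order gradient at \(\theta_{t-1}\) and scales it by the step displacement \(|\theta_t - \theta_{t-1}|\) raised to the power \(1-\alpha\), so each update depends on the two most recent iterates rather than on the current one alone. A small constant \(\delta\) is added to the displacement because consecutive iterates can coincide. For \(\alpha > 1\) the exponent \(1-\alpha\) is negative, and a zero displacement would then divide by zero. For \(\alpha < 1\), a zero displacement would instead shrink the step to zero.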
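\[
\theta_{t+1} = \theta_t - \frac{\eta}{\Gamma(2-\alpha)}\, \nabla f(\theta_{t-1}) \left(|\theta_t - \theta_{t-1}| + \delta\right)^{1-\alpha}
\]
where \(\theta\) are the parameters, \(\eta\) the learning rate, \(\alpha \in (0,2)\) the fractional order, \(\Gamma\) the Gamma function, and \(\nabla f(\theta_{t-1})\) the ordinary first-order gradient evaluated at the previous iterate. The absolute value \(|\cdot|\), the power and the product are applied elementwise. \(\delta > 0\) is a small constant that prevents division by zero when \(\theta_t = \theta_{t-1}\) and \(\alpha > 1\). Setting \(\alpha = 1\) gives \(\Gamma(1) = 1\) and a zero exponent, so the scaling factor is exactly 1. The update then reduces to the gradient-descent step \(\theta_{t+1} = \theta_t - \eta \nabla f(\theta_{t-1})\), which is standard gradient descent except that the gradient is taken at the previous iterate.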
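The update described above can be sketched in plain Python. This sketch assumes flat lists of floats in place of framework tensors. `fractional_step` and `minimize_quadratic` are hypothetical names, not this library's API. The toy quadratic objective, the default `delta=1e-8` and the plain gradient step used to bootstrap the first iteration are illustrative assumptions.

```python
import math
from typing import List, Sequence


def fractional_step(
    theta: Sequence[float],
    theta_prev: Sequence[float],
    grad_prev: Sequence[float],
    lr: float,
    alpha: float,
    delta: float = 1e-8,
) -> List[float]:
    """Hypothetical helper: one fractional-order update, applied elementwise.

    theta      -- current iterate theta_t
    theta_prev -- previous iterate theta_{t-1}
    grad_prev  -- ordinary first-order gradient evaluated at theta_{t-1}
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    coeff = lr / math.gamma(2.0 - alpha)  # eta / Gamma(2 - alpha)
    new_theta = []
    for x, x_prev, g in zip(theta, theta_prev, grad_prev):
        # (|theta_t - theta_{t-1}| + delta) ** (1 - alpha), per coordinate
        scale = (abs(x - x_prev) + delta) ** (1.0 - alpha)
        new_theta.append(x - coeff * g * scale)
    return new_theta


def minimize_quadratic(
    theta0: Sequence[float], lr: float, alpha: float, steps: int, delta: float = 1e-8
) -> List[float]:
    """Hypothetical demo on f(theta) = 0.5 * sum(theta_i ** 2), whose gradient is theta."""
    # Assumption: no theta_{-1} exists, so bootstrap with one plain gradient step.
    theta_prev = list(theta0)
    theta = [x - lr * x for x in theta0]
    for _ in range(steps):
        grad_prev = list(theta_prev)  # gradient of f at theta_{t-1}
        theta, theta_prev = (
            fractional_step(theta, theta_prev, grad_prev, lr, alpha, delta),
            theta,
        )
    return theta
```

In a real training loop, the gradient at \(\theta_{t-1}\) would likely be the one computed on the previous step and cached, rather than re-evaluated as this toy quadratic does. Passing `delta=0.0` with `alpha > 1` for a coordinate that has not moved makes Python raise `ZeroDivisionError`, because `0.0` is raised to a negative power. This is the singularity that \(\delta\) guards against.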
Reference: Dian Sheng, Yiheng Wei, Yuquan Chen, Yong Wang, "Convolutional neural networks with fractional order gradient method", Neurocomputing 2020. https://arxiv.org/abs/1905.05336