Fractional Order Gradient Method

Implements the Fractional Order Gradient Method, which replaces the ordinary gradient with a Caputo fractional derivative when training convolutional neural networks (CNNs).

The method replaces the integer-order gradient in gradient descent with a Caputo fractional derivative of order \(\alpha \in (0, 2)\). The fractional derivative evaluates the first-order gradient at the previous iterate \(\theta_{t-1}\) and scales it by a power of the step displacement \(|\theta_t - \theta_{t-1}|\). As a result, each update depends on the two most recent iterates rather than on \(\theta_t\) alone. A small constant \(\delta\) is added to the displacement so the update stays well defined when consecutive iterates coincide. Without it, a zero displacement would make the scaling factor infinite for \(\alpha > 1\), where the exponent \(1-\alpha\) is negative, and would make the step vanish for \(\alpha < 1\).
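The sketch below computes the per-parameter scaling factor \((|\theta_t - \theta_{t-1}| + \delta)^{1-\alpha} / \Gamma(2-\alpha)\) in plain Python, which makes the role of \(\delta\) concrete. The name `fractional_scale` and the default \(\delta = 10^{-8}\) are illustrative assumptions, not values taken from the paper. With \(\alpha = 1\) the factor is exactly 1. With a zero displacement, \(\delta\) caps the factor for \(\alpha > 1\) and keeps it positive for \(\alpha < 1\). Setting \(\delta = 0\) with \(\alpha > 1\) makes Python raise `ZeroDivisionError`.

```python
import math


def fractional_scale(displacement: float, alpha: float, delta: float = 1e-8) -> float:
    """Per-parameter factor (|theta_t - theta_{t-1}| + delta)**(1 - alpha) / Gamma(2 - alpha).

    The default delta is an illustrative assumption, not a value from the paper.
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    return (abs(displacement) + delta) ** (1.0 - alpha) / math.gamma(2.0 - alpha)


# alpha = 1: Gamma(1) = 1 and the exponent is 0, so the factor is 1 for any displacement.
print(fractional_scale(0.0, 1.0), fractional_scale(0.3, 1.0))

# Zero displacement: delta keeps the factor finite (alpha > 1) and nonzero (alpha < 1).
for alpha in (0.5, 1.5):
    print(f"alpha={alpha}: step 0.1 -> {fractional_scale(0.1, alpha):.4g}, "
          f"step 0 -> {fractional_scale(0.0, alpha):.4g}")

# Without delta, a zero displacement with alpha > 1 is undefined in floating point.
try:
    fractional_scale(0.0, 1.5, delta=0.0)
except ZeroDivisionError:
    print("alpha=1.5, delta=0, zero step: ZeroDivisionError")
```

For \(\alpha > 1\) the factor grows as steps shrink, so the size of \(\delta\) bounds how large the effective learning rate can become near a fixed point.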

\[ \begin{aligned} g_t &= \frac{\nabla f(\theta_{t-1})}{\Gamma(2-\alpha)} \, \big(|\theta_t - \theta_{t-1}| + \delta\big)^{1-\alpha} \\ \theta_{t+1} &= \theta_t - \eta \, g_t \end{aligned} \]

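where:

- \(\theta\) are the parameters.
- \(\eta\) is the learning rate.
- \(\alpha \in (0,2)\) is the fractional order.
- \(\Gamma\) is the Gamma function.
- \(\nabla f(\theta_{t-1})\) is the ordinary first-order gradient evaluated at the previous iterate.
- \(\delta > 0\) is a small constant that keeps the scaling factor finite and nonzero when \(\theta_t = \theta_{t-1}\).

The absolute value and the power are applied elementwise. Setting \(\alpha = 1\) makes the scaling factor exactly 1, because \(\Gamma(1) = 1\) and the exponent is 0. The update then reduces to \(\theta_{t+1} = \theta_t - \eta \, \nabla f(\theta_{t-1})\). This is standard gradient descent, except that the gradient is taken at the previous iterate.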
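The update rule above can be sketched in plain Python as follows. This is a minimal illustrative sketch under stated assumptions, not the paper's CNN implementation:

- `fogm_minimize` and `quad_grad` are hypothetical names.
- Parameters are a flat list of floats updated elementwise.
- The rule needs a previous iterate, so the very first step here is an ordinary gradient step. That choice is an assumption and is not stated in the method description.

The sketch caches the gradient of the current iterate and reuses it as \(\nabla f(\theta_{t-1})\) on the next step, so each iteration costs one gradient evaluation.

```python
import math
from typing import Callable, List

Vector = List[float]


def fogm_minimize(grad: Callable[[Vector], Vector], theta0: Vector, lr: float,
                  alpha: float, delta: float = 1e-8, steps: int = 100) -> List[Vector]:
    """Run the fractional order gradient update and return every iterate.

    Assumption: the first step is a plain gradient step, because the
    rule needs a previous iterate (illustrative choice, not from the paper).
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    if steps < 1:
        raise ValueError("steps must be at least 1")
    gamma = math.gamma(2.0 - alpha)
    prev = list(theta0)
    g_prev = grad(prev)  # gradient at theta_0, reused by the first fractional step
    curr = [p - lr * g for p, g in zip(prev, g_prev)]
    history = [prev, curr]
    for _ in range(steps - 1):
        g_curr = grad(curr)  # becomes grad f(theta_{t-1}) on the next iteration
        nxt = [c - lr * gp * (abs(c - p) + delta) ** (1.0 - alpha) / gamma
               for c, p, gp in zip(curr, prev, g_prev)]
        prev, curr, g_prev = curr, nxt, g_curr
        history.append(curr)
    return history


def quad_grad(theta: Vector) -> Vector:
    # Hypothetical test objective f(theta) = 0.5 * sum(theta_i ** 2); its gradient is theta.
    return list(theta)


theta0 = [1.0, -2.0]
traj = fogm_minimize(quad_grad, theta0, lr=0.1, alpha=0.9, steps=300)
print("alpha=0.9, final iterate:", traj[-1])

# alpha = 1: compare with gradient descent that uses the lagged gradient grad(theta_{t-1}).
fo_alpha1 = fogm_minimize(quad_grad, theta0, lr=0.1, alpha=1.0, steps=20)
lagged = [theta0, [p - 0.1 * g for p, g in zip(theta0, quad_grad(theta0))]]
for _ in range(19):
    lagged.append([c - 0.1 * g for c, g in zip(lagged[-1], quad_grad(lagged[-2]))])

# Ordinary gradient descent, which uses grad(theta_t), for contrast.
plain = [theta0]
for _ in range(20):
    plain.append([c - 0.1 * g for c, g in zip(plain[-1], quad_grad(plain[-1]))])

print("alpha=1 matches lagged GD:", fo_alpha1 == lagged)
print("alpha=1 matches ordinary GD:", fo_alpha1 == plain)
```

The usage lines minimize \(f(\theta) = \tfrac{1}{2}\|\theta\|^2\) with \(\alpha = 0.9\). They then check that with \(\alpha = 1\) the trajectory coincides with \(\theta_{t+1} = \theta_t - \eta \nabla f(\theta_{t-1})\) rather than with ordinary gradient descent.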

Reference: Dian Sheng, Yiheng Wei, Yuquan Chen, Yong Wang, "Convolutional neural networks with fractional order gradient method", Neurocomputing 2020. https://arxiv.org/abs/1905.05336
