Adaptive Terminal Caputo Fractional Gradient Descent (AT-CFGD)

Implements Adaptive Terminal Caputo Fractional Gradient Descent (AT-CFGD), a gradient descent variant that replaces the integer-order derivative with a Caputo fractional derivative whose terminal point is reset at each step.

Classical fractional gradient descent with a fixed terminal point converges to a biased point rather than the true minimizer, because the fractional derivative does not vanish there. AT-CFGD fixes this by tying the terminal \(c_t\) to the current iterate through the gradient, so the fractional operator is consistent with the local descent direction; a \(\beta\)-weighted order-\((1+\alpha)\) term adds a higher-order correction. The univariate operator is applied coordinatewise in the multidimensional case.
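The fixed-terminal bias can be illustrated on a one-dimensional quadratic. The sketch below is not part of the referenced method: it assumes the test function \(f(t) = \tfrac{1}{2}a(t - x^\*)^2\), a plain update \(x_{t+1} = x_t - \eta\,D^{\alpha}_{c} f(x_t)\) with \(c\) held fixed, and the case \(x > c\) only. The helper names `caputo_quadratic` and `fixed_terminal_descent` are hypothetical. For this quadratic the Caputo derivative follows from the power rule, and setting it to zero gives the point \((2-\alpha)x^\* - (1-\alpha)c\) rather than \(x^\*\).

```python
import math


def caputo_quadratic(x, c, alpha, a=1.0, x_star=1.0):
    """Caputo derivative of order alpha in (0, 1) of f(t) = 0.5*a*(t - x_star)**2.

    Only the case x > c is handled. The closed form comes from expanding f in
    powers of (t - c) and applying the power rule
    D^alpha (t - c)^p = Gamma(p + 1) / Gamma(p + 1 - alpha) * (x - c)^(p - alpha).
    """
    if x <= c:
        raise ValueError("this sketch only covers x > c")
    s = x - c
    return a * (
        s ** (2 - alpha) / math.gamma(3 - alpha)
        + (c - x_star) * s ** (1 - alpha) / math.gamma(2 - alpha)
    )


def fixed_terminal_descent(x0, c, alpha, eta, steps):
    """Iterate x <- x - eta * D^alpha_c f(x) with the terminal point c held fixed."""
    x = x0
    for _ in range(steps):
        x -= eta * caputo_quadratic(x, c, alpha)
    return x


alpha, c, x_star = 0.5, 0.0, 1.0
x_final = fixed_terminal_descent(x0=2.0, c=c, alpha=alpha, eta=0.1, steps=500)

# Zero of the fractional derivative for this quadratic (derived, not from the source).
biased_point = (2 - alpha) * x_star - (1 - alpha) * c

print("limit of the fixed-terminal iteration:", x_final)
print("predicted biased point:", biased_point)
print("fractional derivative at the true minimizer:", caputo_quadratic(x_star, c, alpha))
```

With these illustrative values the iterates settle at the zero of \(D^{\alpha}_{c} f\), which sits away from the minimizer \(x^\* = 1\); moving \(c\) changes where they settle, which is the dependence the adaptive terminal is meant to remove.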

\[ \begin{aligned} D^{\alpha}_{c} f(x) &= \frac{1}{\Gamma(n-\alpha)} \int_{c}^{x} \frac{f^{(n)}(t)}{(x-t)^{\alpha-n+1}}\,dt, \qquad n=\lceil \alpha \rceil, \\ \delta^{\alpha,\beta}_{c} f(x) &= \frac{\Gamma(2-\alpha)\,|x-c|^{\alpha}}{x-c}\Big( D^{\alpha}_{c} f(x) + \beta\,|x-c|\,D^{1+\alpha}_{c} f(x) \Big), \\ x_t - c_t &= -\lambda_t\,\nabla f(x_t), \\ x_{t+1} &= x_t - \eta_t\,\delta^{\alpha,\beta}_{c_t} f(x_t). \end{aligned} \]

where \(\theta \equiv x\) are the parameters, \(\eta_t\) is the learning rate, \(D^{\alpha}_{c}\) is the Caputo fractional derivative of order \(\alpha \in (0,1]\) with terminal point \(c\), \(\Gamma\) is the gamma function, \(\beta\) weights the order-\((1+\alpha)\) correction, and \(\lambda_t > 0\) sets the adaptive terminal point relative to the gradient.
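The update rule can be sketched numerically in one dimension. This is a minimal illustration under stated assumptions, not the reference implementation: the names `caputo_left`, `delta` and `at_cfgd` are hypothetical, the integral is approximated with a midpoint rule after the substitution \(u = (x-t)^{1-\alpha}\) (a choice made here to remove the endpoint singularity), and only \(0 < \alpha < 1\) with \(x_t > c_t\) is covered, which is the case the displayed integral handles directly. Since \(x_t - c_t = -\lambda_t \nabla f(x_t)\), that restriction means the sketch only runs while \(f'(x_t) < 0\).

```python
import math


def caputo_left(g, x, c, alpha, n_nodes=2000):
    """Approximate (1/Gamma(1-alpha)) * int_c^x g(t) * (x-t)^(-alpha) dt for x > c.

    Substituting u = (x-t)^(1-alpha) turns this into
    (1/Gamma(2-alpha)) * int_0^U g(x - u^(1/(1-alpha))) du, U = (x-c)^(1-alpha),
    which is evaluated with the midpoint rule.
    """
    if not (x > c and 0.0 < alpha < 1.0):
        raise ValueError("this sketch only covers x > c and 0 < alpha < 1")
    upper = (x - c) ** (1.0 - alpha)
    h = upper / n_nodes
    p = 1.0 / (1.0 - alpha)
    total = sum(g(x - ((k + 0.5) * h) ** p) for k in range(n_nodes))
    return total * h / math.gamma(2.0 - alpha)


def delta(df, d2f, x, c, alpha, beta):
    """delta^{alpha,beta}_c f(x) from first and second derivatives df, d2f."""
    d_alpha = caputo_left(df, x, c, alpha)    # D^alpha uses f' (n = 1)
    d_alpha1 = caputo_left(d2f, x, c, alpha)  # D^(1+alpha) uses f'' (n = 2)
    prefactor = math.gamma(2.0 - alpha) * abs(x - c) ** alpha / (x - c)
    return prefactor * (d_alpha + beta * abs(x - c) * d_alpha1)


def at_cfgd(df, d2f, x0, alpha, beta, lam, eta, steps, tol=1e-14):
    """Run AT-CFGD with constant lam and eta; stops once |f'(x)| < tol."""
    x = x0
    for _ in range(steps):
        grad = df(x)
        if abs(grad) < tol:
            break
        c = x + lam * grad  # terminal point: x - c = -lam * grad
        x = x - eta * delta(df, d2f, x, c, alpha, beta)
    return x


# Illustrative test problem (an assumption): f(x) = (x - 1)^2, started left of the minimizer.
df = lambda x: 2.0 * (x - 1.0)
d2f = lambda x: 2.0

x_final = at_cfgd(df, d2f, x0=0.0, alpha=0.5, beta=0.5, lam=0.5, eta=0.2, steps=80)
print("final iterate:", x_final)
```

On this quadratic, with \(x > c\), the operator works out to \(f'(x) + \big(\beta - \tfrac{1-\alpha}{2-\alpha}\big) f''(x)\,(x - c)\), so it vanishes together with the gradient once \(c_t\) is tied to \(x_t\). Handling \(x_t < c_t\) would need the right-sided form of the Caputo derivative, which this sketch leaves out.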

Reference: Ashwani Aggarwal, "Convergence Analysis of Fractional Gradient Descent", arXiv 2023. https://arxiv.org/abs/2311.18426
