CfGD / CfAdam
Implements CfGD / CfAdam, gradient descent and Adam driven by a Caputo fractional-based gradient.
The ordinary gradient is replaced by a Caputo fractional-based gradient that, coordinate-wise, mixes the order-\(\alpha\) and order-\((1+\alpha)\) Caputo derivatives taken with \(c\) as the lower or upper integral terminal. By Theorem 2.3 of the reference, this direction is the steepest-descent direction of a smoothed version \({}_{c}F_{\alpha,\beta}\) of the objective, so the parameters \(\alpha,\beta\) act as an implicit regularizer that can mitigate the dependence of the convergence rate on the condition number; \(\alpha=1,\beta=0\) recovers the ordinary gradient. CfGD plugs this direction into gradient descent; CfAdam is standard Adam with its gradient replaced by the same Caputo fractional-based gradient.
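Written out, the two updates take the familiar gradient-descent and Adam forms with \(d_t\) in place of the ordinary gradient. The display below is a sketch rather than a quotation: the iterate symbol \(x_t\), the elementwise square, square root, and division, and the bias-corrected form of Adam are assumptions based on standard Adam, and the exact expression for \(d_t\) is left to the reference.

\[
\begin{aligned}
\text{CfGD:}\quad & x_{t+1} = x_t - \eta\, d_t,\\
\text{CfAdam:}\quad & m_t = \beta_1 m_{t-1} + (1-\beta_1)\, d_t,\qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, d_t^{2},\\
& \hat m_t = \frac{m_t}{1-\beta_1^{t}},\qquad \hat v_t = \frac{v_t}{1-\beta_2^{t}},\qquad x_{t+1} = x_t - \eta\,\frac{\hat m_t}{\sqrt{\hat v_t}+\epsilon}.
\end{aligned}
\]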
In the update rules, \({}^{C}_{c}\nabla_x^{\,\alpha} f\) is the Caputo fractional gradient of order \(\alpha\in(0,1)\) with per-coordinate integral terminal \(c=(c_j)\), and \({}^{C}_{c}\nabla_x^{\,1+\alpha} f\) is its order-\((1+\alpha)\) counterpart. The vector \(d_t\) is the resulting Caputo fractional-based gradient (replacing the ordinary gradient \(g_t\)), \(\eta\) is the learning rate, \(\beta\in\mathbb{R}\) the smoothing weight, \(m_t,v_t\) the first and second moment estimates with decay rates \(\beta_1,\beta_2\), and \(\epsilon\) a stability constant.
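A minimal sketch of how these symbols fit together in code, assuming the standard bias-corrected Adam recursion. The names `cfgd_step`, `cfadam_step`, `direction`, and the toy quadratic are hypothetical and are not this library's API. Computing \(d_t\) itself is not shown: the demo passes the ordinary gradient, which is what the \(\alpha=1,\beta=0\) case reduces to.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical interface: maps the current iterate x_t to the direction d_t.
Direction = Callable[[Vector], Vector]


def cfgd_step(x: Vector, direction: Direction, lr: float) -> Vector:
    """One CfGD-style step: x_{t+1} = x_t - eta * d_t."""
    d = direction(x)
    return [xi - lr * di for xi, di in zip(x, d)]


def cfadam_step(
    x: Vector,
    m: Vector,
    v: Vector,
    t: int,
    direction: Direction,
    lr: float = 1e-3,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[Vector, Vector, Vector]:
    """One CfAdam-style step: standard Adam with the gradient replaced by d_t.

    t is the 1-based step counter used for bias correction.
    """
    d = direction(x)
    # First and second moment estimates, updated elementwise.
    m = [beta1 * mi + (1.0 - beta1) * di for mi, di in zip(m, d)]
    v = [beta2 * vi + (1.0 - beta2) * di * di for vi, di in zip(v, d)]
    # Bias correction (assumed, as in standard Adam).
    m_hat = [mi / (1.0 - beta1 ** t) for mi in m]
    v_hat = [vi / (1.0 - beta2 ** t) for vi in v]
    x = [
        xi - lr * mh / (math.sqrt(vh) + eps)
        for xi, mh, vh in zip(x, m_hat, v_hat)
    ]
    return x, m, v


def f(x: Vector) -> float:
    """Illustrative ill-conditioned quadratic: 0.5 * (x0^2 + 100 * x1^2)."""
    return 0.5 * (x[0] ** 2 + 100.0 * x[1] ** 2)


def ordinary_gradient(x: Vector) -> Vector:
    """Gradient of f; stands in for d_t in the alpha = 1, beta = 0 case."""
    return [x[0], 100.0 * x[1]]


# CfGD-style loop with the stand-in direction.
x_gd = [1.0, 1.0]
for _ in range(50):
    x_gd = cfgd_step(x_gd, ordinary_gradient, lr=0.01)
print("objective after 50 CfGD-style steps:", f(x_gd))

# CfAdam-style loop with the same stand-in direction.
x_adam, m, v = [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]
for t in range(1, 51):
    x_adam, m, v = cfadam_step(x_adam, m, v, t, ordinary_gradient, lr=0.01)
print("objective after 50 CfAdam-style steps:", f(x_adam))
```

Keeping the direction as a callable means a routine that evaluates the Caputo fractional-based gradient can be substituted without touching the update logic.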
Reference: Yeonjong Shin, Jérôme Darbon, George Em Karniadakis, "Accelerating gradient descent and Adam via fractional gradients", Neural Networks 2023. https://doi.org/10.1016/j.neunet.2023.01.002