L2O-CFGD
Implements L2O-CFGD, a learned optimizer that drives Caputo fractional gradient descent by predicting its hyperparameters with a recurrent meta-optimizer.
Caputo fractional gradient descent steps along a fractional-order gradient anchored at a terminal point \(c\), which embeds a tunable memory of the loss landscape between \(c\) and the current iterate. Its behavior is sensitive to three quantities that are awkward to set by hand: the per-coordinate fractional order \(\alpha\), a smoothing term \(\beta\) that mixes in the next-order derivative, and the anchor \(c\). L2O-CFGD replaces this hand-tuning with a learning-to-optimize (L2O) approach: a recurrent network \(M\) (parameters \(\varphi\)) reads the ordinary gradient and its own hidden state and emits \(\alpha\), \(\beta\), and \(c\) at every step, after which a standard descent step is taken along the resulting scaled Caputo fractional gradient (computed in practice by Gauss-Jacobi quadrature).
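The update at step \(t\) is

\[
\theta^{(t+1)} = \theta^{(t)} - \eta^{(t)}\, {}_{c}D^{\alpha}_{\beta} f\big(\theta^{(t)}\big), \qquad \big(\alpha, \beta, c, h^{(t+1)}\big) = M\big(\nabla_\theta f(\theta^{(t)}),\, h^{(t)};\, \varphi\big),
\]

where the scaled Caputo fractional gradient is

\[
{}_{c}D^{\alpha}_{\beta} f(\theta) = \operatorname{diag}\big({}_{c}^{C}D^{\alpha_j}_{\theta_j} I(\theta_j)\big)^{-1} \Big( {}_{c}^{C}\nabla^{\alpha}_{\theta} f(\theta) + \beta \odot |\theta - c| \odot {}_{c}^{C}\nabla^{1+\alpha}_{\theta} f(\theta) \Big),
\]

where \(\theta\) are the parameters, \(\eta^{(t)}\) is the learning rate, \(\nabla_\theta f\) is the ordinary gradient, \({}_{c}^{C}\nabla^{\alpha}_{\theta} f\) is the coordinatewise Caputo fractional gradient of order \(\alpha\) anchored at terminal point \(c\), \(\beta\) holds the per-coordinate smoothing weights on the next-order \((1+\alpha)\) fractional gradient, \(I\) is the identity function, whose Caputo derivative normalizes each coordinate, \(M\) is the recurrent meta-optimizer with weights \(\varphi\) and hidden state \(h\), \(\odot\) is the elementwise product, and \(j\) is the coordinate index.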
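As a rough illustration of the Gauss-Jacobi evaluation mentioned above, the following NumPy sketch computes a scaled Caputo fractional gradient and uses it in a plain descent loop. It is not the library's implementation. The helper names `gauss_jacobi` and `scaled_caputo_grad` are hypothetical, and the objective is a toy quadratic. The recurrent meta-optimizer \(M\) is replaced by hand-set constants for \(\alpha\) and \(\beta\), with the anchor \(c\) set to the previous iterate. The sketch also assumes that, after dividing by the identity-function normalizer, each coordinate reduces to a weighted average of partial derivatives along the segment between \(c_j\) and \(\theta_j\), that the smoothing term is scaled by \(|\theta_j - c_j|\), and that the same form applies on either side of the anchor.

```python
import numpy as np


def gauss_jacobi(n, a):
    """Nodes and weights for the integral of g(u) * (1 - u)**a over [-1, 1], a > -1.

    Built with the Golub-Welsch eigenvalue method from the Jacobi recurrence
    coefficients (second Jacobi parameter fixed to 0).
    """
    k = np.arange(1, n, dtype=float)
    diag = np.empty(n)
    diag[0] = -a / (a + 2.0)
    diag[1:] = -a**2 / ((2 * k + a) * (2 * k + a + 2))
    off = 2 * k * (k + a) / ((2 * k + a) * np.sqrt((2 * k + a) ** 2 - 1))
    jacobi = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    nodes, vecs = np.linalg.eigh(jacobi)
    mu0 = 2.0 ** (a + 1) / (a + 1)  # total mass of the weight function
    return nodes, mu0 * vecs[0] ** 2


def scaled_caputo_grad(grad, hess_diag, theta, alpha, beta, c, n_nodes=4):
    """Sketch of the scaled Caputo fractional gradient (assumed normalized form).

    grad(x) returns the ordinary gradient and hess_diag(x) the Hessian diagonal.
    """
    d = np.zeros_like(theta)
    for j in range(theta.size):
        u, w = gauss_jacobi(n_nodes, -alpha[j])
        w_hat = w / w.sum()  # division by the identity-function normalizer
        for ui, wi in zip(u, w_hat):
            point = theta.copy()
            # Map u in [-1, 1] onto the segment from c_j (u = -1) to theta_j (u = 1).
            point[j] = c[j] + 0.5 * (theta[j] - c[j]) * (1.0 + ui)
            smooth = beta[j] * abs(theta[j] - c[j]) * hess_diag(point)[j]
            d[j] += wi * (grad(point)[j] + smooth)
    return d


# Toy quadratic f(theta) = 0.5 * theta^T A theta - b^T theta (illustrative only).
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])


def loss(th):
    return 0.5 * th @ A @ th - b @ th


def grad(th):
    return A @ th - b


def hess_diag(th):
    return np.diag(A)  # constant for a quadratic


theta = np.array([2.0, -1.0])
c = theta.copy()           # first step falls back to the ordinary gradient
alpha = np.full(2, 0.5)    # hand-set stand-ins for the outputs of M
beta = np.full(2, 0.1)
eta = 0.1

initial_loss = loss(theta)
for _ in range(200):
    d = scaled_caputo_grad(grad, hess_diag, theta, alpha, beta, c)
    c, theta = theta, theta - eta * d  # anchor at the previous iterate
final_loss = loss(theta)
print(initial_loss, final_loss)
```

In a learned setting, the constants `alpha`, `beta`, and the rule for `c` would instead come from the recurrent network at every step. If SciPy is available, `scipy.special.roots_jacobi(n, -alpha_j, 0.0)` should give the same nodes and weights as the hand-rolled `gauss_jacobi` helper.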
Reference: Jan Sobotka, Petr Šimánek, Pavel Kordík, "Enhancing Fractional Gradient Descent with Learned Optimizers", arXiv 2025. https://arxiv.org/abs/2510.18783