Frac-Adam
Implements Frac-Adam, a Caputo-fractional variant of Adam that injects long-term memory into the gradient signal.
The method replaces the integer-order gradient in Adam with a fractional-order derivative \(D^\alpha\) of the loss, so each moment estimate is driven by a history of gradients weighted by a power-law memory kernel rather than by a single instantaneous gradient. The fractional order \(\alpha \in (0,1]\) controls how strongly past gradients persist, which is intended to capture memory effects such as volatility clustering in financial series. In practice the continuous Caputo derivative is approximated by a truncated Grünwald–Letnikov sum over a short memory window. The same fractional substitution defines a family of optimizers (Frac-RMSprop, Frac-SGD, Frac-Adagrad, and others); the Adam form is given below.
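Assuming the standard Adam moment updates and bias correction carry over unchanged, and that the memory window holds \(M\) gradients (summation index \(k = 0, \dots, M-1\)), the update can be sketched as follows. Both points are assumptions made for illustration; the reference gives the authoritative form.

\[
\begin{aligned}
g_t^{(\alpha)} &= \frac{1}{h^{\alpha}} \sum_{k=0}^{M-1} \omega_k(\alpha)\, g_{t-k}, \qquad \omega_k(\alpha) = (-1)^k \binom{\alpha}{k},\\
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t^{(\alpha)},\\
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, \bigl(g_t^{(\alpha)}\bigr)^2,\\
\hat m_t &= \frac{m_t}{1-\beta_1^{\,t}}, \qquad \hat v_t = \frac{v_t}{1-\beta_2^{\,t}},\\
\theta_{t+1} &= \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}.
\end{aligned}
\]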
Here \(g_t^{(\alpha)}\) is the Grünwald–Letnikov approximation of the Caputo fractional derivative \(D^\alpha\) of the gradient \(g_t = \nabla_\theta J(\theta_t)\), \(\alpha \in (0,1]\) is the fractional order, \(\omega_k(\alpha) = (-1)^k \binom{\alpha}{k}\) are the fractional binomial weights, \(h\) is the discretization step size of the Grünwald–Letnikov sum, \(M\) is the memory-window length, \(\eta\) is the learning rate, \(\beta_1,\beta_2\) are the moment decay rates, and \(\epsilon\) is a stability constant. The Caputo derivative is defined as \(D_C^\alpha f(t) = \frac{1}{\Gamma(n-\alpha)}\int_0^t \frac{f^{(n)}(\tau)}{(t-\tau)^{\alpha-n+1}}\,d\tau\) for \(n-1 < \alpha < n\); for \(\alpha \in (0,1)\) this gives \(n = 1\), so only the first derivative \(f'\) enters the integral.
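The quantities above can be illustrated with a minimal pure-Python sketch. It is not the library's implementation: the names `gl_weights` and `FracAdamSketch` are hypothetical, the window is assumed to hold exactly \(M\) gradients, and the moment updates and bias correction are assumed to be those of standard Adam applied to \(g_t^{(\alpha)}\). The weights are computed with the standard recurrence \(\omega_0 = 1\), \(\omega_k = \omega_{k-1}\,(1 - (\alpha+1)/k)\), which is equivalent to \((-1)^k \binom{\alpha}{k}\).

```python
import math
from collections import deque


def gl_weights(alpha, M):
    """Return [w_0, ..., w_{M-1}] with w_k = (-1)^k * binom(alpha, k)."""
    w = [1.0]
    for k in range(1, M):
        # Standard recurrence; avoids evaluating Gamma functions directly.
        w.append(w[-1] * (1.0 - (alpha + 1.0) / k))
    return w


class FracAdamSketch:
    """Illustrative Frac-Adam step on a flat list of float parameters.

    Hypothetical helper: assumes standard Adam moments and bias correction
    applied to the truncated Grünwald-Letnikov sum of past gradients.
    """

    def __init__(self, alpha=0.9, M=5, h=1.0, lr=1e-3,
                 beta1=0.9, beta2=0.999, eps=1e-8):
        self.w = gl_weights(alpha, M)
        self.scale = h ** (-alpha)           # the 1 / h^alpha prefactor
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.history = deque(maxlen=M)       # newest gradient first
        self.m = None
        self.v = None
        self.t = 0

    def step(self, theta, grad):
        """Return the updated parameters given the current gradient."""
        self.history.appendleft(list(grad))  # oldest entry drops off when full
        n = len(theta)
        # Truncated Grünwald-Letnikov sum over the stored gradients.
        g_frac = [
            self.scale * sum(w * g[i] for w, g in zip(self.w, self.history))
            for i in range(n)
        ]
        if self.m is None:
            self.m = [0.0] * n
            self.v = [0.0] * n
        self.t += 1
        b1, b2 = self.beta1, self.beta2
        self.m = [b1 * m + (1 - b1) * g for m, g in zip(self.m, g_frac)]
        self.v = [b2 * v + (1 - b2) * g * g for v, g in zip(self.v, g_frac)]
        # Bias correction as in standard Adam (assumed to carry over).
        m_hat = [m / (1 - b1 ** self.t) for m in self.m]
        v_hat = [v / (1 - b2 ** self.t) for v in self.v]
        return [
            p - self.lr * mh / (math.sqrt(vh) + self.eps)
            for p, mh, vh in zip(theta, m_hat, v_hat)
        ]


# Usage: a few steps on J(theta) = sum(theta_i^2), whose gradient is 2 * theta.
opt = FracAdamSketch(alpha=0.8, M=4, lr=0.1)
theta = [1.0, -2.0]
for _ in range(3):
    grad = [2.0 * p for p in theta]
    theta = opt.step(theta, grad)
```

Under these assumptions, setting `M=1` and `h=1.0` leaves only \(\omega_0 = 1\), so the sketch reduces to a plain Adam step. Keeping the gradient history in a bounded `deque` makes the extra memory cost proportional to \(M\) times the parameter count.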
Reference: Mustapha Ez-zaiym, Yassine Senhaji, Meriem Rachid, Karim El Moutaouakil, Vasile Palade, "Fractional Optimizers for LSTM Networks in Financial Time Series Forecasting", Mathematics 2025. https://doi.org/10.3390/math13132068