FracGrad¶
Implements FracGrad, a fractional-integral reweighting of accumulated microbatch gradients.
Gradient accumulation sums the gradients of \(N\) sequential microbatches before a single parameter update, normally weighting each microbatch equally. FracGrad instead derives the per-microbatch weights from a discretized Riemann–Liouville fractional integral, yielding a power-law schedule that biases the accumulated gradient toward the most recent microbatches while retaining the contribution of earlier ones.
Within an accumulation window let \(g_t^{(i)}\) be the gradient of the \(i\)-th microbatch (\(i = 1, \dots, N\), with \(i = N\) the most recent). The weighted accumulated gradient and parameter step are
where \(\theta\) are the parameters, \(\eta\) the learning rate, \(g_t^{(i)}\) the \(i\)-th microbatch gradient within an accumulation window of size \(N\), \(g_t\) the fractionally weighted accumulated gradient, and \(\alpha \in (0, 1]\) the fractional order controlling recency bias. The weights sum to one; \(\alpha = 1\) recovers uniform averaging, and smaller \(\alpha\) skews weight toward recent microbatches. The accumulated \(g_t\) may also be passed to any base optimizer in place of the plain mean.
Reference: Minhyeok Lee, "FracGrad: A Discretized Riemann–Liouville Fractional Integral Approach to Gradient Accumulation for Deep Learning", Fractal and Fractional 2025. https://doi.org/10.3390/fractalfract9110733