2SEDFOSGD
Implements 2SEDFOSGD, a fractional-order SGD variant whose fractional order is adapted per layer using the Two-Scale Effective Dimension.
Fractional-order SGD replaces the integer-order gradient step with a Caputo-style fractional difference, weighting the gradient by a power of the recent parameter change and a Gamma-function factor; at \(\alpha = 1\) the step reduces to plain SGD. 2SEDFOSGD makes the fractional order \(\alpha\) dynamic: for each layer it estimates a Two-Scale Effective Dimension \(d_\zeta\) from the curvature captured by the Fisher information, and it lowers \(\alpha\) where the effective dimension is large. Flatter, higher-dimensional layers therefore take more memory-weighted steps, while sharp directions stay closer to plain SGD.
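As a minimal sketch of a single fixed-order step, the snippet below assumes the Caputo-style form commonly used in fractional-order gradient methods, \(\theta_{t+1} = \theta_t - \mu\, g_t\, (|\theta_t - \theta_{t-1}| + \delta)^{1-\alpha} / \Gamma(2-\alpha)\), applied to one scalar parameter. The function name `fractional_step` is hypothetical and not part of any library API; it only illustrates how the memory factor and the Gamma factor interact, and that \(\alpha = 1\) recovers plain SGD.

```python
import math


def fractional_step(theta, theta_prev, grad, lr, alpha, delta=1e-8):
    """One scalar fractional-order SGD step (assumed Caputo-style form).

    The gradient is scaled by (|theta - theta_prev| + delta) ** (1 - alpha)
    and divided by Gamma(2 - alpha). With alpha = 1 both factors equal 1,
    so the update reduces to plain SGD.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    # Memory factor: a power of the most recent parameter change.
    memory = (abs(theta - theta_prev) + delta) ** (1.0 - alpha)
    return theta - lr * grad * memory / math.gamma(2.0 - alpha)


# alpha = 1: plain SGD, 1.0 - 0.1 * 2.0
print(fractional_step(1.0, 0.9, grad=2.0, lr=0.1, alpha=1.0))
# alpha = 0.5: the step is rescaled by the previous change of 0.1
print(fractional_step(1.0, 0.9, grad=2.0, lr=0.1, alpha=0.5))
```

With \(\alpha < 1\) the step size grows with the magnitude of the previous parameter change, which is the memory effect the fractional order controls; \(\delta\) keeps the factor nonzero when the parameter did not move.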
The 2SEDFOSGD update uses the following quantities: \(\theta^{(\ell)}\) are the parameters of layer \(\ell\), \(g_t^{(\ell)}\) is its stochastic gradient at step \(t\), \(\mu_t\) is the decaying step size with decay exponent \(\rho\), \(\Gamma\) is the Gamma function, \(\alpha_t^{(\ell)} \in (0,1]\) is the adaptive fractional order with base \(\alpha_0\) and tuning gain \(\beta\), \(\delta > 0\) is a small offset that prevents stalls, and \(d_\zeta^{(\ell)}\) is the Two-Scale Effective Dimension of layer \(\ell\), normalized by the maximum \(d_{\max} = \max_{\ell,k} d_\zeta^{(\ell)}(\varepsilon)\big|_k\).
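The per-layer procedure can be sketched in plain Python, but only under several assumptions, because the exact update formulas are not reproduced in this description. The sketch assumes that the fractional step has the Caputo-style form \(\mu_t\, g_t\, (|\theta_t - \theta_{t-1}| + \delta)^{1-\alpha}/\Gamma(2-\alpha)\); that \(\alpha_t^{(\ell)}\) follows the linear rule \(\alpha_0 - \beta\, d_\zeta^{(\ell)}/d_{\max}\), clipped to stay in \((0,1]\); and that \(\mu_t = \mu_0/(1+t)^\rho\). It also replaces \(d_\zeta^{(\ell)}\) with the ordinary effective dimension \(\sum_i F_i/(F_i+\varepsilon)\) of a diagonal Fisher estimate rather than the paper's two-scale estimator, and it takes \(d_{\max}\) over layers at the current step only, so the index \(k\) is not modeled. All function names, defaults, and the dict layout are hypothetical.

```python
import math


def effective_dimension(fisher_diag, eps):
    # Hypothetical stand-in for d_zeta: the standard effective-dimension count
    # sum_i F_i / (F_i + eps) over a diagonal Fisher estimate. The paper's
    # two-scale estimator is not reproduced here.
    return sum(f / (f + eps) for f in fisher_diag)


def adaptive_alpha(d_layer, d_max, alpha0, beta, alpha_min=1e-3):
    # Assumed linear rule: a larger normalized dimension gives a smaller alpha.
    # Clipping to [alpha_min, 1] keeps alpha inside (0, 1].
    ratio = d_layer / d_max if d_max > 0 else 0.0
    return min(1.0, max(alpha_min, alpha0 - beta * ratio))


def step_size(mu0, t, rho):
    # Assumed polynomial decay mu_t = mu0 / (1 + t) ** rho.
    return mu0 / (1.0 + t) ** rho


def two_sed_fosgd_step(layers, t, mu0=0.1, rho=0.5, alpha0=0.9,
                       beta=0.5, delta=1e-8, eps=1e-2):
    """Update each layer in place and return the alpha used per layer.

    Each layer is a dict with equal-length lists 'theta', 'theta_prev',
    'grad' and 'fisher' (a hypothetical layout chosen for this sketch).
    """
    dims = [effective_dimension(layer["fisher"], eps) for layer in layers]
    d_max = max(dims)  # max over layers at this step only
    mu = step_size(mu0, t, rho)
    alphas = []
    for layer, d in zip(layers, dims):
        alpha = adaptive_alpha(d, d_max, alpha0, beta)
        scale = mu / math.gamma(2.0 - alpha)
        new_theta = [
            th - scale * g * (abs(th - tp) + delta) ** (1.0 - alpha)
            for th, tp, g in zip(layer["theta"], layer["theta_prev"], layer["grad"])
        ]
        # Shift history: the current parameters become the previous ones.
        layer["theta_prev"], layer["theta"] = layer["theta"], new_theta
        alphas.append(alpha)
    return alphas


# Layer 0 has a large Fisher spectrum, layer 1 a tiny one.
layers = [
    {"theta": [1.0, -0.5], "theta_prev": [0.9, -0.4],
     "grad": [0.2, -0.1], "fisher": [4.0, 3.0]},
    {"theta": [0.3], "theta_prev": [0.3],
     "grad": [0.05], "fisher": [0.001]},
]
alphas = two_sed_fosgd_step(layers, t=0)
print(alphas)  # layer 0 gets the lower fractional order
```

Under these assumptions the layer whose Fisher spectrum is large relative to \(\varepsilon\) has the largest normalized dimension and therefore the smallest \(\alpha\), while the second layer, whose previous step was zero, still moves because of \(\delta\). A practical version would operate on tensors and would likely estimate the Fisher diagonal from squared gradients; both are left out to keep the sketch dependency-free.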
Reference: Mohammad Partohaghighi, Roummel Marcia, YangQuan Chen, "More Optimal Fractional-Order Stochastic Gradient Descent for Non-Convex Optimization Problems", arXiv 2025. https://arxiv.org/abs/2505.02985