Batched / Transported Scion¶
Implements Batched / Transported Scion, scale-invariant stochastic conditional-gradient optimizers that take steps inside a norm ball via a linear minimization oracle.
Scion replaces the usual descent direction with the linear minimization oracle (LMO) of a momentum-averaged gradient over the unit ball of a chosen norm (typically the spectral norm on weight matrices). Because the LMO depends only on the direction of its argument, the resulting step is scale-invariant. The batched variant averages \(B\) stochastic gradients per step before feeding them into the momentum buffer, which controls the heavy-tailed noise that makes plain stochastic LMO steps unstable.
The transported variant evaluates the gradient not at the current iterate but at an extrapolated point \(Y_t\) obtained by carrying forward the previous step, in the spirit of a momentum/extragradient correction. This exploits Hessian smoothness for a better complexity rate while keeping the same LMO-based update on the primary sequence \(\theta_t\).
where \(\theta\) are the parameters, \(\eta_t\) the stepsize, \(\beta_t\) the momentum coefficient, \(g(\cdot,\xi)\) a stochastic gradient, \(\bar{g}_t\) its average over a batch of \(B\) samples, \(m_t\) the momentum buffer, \(\|\cdot\|\) the chosen norm (spectral norm for matrices) with \(\mathrm{lmo}\) its linear minimization oracle over the unit ball, and \(Y_t\) the extrapolated (transported) evaluation point. Setting \(Y_t=\theta_t\) recovers the batched Scion method.
Reference: Jiayu Zhang, Tianyi Lin, "Scale-Invariant Neural Network Optimization: Norm Geometry and Heavy-Tailed Noise", arXiv 2026. https://arxiv.org/abs/2605.18528