Pion
Implements Pion, a spectrum-preserving optimizer that updates each weight matrix by an orthogonal equivalence transformation.
Unlike additive optimizers such as Adam or Muon, which add a step to the weights, Pion multiplies each weight matrix on the left and right by orthogonal transformations. Because orthogonal factors leave singular values unchanged, the spectrum of every weight matrix is preserved exactly throughout training, removing the need for explicit normalization. The gradient is first projected onto the Lie algebra of skew-symmetric matrices on each side, accumulated with Adam-style first and second moments, and applied through a truncated matrix exponential.
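The spectrum-preservation property can be checked numerically. The following is a minimal NumPy sketch, not part of the library: the helper names `skew` and `exp2` are hypothetical, and it assumes the second-order truncation has the form \(I + A + A^2/2\). Under that assumption, for skew-symmetric \(A\) the truncated factor \(R\) satisfies \(R^\top R = I + A^4/4\), so it is orthogonal only up to fourth order in the step size.

```python
import numpy as np

def skew(M):
    """Project a square matrix onto the skew-symmetric matrices."""
    return 0.5 * (M - M.T)

def exp2(A):
    """Assumed form of the second-order truncation: I + A + A^2 / 2."""
    return np.eye(A.shape[0]) + A + 0.5 * (A @ A)

rng = np.random.default_rng(0)
d_out, d_in = 4, 6
W = rng.standard_normal((d_out, d_in))
sv_before = np.linalg.svd(W, compute_uv=False)

# Exactly orthogonal factors (taken from QR decompositions) leave the
# singular values of W unchanged up to floating-point error.
Q_out, _ = np.linalg.qr(rng.standard_normal((d_out, d_out)))
Q_in, _ = np.linalg.qr(rng.standard_normal((d_in, d_in)))
sv_exact = np.linalg.svd(Q_out @ W @ Q_in, compute_uv=False)

# A truncated exponential of a small skew-symmetric generator is only
# approximately orthogonal; the deviation R^T R - I equals A^4 / 4.
A = 1e-2 * skew(rng.standard_normal((d_in, d_in)))
R = exp2(A)
orth_residual = R.T @ R - np.eye(d_in)
sv_trunc = np.linalg.svd(W @ R, compute_uv=False)

print("max change, exact factors:    ", np.abs(sv_exact - sv_before).max())
print("max change, truncated factor: ", np.abs(sv_trunc - sv_before).max())
print("orthogonality residual norm:  ", np.linalg.norm(orth_residual))
```

With small generators both changes are negligible, but the truncated factor's deviation grows with the fourth power of the generator's norm.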
where \(W_t\) is the weight matrix of shape \(d_{\mathrm{out}} \times d_{\mathrm{in}}\), \(g_t\) its gradient, \(G_t^{\mathrm{in}}, G_t^{\mathrm{out}}\) the skew-symmetric input- and output-side projections, \(m_t, v_t\) the first and second moments per side, \(A_t\) the Adam-style adaptive directions, \(\mathcal{E}_2\) the second-order truncation of the matrix exponential, \(\eta\) the learning rate, \(\alpha_t\) the RMS-based per-matrix scale, \(c\) a target-RMS constant, \(\beta_1, \beta_2\) the moment decay rates, and \(\epsilon\) a stability constant.
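To make the roles of these symbols concrete, here is a rough NumPy sketch of one Pion-like step. It is an illustration under stated assumptions, not the reference implementation. The assumptions are: the projections are \(G^{\mathrm{out}} = \mathrm{skew}(g W^\top)\) and \(G^{\mathrm{in}} = \mathrm{skew}(W^\top g)\), the moments use standard Adam bias correction, and \(\mathcal{E}_2(A) = I + A + A^2/2\). The formula for \(\alpha_t\) is not reproduced here, so the hypothetical `scale` argument stands in for it. All function names are hypothetical.

```python
import numpy as np

def skew(M):
    """Project a square matrix onto the skew-symmetric matrices."""
    return 0.5 * (M - M.T)

def exp2(A):
    """Assumed second-order truncation of the matrix exponential."""
    return np.eye(A.shape[0]) + A + 0.5 * (A @ A)

def init_state(d_out, d_in):
    """Per-side first and second moments plus a step counter."""
    return {"t": 0,
            "m_out": np.zeros((d_out, d_out)), "v_out": np.zeros((d_out, d_out)),
            "m_in": np.zeros((d_in, d_in)), "v_in": np.zeros((d_in, d_in))}

def adam_direction(G, m, v, t, beta1, beta2, eps):
    """Element-wise Adam direction; the result stays skew-symmetric because
    m is skew-symmetric and sqrt(v) is symmetric."""
    m = beta1 * m + (1.0 - beta1) * G
    v = beta2 * v + (1.0 - beta2) * G * G
    m_hat = m / (1.0 - beta1 ** t)
    v_hat = v / (1.0 - beta2 ** t)
    return m_hat / (np.sqrt(v_hat) + eps), m, v

def pion_like_step(W, g, state, lr=1e-3, scale=1.0,
                   beta1=0.9, beta2=0.999, eps=1e-8):
    """One multiplicative update; `scale` is a placeholder for alpha_t."""
    state["t"] += 1
    t = state["t"]
    G_out = skew(g @ W.T)   # output-side projection, d_out x d_out
    G_in = skew(W.T @ g)    # input-side projection, d_in x d_in
    A_out, state["m_out"], state["v_out"] = adam_direction(
        G_out, state["m_out"], state["v_out"], t, beta1, beta2, eps)
    A_in, state["m_in"], state["v_in"] = adam_direction(
        G_in, state["m_in"], state["v_in"], t, beta1, beta2, eps)
    step = lr * scale
    return exp2(-step * A_out) @ W @ exp2(-step * A_in)

# Toy problem: L(W) = 0.5 * ||W - T||_F^2, whose gradient is W - T.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 6))
T = rng.standard_normal((4, 6))
loss = lambda X: 0.5 * np.sum((X - T) ** 2)

state = init_state(4, 6)
sv_before = np.linalg.svd(W, compute_uv=False)
loss_before = loss(W)

W = pion_like_step(W, W - T, state)
loss_after_one = loss(W)
for _ in range(19):
    W = pion_like_step(W, W - T, state)
sv_after = np.linalg.svd(W, compute_uv=False)

print("loss before / after one step:", loss_before, loss_after_one)
print("max singular-value drift after 20 steps:", np.abs(sv_after - sv_before).max())
```

In this sketch the loss can only fall as far as rotating \(W\) on both sides allows, since the singular values stay fixed. This is the intended trade-off of a multiplicative update.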
Reference: Kexuan Shi, Hanxuan Li, Zeju Qiu, Yandong Wen, Simon Buchholz, Weiyang Liu, "Pion: A Spectrum-Preserving Optimizer via Orthogonal Equivalence Transformation", arXiv 2026. https://arxiv.org/abs/2605.12492