MLorc
Implements MLorc, a memory-efficient adaptation method that compresses the optimizer momentum to low rank and reconstructs it at each step, rather than compressing the gradient.
MLorc (Momentum Low-rank Compression) keeps Adam's full-rank gradient signal but stores only a rank-\(r\) factorization of the first and second moments. At each step the previous moments are reconstructed from their stored factors, updated with the current full gradient \(g_t\), and then recompressed by randomized SVD before the parameter update. Because a low-rank approximation of the second moment can contain negative entries, the reconstructed \(\tilde{v}_{t-1}\) is rectified by a ReLU and shifted by the average magnitude of the discarded negative entries, which keeps it non-negative.
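The reconstruct, update, recompress cycle described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the library's implementation: the helper names (`rsvd`, `reconstruct`, `rectify_second_moment`, `mlorc_moment_step`) and the `state` dictionary are hypothetical, the randomized SVD is a basic Gaussian range finder, and the rectification assumes the shift is written only into the entries that were negative.

```python
import numpy as np

def rsvd(a, rank, oversample, rng):
    """Basic randomized SVD: returns (u, s, vt) truncated to `rank`."""
    k = min(rank + oversample, min(a.shape))
    omega = rng.standard_normal((a.shape[1], k))    # random test matrix
    q, _ = np.linalg.qr(a @ omega)                  # orthonormal basis for the sampled range
    u_small, s, vt = np.linalg.svd(q.T @ a, full_matrices=False)
    u = q @ u_small
    return u[:, :rank], s[:rank], vt[:rank]

def reconstruct(factors):
    """Multiply stored factors back into a full matrix."""
    u, s, vt = factors
    return (u * s) @ vt

def rectify_second_moment(v):
    """Replace negative entries by their average magnitude (assumed reading)."""
    neg = v < 0
    if not neg.any():
        return v
    out = v.copy()
    out[neg] = np.abs(v[neg]).mean()                # ReLU zeroes them, then the shift is added
    return out

def mlorc_moment_step(g, state, beta1=0.9, beta2=0.999,
                      rank=4, oversample=0, rng=None):
    """One moment update: reconstruct, update with the full gradient, recompress."""
    if rng is None:
        rng = np.random.default_rng(0)
    if state is None:                               # first step: moments start at zero
        m_prev = np.zeros_like(g)
        v_prev = np.zeros_like(g)
    else:
        m_prev = reconstruct(state["m"])
        v_prev = rectify_second_moment(reconstruct(state["v"]))
    m = beta1 * m_prev + (1.0 - beta1) * g
    v = beta2 * v_prev + (1.0 - beta2) * g * g
    new_state = {"m": rsvd(m, rank, oversample, rng),
                 "v": rsvd(v, rank, oversample, rng)}
    return m, v, new_state
```

Only `new_state` (the factors) needs to persist between steps; the full-size `m` and `v` are used for the current parameter update and can then be discarded, which is where the memory saving comes from.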
The matrix-shaped weight \(W\) is updated by the standard bias-corrected AdamW rule using the reconstructed moments:
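Written out from the symbols defined here, one step can be sketched as follows. The hatted quantities \(\hat{m}_t, \hat{v}_t\), the shift \(\zeta\), the indicator form of the rectification, and the transpose on the right factor are notational assumptions made for this sketch; squaring, division, and the square root act element-wise.

\[
\begin{aligned}
\tilde{m}_{t-1} &= m_u\, m_s\, m_v^{\top}, \qquad \tilde{v}_{t-1} = v_u\, v_s\, v_v^{\top} \\
\tilde{v}_{t-1} &\leftarrow \mathrm{ReLU}(\tilde{v}_{t-1}) + \zeta\, \mathbb{1}[\tilde{v}_{t-1} < 0], \qquad \zeta = \operatorname{mean}\{\, |\tilde{v}_{t-1,ij}| : \tilde{v}_{t-1,ij} < 0 \,\} \\
m_t &= \beta_1 \tilde{m}_{t-1} + (1 - \beta_1)\, g_t \\
v_t &= \beta_2 \tilde{v}_{t-1} + (1 - \beta_2)\, g_t^{2} \\
(m_u, m_s, m_v) &= \mathrm{RSVD}(m_t, r, p), \qquad (v_u, v_s, v_v) = \mathrm{RSVD}(v_t, r, p) \\
\hat{m}_t &= \frac{m_t}{1 - \beta_1^{t}}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^{t}} \\
W_{t+1} &= W_t - \alpha \left( \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon} + \lambda W_t \right)
\end{aligned}
\]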
where \(W\) is a weight matrix, \(\alpha\) the learning rate, \(g_t\) the full-rank gradient, \(m_t\)/\(v_t\) the first and second moments, \(\beta_1,\beta_2\) their decay rates, \(\lambda\) the weight decay, \(\epsilon\) the stability constant, \(r\) the target rank and \(p\) the oversampling parameter of the randomized SVD \(\mathrm{RSVD}\), and \((\cdot_u, \cdot_s, \cdot_v)\) the stored left-factor, singular, and right-factor matrices whose product reconstructs a moment. The ReLU shift averages only over the negative (discarded) entries; non-negative entries are left unchanged.
Reference: Wei Shen, Yaxiang Zhang, Minhui Huang, Mengfan Xu, Jiawei Zhang, Cong Shen, "MLorc: Momentum Low-rank Compression for Large Language Model Adaptation", arXiv 2025. https://arxiv.org/abs/2506.01897