KO
Implements KO, a kinetics-inspired wrapper that perturbs gradients via simulated particle collisions before a base optimizer step.
KO (Kinetics-inspired Optimizer) treats the rows of a weight matrix as colliding particles and reformulates training as the evolution of a particle system. To counteract parameter condensation and promote diversity, it injects a repulsion force into the gradient: neuron pairs that are similar in both weight and gradient direction repel one another. The modified gradient is then handed to a standard base optimizer (SGD or Adam), so KO introduces no hyperparameters of its own beyond a single collision coefficient \(\gamma_c\).
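To make the wrapper design concrete, here is a minimal NumPy sketch, assuming the gradient is processed one weight matrix at a time. `KOWrapper`, `Adam`, `no_collision`, and the `perturb(w, g, gamma_c)` callable are hypothetical names chosen for illustration, not the reference implementation's API. The collision term is left pluggable so the sketch only shows how KO composes with a base optimizer, and Adam is written out by hand to keep the example self-contained.

```python
import numpy as np


class Adam:
    """Minimal NumPy Adam, standing in for the base optimizer (illustrative only)."""

    def __init__(self, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, betas[0], betas[1], eps
        self.m = None  # first-moment estimate, created lazily
        self.v = None  # second-moment estimate, created lazily
        self.t = 0     # step counter for bias correction

    def step(self, w, g):
        if self.m is None:
            self.m, self.v = np.zeros_like(w), np.zeros_like(w)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * g
        self.v = self.b2 * self.v + (1 - self.b2) * g * g
        m_hat = self.m / (1 - self.b1 ** self.t)
        v_hat = self.v / (1 - self.b2 ** self.t)
        return w - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


class KOWrapper:
    """Hypothetical wrapper: perturb the gradient, then delegate to the base optimizer.

    The wrapper stores nothing beyond `gamma_c`; all moment state lives in `base`.
    """

    def __init__(self, base, perturb, gamma_c):
        self.base, self.perturb, self.gamma_c = base, perturb, gamma_c

    def step(self, w, g):
        g_tilde = self.perturb(w, g, self.gamma_c)  # collision-adjusted gradient
        return self.base.step(w, g_tilde)           # unmodified base update


def no_collision(w, g, gamma_c):
    """Placeholder perturbation that leaves the gradient unchanged."""
    return g


w = np.array([[1.0, 2.0], [3.0, 4.0]])
g = np.array([[0.1, -0.2], [0.3, 0.0]])
ko_out = KOWrapper(Adam(lr=0.01), no_collision, gamma_c=0.1).step(w, g)
adam_out = Adam(lr=0.01).step(w, g)
print(np.allclose(ko_out, adam_out))
```

With `no_collision` plugged in, the wrapper reproduces a plain Adam step, which illustrates that in this sketch the only thing KO layers on top of the base optimizer is the gradient perturbation and its coefficient.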
The practical "soft collision" form perturbs the layer gradient, then applies the base update:
where \(w \in \mathbb{R}^{N\times D}\) and \(g_t \in \mathbb{R}^{N\times D}\) are the layer weight and gradient matrices, \(\cos(w,w)\) and \(\cos(g_t,g_t)\) are the \(N\times N\) pairwise cosine-similarity matrices of the weight rows and gradient rows, \(\gamma_c\) is the collision coefficient controlling the fraction of colliding particles, and \(\mathrm{BaseOpt}\) is the unmodified SGD or Adam update applied to the collision-adjusted gradient \(\tilde g_t\).
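The following NumPy sketch shows how the pairwise cosine matrices \(\cos(w,w)\) and \(\cos(g_t,g_t)\) can drive a repulsion term and how the adjusted gradient \(\tilde g_t\) is then passed to SGD as \(\mathrm{BaseOpt}\). The repulsion form is an assumption and may differ from the paper's exact soft-collision equation: a pair \((i,j)\) collides with strength \(S_{ij} = \max(\cos(w,w)_{ij},0)\cdot\max(\cos(g_t,g_t)_{ij},0)\), row \(i\) feels \(\sum_j S_{ij}(w_i - w_j)\), and \(\gamma_c\) simply scales that force (the paper's mechanism for controlling the fraction of colliding particles is not modeled). `pairwise_cosine`, `soft_collision_grad`, and `sgd_step` are hypothetical names.

```python
import numpy as np


def pairwise_cosine(x, eps=1e-12):
    """N x N matrix of cosine similarities between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    unit = x / np.maximum(norms, eps)  # eps guards against all-zero rows
    return unit @ unit.T


def soft_collision_grad(w, g, gamma_c):
    """Collision-adjusted gradient under an ASSUMED repulsion form (not the paper's exact equation)."""
    cos_w = pairwise_cosine(w)  # cos(w, w), shape (N, N)
    cos_g = pairwise_cosine(g)  # cos(g_t, g_t), shape (N, N)
    # Assumption: a pair collides only if it is similar in BOTH weight and
    # gradient direction; strength is the product of the positive similarities.
    s = np.clip(cos_w, 0.0, None) * np.clip(cos_g, 0.0, None)
    np.fill_diagonal(s, 0.0)  # a row never collides with itself
    # Row i's repulsion: sum_j s_ij * (w_i - w_j), written in matrix form.
    repulsion = s.sum(axis=1, keepdims=True) * w - s @ w
    # Subtracting it makes a descent step w - lr * g_tilde push w_i away from w_j.
    return g - gamma_c * repulsion


def sgd_step(w, g_tilde, lr):
    """Plain SGD standing in for BaseOpt."""
    return w - lr * g_tilde


# Two nearly parallel rows that receive the same gradient.
w = np.array([[1.0, 0.0], [1.0, 0.2]])
g = np.array([[1.0, 0.0], [1.0, 0.0]])
plain = sgd_step(w, g, lr=0.1)
ko = sgd_step(w, soft_collision_grad(w, g, gamma_c=0.5), lr=0.1)
print(np.linalg.norm(plain[0] - plain[1]), np.linalg.norm(ko[0] - ko[1]))
```

Because both rows receive the same gradient, plain SGD moves them in lockstep and their distance stays fixed, whereas the assumed collision term pushes them apart. Orthogonal rows have \(S_{ij}=0\) and are left untouched, and setting \(\gamma_c = 0\) recovers the base update exactly.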
Reference: Mingquan Feng, Yixin Huang, Yifan Fu, Shaobo Wang, Junchi Yan, "KO: Kinetics-inspired Neural Optimizer with PDE Simulation Approaches", arXiv 2025. https://arxiv.org/abs/2505.14777