HGM
Implements HGM (Hindsight-Guided Momentum), an Adam variant that modulates the learning rate by the agreement between the current gradient and accumulated momentum.
HGM keeps Adam's first and second moments but adds a "hindsight" signal: the cosine similarity between the current gradient \(g_t\) and the previous momentum \(m_{t-1}\). When the gradient aligns with the momentum, the optimizer is following a consistent descent direction and the step is amplified; when the two disagree, the step is dampened. The similarity is smoothed over time with an exponential moving average, and the smoothed value is passed through an exponential to obtain a multiplicative scale on the base learning rate.
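One form of the update consistent with this description can be written as follows. This is a sketch under stated assumptions, not a transcription of the reference: the Adam-style bias correction (\(\hat m_t, \hat v_t\)), the placement of \(\epsilon\) in the cosine denominator, and the zero initialization \(m_0 = v_0 = 0\), \(s_0 = 0\) are all assumed.

\[
\begin{aligned}
c_t &= \frac{\langle g_t, m_{t-1} \rangle}{\lVert g_t \rVert \, \lVert m_{t-1} \rVert + \epsilon} \\
s_t &= \beta_s \, s_{t-1} + (1 - \beta_s) \, c_t \\
m_t &= \beta_1 \, m_{t-1} + (1 - \beta_1) \, g_t \\
v_t &= \beta_2 \, v_{t-1} + (1 - \beta_2) \, g_t^2 \\
\hat m_t &= \frac{m_t}{1 - \beta_1^t}, \qquad \hat v_t = \frac{v_t}{1 - \beta_2^t} \\
\eta_t &= \alpha \, \exp(\gamma \, s_t) \\
\theta_t &= \theta_{t-1} - \eta_t \, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{aligned}
\]

Since \(c_t \in [-1, 1]\), the smoothed signal \(s_t\) stays in \([-1, 1]\) as well, so under this form the effective learning rate is bounded between \(\alpha e^{-\gamma}\) and \(\alpha e^{\gamma}\).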
In the update rule, \(\theta\) denotes the parameters, \(\alpha\) the base learning rate, \(g_t\) the gradient at step \(t\), \(m_t\) and \(v_t\) the first and second moment estimates with decay rates \(\beta_1\) and \(\beta_2\), \(c_t\) the cosine similarity between \(g_t\) and the previous momentum \(m_{t-1}\), \(s_t\) the exponential moving average of \(c_t\) with smoothing coefficient \(\beta_s\), \(\gamma\) the modulation strength that scales the effective learning rate \(\eta_t\), and \(\epsilon\) a small constant for numerical stability.
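To make the roles of these symbols concrete, the following is a minimal pure-Python sketch of a single HGM step on a flat list of parameters. It is an illustration only, not this library's implementation: the function names `hgm_init` and `hgm_step`, the dictionary state layout, the default hyperparameter values, the Adam-style bias correction, and the form \(\eta_t = \alpha \exp(\gamma s_t)\) are all assumptions made for the example.

```python
import math


def hgm_init(n):
    """Hypothetical helper: zero-initialised state for n parameters."""
    return {"m": [0.0] * n, "v": [0.0] * n, "s": 0.0, "t": 0, "eta": None}


def hgm_step(theta, g, state, lr=1e-3, b1=0.9, b2=0.999, bs=0.9,
             gamma=1.0, eps=1e-8):
    """One assumed HGM update; returns (new_theta, new_state)."""
    m_prev, v_prev = state["m"], state["v"]
    t = state["t"] + 1

    # c_t: cosine similarity between the gradient and the *previous* momentum.
    dot = sum(gi * mi for gi, mi in zip(g, m_prev))
    norm_g = math.sqrt(sum(gi * gi for gi in g))
    norm_m = math.sqrt(sum(mi * mi for mi in m_prev))
    c = dot / (norm_g * norm_m + eps)

    # s_t: exponential moving average of the similarity.
    s = bs * state["s"] + (1.0 - bs) * c

    # Adam first and second moments.
    m = [b1 * mi + (1.0 - b1) * gi for mi, gi in zip(m_prev, g)]
    v = [b2 * vi + (1.0 - b2) * gi * gi for vi, gi in zip(v_prev, g)]

    # Bias correction (assumed, as in standard Adam).
    m_hat = [mi / (1.0 - b1 ** t) for mi in m]
    v_hat = [vi / (1.0 - b2 ** t) for vi in v]

    # eta_t: base learning rate scaled by exp(gamma * s_t) (assumed form).
    eta = lr * math.exp(gamma * s)

    new_theta = [th - eta * mh / (math.sqrt(vh) + eps)
                 for th, mh, vh in zip(theta, m_hat, v_hat)]
    new_state = {"m": m, "v": v, "s": s, "t": t, "eta": eta}
    return new_theta, new_state
```

Under these assumptions the first step is identical to Adam's, because \(m_0 = 0\) gives \(c_1 = 0\) and hence \(\eta_1 = \alpha\). From the second step onward, a gradient pointing the same way as the momentum pushes `state["eta"]` above `lr`, and an opposing gradient pulls it below.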
Reference: Krisanu Sarkar, "Hindsight-Guided Momentum (HGM) Optimizer: An Approach to Adaptive Learning Rates", arXiv preprint 2025. https://arxiv.org/abs/2506.22479