AdamG¶
Implements AdamG, a parameter-free Adam with the golden step size.
\[
\begin{aligned}
v_t &= \beta_2 v_{t-1} + (1 - \beta_2) g_t^2 \\
r_t &= \beta_3 r_{t-1} + (1 - \beta_3) s(v_t),
\quad s(x) = p \, x^q \\
m_t &= \beta_1 m_{t-1} + (1 - \beta_1) \, r_t \odot g_t \\
\hat{m}_t &= m_t / (1 - \beta_1^t), \quad
\hat{v}_t = v_t / (1 - \beta_2^t) \\
\theta_t &= \theta_{t-1} - \min(\eta, 1/\sqrt{t}) \,
\hat{m}_t / (\sqrt{\hat{v}_t} + \epsilon)
\end{aligned}
\]
Reference: Yijiang Pang, Shuyang Yu, Bao Hoang, Jiayu Zhou, "Towards Stability of Parameter-free Optimization", arXiv 2024. https://arxiv.org/abs/2405.04376