AdaFamily¶
Implements AdaFamily, a parameterized family of Adam-like adaptive methods that interpolates between Adam, AdaBelief, and AdaMomentum.
A single hyperparameter \(\mu \in [0, 1]\) selects how the second moment is formed. The squared quantity it tracks is a convex blend of the raw gradient \(g_t\) and the first moment \(m_t\), scaled by a normalization factor \(c\). At \(\mu = 0\) it recovers Adam (variance of \(g_t\)), at \(\mu = 0.5\) AdaBelief (variance of \(g_t - m_t\)), and at \(\mu = 1\) AdaMomentum (second moment of \(m_t\)); intermediate values yield a continuum of optimizers.
where \(\theta\) are the parameters, \(\eta\) the learning rate, \(g_t\) the gradient, \(m_t,v_t\) the first and second moments with decays \(\beta_1,\beta_2\), \(\hat{m}_t,\hat{v}_t\) their bias-corrected values, \(\epsilon\) a stability constant, and \(\mu \in [0,1]\) the family hyperparameter with \(c\) its normalization factor.
Reference: Hannes Fassold, "AdaFamily: A family of Adam-like adaptive gradient methods", arXiv 2022. https://arxiv.org/abs/2203.01603