ZetA
Implements ZetA, a hybrid optimizer that blends an Adam update with a Riemann-zeta-scaled step.
ZetA forms the usual Adam direction \(u_{\text{adam}}\) from bias-corrected moments, then adds a second direction \(u_\zeta\) whose magnitude is controlled by the Riemann zeta function \(\zeta(s_t)\) at a time-varying exponent \(s_t \in (1, 2]\). The zeta term divides the bias-corrected first moment by a power of the gradient norm and by \(\zeta(s_t)\). It is amplified by a boost factor \(b_t\) that grows when consecutive gradients are positively aligned. The two directions are blended with mixing weight \(\alpha\). The resulting step is taken under a cosine learning-rate schedule with decoupled weight decay, and a SAM-style perturbation of the parameters precedes the update.
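The direction described above can be sketched in pure Python as follows. This is a minimal illustration, not the ZetA reference implementation. \(\zeta(s)\) is evaluated by Euler–Maclaurin summation. On \((1, 2]\) it decreases from a pole at \(s = 1\) to \(\pi^2/6\) at \(s = 2\), so dividing by it shrinks \(u_\zeta\) most when \(s_t\) is close to 1.

Everything the paragraph leaves unspecified is a hypothetical choice:

- the exponent `norm_power` on \(\lVert g_t\rVert\);
- the boost form \(b_t = 1 + \delta \max(0, \cos(g_t, g_{t-1}))\), with a constant gate `boost_gain` standing in for \(\delta_t\);
- the convex mix \(u = \alpha\, u_{\text{adam}} + (1-\alpha)\, u_\zeta\).

The function names are hypothetical as well.

```python
import math
from typing import List, Tuple


def zeta(s: float, n: int = 10) -> float:
    """Riemann zeta for real s > 1 via Euler-Maclaurin summation."""
    if s <= 1.0:
        raise ValueError("zeta(s) diverges for s <= 1")
    # Direct partial sum over k = 1 .. n-1.
    total = sum(k ** -s for k in range(1, n))
    # Integral tail from n to infinity plus half of the boundary term.
    total += n ** (1.0 - s) / (s - 1.0) + 0.5 * n ** -s
    # Bernoulli-number corrections (B2, B4, B6); the remainder is tiny on (1, 2].
    total += s * n ** (-s - 1.0) / 12.0
    total -= s * (s + 1) * (s + 2) * n ** (-s - 3.0) / 720.0
    total += s * (s + 1) * (s + 2) * (s + 3) * (s + 4) * n ** (-s - 5.0) / 30240.0
    return total


def zeta_mixed_direction(
    grad: List[float],
    prev_grad: List[float],
    m: List[float],
    v: List[float],
    step: int,
    s_t: float,
    beta1: float = 0.9,
    beta2: float = 0.999,
    alpha: float = 0.5,
    norm_power: float = 0.5,  # hypothetical exponent on ||g_t||
    boost_gain: float = 0.1,  # hypothetical constant boost gate (delta)
    eps: float = 1e-8,
) -> Tuple[List[float], List[float], List[float]]:
    """Return (mixed direction u, new m, new v) for one step (step >= 1)."""
    # Exponential moving averages of the gradient and squared gradient.
    m = [beta1 * mi + (1 - beta1) * gi for mi, gi in zip(m, grad)]
    v = [beta2 * vi + (1 - beta2) * gi * gi for vi, gi in zip(v, grad)]
    # Bias-corrected moments.
    m_hat = [mi / (1 - beta1 ** step) for mi in m]
    v_hat = [vi / (1 - beta2 ** step) for vi in v]
    # Standard Adam direction.
    u_adam = [mh / (math.sqrt(vh) + eps) for mh, vh in zip(m_hat, v_hat)]
    # Cosine similarity between consecutive gradients.
    g_norm = math.sqrt(sum(g * g for g in grad))
    p_norm = math.sqrt(sum(p * p for p in prev_grad))
    cos_sim = sum(g * p for g, p in zip(grad, prev_grad)) / (g_norm * p_norm + eps)
    # Boost grows only for positively aligned gradients (hypothetical form).
    boost = 1.0 + boost_gain * max(0.0, cos_sim)
    # Zeta direction: m_hat divided by ||g||^p and by zeta(s_t), times the boost.
    scale = boost / ((g_norm + eps) ** norm_power * zeta(s_t))
    u_zeta = [scale * mh for mh in m_hat]
    # Convex mix of the two directions (assumed weighting).
    u = [alpha * a + (1 - alpha) * z for a, z in zip(u_adam, u_zeta)]
    return u, m, v


# Example: first step with two identical, perfectly aligned gradients at s_t = 2.
u, m, v = zeta_mixed_direction([1.0, 0.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0], step=1, s_t=2.0)
```

At \(s_t = 2\) the zeta direction is scaled by \(6/\pi^2 \approx 0.61\) before the boost is applied. Lowering \(s_t\) toward 1 drives that factor toward 0, which leaves the Adam term to dominate the mix.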
In this notation, \(\theta\) are the parameters, \(\eta\) the base learning rate, \(g_t\) the gradient, \(m_t\)/\(v_t\) the first- and second-moment estimates, \(\hat m_t\)/\(\hat v_t\) their bias-corrected forms, \(\beta_1,\beta_2\) the decay rates, \(\lambda\) the weight decay, \(\epsilon\) a small stability constant, \(\zeta(\cdot)\) the Riemann zeta function, \(s_t \in (s_{\min}, s_{\max}] \subset (1,2]\) the dynamic zeta exponent, \(b_t\) the cosine-similarity boost with boost gate \(\delta_t\), \(\alpha\) the mixing weight between the Adam and zeta directions, and \(T\) the total number of steps. Before the final update is computed, a SAM-style perturbation \(\theta^{+} = \theta + \gamma\, u_t / (\lVert u_t\rVert + \epsilon)\) is applied. Here \(\gamma\) sets the size of the perturbation, since \(\lVert \theta^{+} - \theta\rVert \approx \gamma\).
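The SAM-style perturbation, the cosine schedule, and decoupled weight decay can be sketched as three small helpers. The perturbation follows the formula given above. The schedule and decay rules are assumptions, since neither is spelled out here:

- the schedule is taken to be standard cosine annealing, \(\eta_t = \tfrac{1}{2}\eta\,(1 + \cos(\pi t / T))\);
- the decay is taken to be an AdamW-style multiplicative shrink, \(\theta \leftarrow \theta(1 - \eta_t \lambda) - \eta_t u\).

All function names are hypothetical. In the original SAM method the gradient is re-evaluated at \(\theta^{+}\). This sketch only forms \(\theta^{+}\) and does not model that second gradient evaluation.

```python
import math
from typing import List


def sam_perturb(theta: List[float], u: List[float], gamma: float = 0.05, eps: float = 1e-12) -> List[float]:
    """theta_plus = theta + gamma * u / (||u|| + eps)."""
    u_norm = math.sqrt(sum(x * x for x in u))
    return [t + gamma * x / (u_norm + eps) for t, x in zip(theta, u)]


def cosine_lr(base_lr: float, t: int, total_steps: int) -> float:
    """Assumed cosine annealing: base_lr at t = 0, decaying to 0 at t = total_steps."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * t / total_steps))


def decoupled_step(theta: List[float], u: List[float], lr: float, weight_decay: float) -> List[float]:
    """AdamW-style step: shrink theta directly (not via the gradient), then subtract lr * u."""
    return [t * (1.0 - lr * weight_decay) - lr * x for t, x in zip(theta, u)]


# Example usage on a two-parameter model.
theta = [1.0, -2.0]
u = [3.0, 4.0]
theta_plus = sam_perturb(theta, u, gamma=0.5)  # ~[1.3, -1.6]: a step of length ~0.5 along u
lr = cosine_lr(1e-3, t=50, total_steps=100)  # half-way through the schedule: ~5e-4
theta = decoupled_step(theta, u, lr, weight_decay=0.01)
```

Normalizing by \(\lVert u_t\rVert + \epsilon\) makes the perturbation length depend only on \(\gamma\), not on the scale of the update direction. The \(\epsilon\) guards against division by zero when \(u_t\) vanishes.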
Reference: Samiksha BC, "ZetA: A Hybrid Optimizer Combining Riemann Zeta Scaling with Adam for Robust Deep Learning", arXiv 2025. https://arxiv.org/abs/2508.02719