Tiger¶
Implements Tiger, a tight-fisted sign-momentum optimizer.
Tiger keeps a single momentum buffer and forms the update from its sign, which makes it a SignSGD variant with momentum and weight decay, and a special case of Lion with \(\beta_1 = \beta_2 = \beta\):
\[
\begin{aligned}
m_t &= \beta m_{t-1} + (1 - \beta)\, g_t \\
\theta_t &= (1 - \gamma \lambda)\, \theta_{t-1}
- \gamma \mathrm{sign}(m_t)
\end{aligned}
\]
where \(m_t\) is the momentum buffer, \(\gamma\) the learning rate,
\(\beta\) the momentum decay, and \(\lambda\) the decoupled weight
decay. With weight_decouple=False the decay is instead added to the
gradient as an L2 penalty, and with fixed_decay=True the decoupled decay
factor is \((1 - \lambda)\) rather than \((1 - \gamma \lambda)\).
Reference: Jianlin Su, "Tiger: A Tight-fisted Optimizer", GitHub 2023. https://github.com/bojone/tiger