zij زِيج
A canon of deep learning optimization algorithms.
A zij (Arabic: زِيج, pronounced "zeej") is an astronomical handbook from the Islamic Golden Age: a set of tables and computational methods that astronomers consulted instead of re-deriving the field from scratch. The best known is the Zīj al-Sindhind of Muḥammad ibn Mūsā al-Khwārizmī, whose Latinized name became the word algorithm and whose book al-Jabr gave us algebra. This project takes the name in that spirit: one reference for the optimization algorithms of machine learning, with the equation, the paper, and runnable code in one place.
The Canon spans 740 methods across 11 categories, and more than 100 of them are implemented in the zij PyTorch library. Every optimizer's name links to its update-rule page.
Installation
Quick start
```python
import torch
import zij

model = torch.nn.Linear(16, 4)  # any nn.Module; a toy model for illustration

# torch.optim, vendored at tag v2.12.0
opt = zij.AdamW(model.parameters(), lr=3e-4)

# research optimizers, same interface
opt = zij.Muon([p for p in model.parameters() if p.ndim == 2], lr=2e-2)  # 2-D weights only
opt = zij.Prodigy(model.parameters())  # no learning rate to set

# look up by name
opt_cls = zij.load_optimizer("soap")
```
zij.optim mirrors torch.optim, so zij.optim.AdamW is the same class as
zij.AdamW.
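The Quick start comment says zij.AdamW is torch.optim's class vendored as-is, so each opt.step() should apply the standard AdamW rule. As a reading aid, the sketch below restates that rule on plain Python floats, assuming PyTorch's default betas=(0.9, 0.999), eps=1e-8 and weight_decay=1e-2, with the Quick start's lr=3e-4. The function adamw_step and its state-dict layout are hypothetical, chosen for illustration; they are not part of zij or torch.optim.

```python
import math


def adamw_step(params, grads, state, lr=3e-4, betas=(0.9, 0.999), eps=1e-8, weight_decay=1e-2):
    """Hypothetical helper: one AdamW update on flat lists of floats; returns the new params.
    `state` is a dict that persists between calls (step count and moment estimates)."""
    beta1, beta2 = betas
    state["step"] = state.get("step", 0) + 1
    t = state["step"]
    m = state.setdefault("exp_avg", [0.0] * len(params))     # first moment
    v = state.setdefault("exp_avg_sq", [0.0] * len(params))  # second moment
    bias1, bias2 = 1 - beta1 ** t, 1 - beta2 ** t
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        p *= 1 - lr * weight_decay  # decoupled decay: shrinks the weight, not the gradient
        m[i] = beta1 * m[i] + (1 - beta1) * g
        v[i] = beta2 * v[i] + (1 - beta2) * g * g
        m_hat, v_hat = m[i] / bias1, v[i] / bias2  # bias-corrected moments
        p -= lr * m_hat / (math.sqrt(v_hat) + eps)
        new_params.append(p)
    return new_params


# First step: each coordinate moves by about lr against the sign of its gradient.
print(adamw_step([1.0, -2.0], [0.5, -4.0], {}, lr=0.1, weight_decay=0.0))  # ≈ [0.9, -1.9]
```

Two properties are easy to check with it: on the first step m_hat / sqrt(v_hat) is ±1, so each coordinate moves by about lr whatever its gradient's magnitude; and weight decay shrinks weights even when the gradient is zero, because it is applied to the weights directly rather than folded into the gradient.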
Note
A few families depart from the standard torch.optim call protocol. Schedule-Free
optimizers need opt.train() before training steps and opt.eval() before
evaluation; the SAM family takes a closure or an explicit first_step /
second_step pair; Adam-mini and LOMO are constructed from a model rather than
a parameter list. Each optimizer's page notes which protocol it uses.
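Schedule-Free needs train() and eval() because the method computes gradients at one point (y, an interpolation) while the point worth evaluating is another (x, a running average of the base iterates z). The sketch below shows this on a single scalar following the Schedule-Free SGD recurrences of Defazio et al. (2024), simplified under stated assumptions: plain SGD base step, no warmup, no weight decay, and uniform averaging weights c = 1 / (t + 1). ScheduleFreeSGDSketch is a hypothetical class, not zij's; it keeps x and z as separate fields and points param at whichever one the current mode needs.

```python
class ScheduleFreeSGDSketch:
    """Hypothetical scalar Schedule-Free SGD, simplified: no warmup,
    no weight decay, uniform averaging weights c = 1 / (t + 1)."""

    def __init__(self, x0, lr=0.1, beta=0.9):
        self.lr, self.beta = lr, beta
        self.z = x0  # base SGD iterate
        self.x = x0  # uniform average of the z iterates: the point to evaluate
        self.t = 0
        self.training = True
        self.param = self._y()

    def _y(self):
        # gradients are taken at an interpolation between z and x
        return (1 - self.beta) * self.z + self.beta * self.x

    def train(self):
        # expose y, where the next gradient must be computed
        self.training = True
        self.param = self._y()

    def eval(self):
        # expose the averaged point x for evaluation
        self.training = False
        self.param = self.x

    def step(self, grad):
        # grad must be the gradient at y, i.e. at self.param in training mode
        if not self.training:
            raise RuntimeError("call train() before step()")
        self.z -= self.lr * grad
        self.t += 1
        c = 1.0 / (self.t + 1)
        self.x = (1 - c) * self.x + c * self.z
        self.param = self._y()


# Usage: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
opt = ScheduleFreeSGDSketch(0.0)
for _ in range(1000):
    opt.step(2 * (opt.param - 3.0))
opt.eval()
print(opt.param)
```

Forgetting eval() here would evaluate the model at y instead of x, which is the failure the Note guards against.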
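The SAM protocol needs two gradient evaluations per update: first_step moves the weights to an approximate worst-case point within radius rho, a second gradient is computed there, and second_step restores the original weights before the base optimizer applies that second gradient. The sketch below illustrates this on a flat list of floats with plain SGD as the base optimizer, following Foret et al. (2021). SAMSketch, grad_fn, and the defaults lr=0.1 and rho=0.05 are assumptions for illustration, not zij's signatures; in a PyTorch setting the closure would re-run the forward and backward pass instead of returning a gradient list.

```python
import math


class SAMSketch:
    """Hypothetical SAM over a flat list of floats, with plain SGD as the base optimizer."""

    def __init__(self, params, lr=0.1, rho=0.05):
        self.params = list(params)
        self.lr, self.rho = lr, rho
        self._eps = None  # perturbation applied by first_step

    def first_step(self, grads):
        # move to w + rho * g / ||g||, the first-order worst case in an L2 ball of radius rho
        norm = math.sqrt(sum(g * g for g in grads)) + 1e-12  # guard against a zero gradient
        self._eps = [self.rho * g / norm for g in grads]
        self.params = [p + e for p, e in zip(self.params, self._eps)]

    def second_step(self, grads):
        # grads were computed at the perturbed point: undo the perturbation, then take an SGD step
        self.params = [p - e for p, e in zip(self.params, self._eps)]
        self.params = [p - self.lr * g for p, g in zip(self.params, grads)]
        self._eps = None

    def step(self, grad_fn):
        # closure-style call: grad_fn(params) returns the gradient at params
        self.first_step(grad_fn(self.params))
        self.second_step(grad_fn(self.params))


# Usage: minimize f(w) = (w - 3)^2 in one dimension.
opt = SAMSketch([0.0])
for _ in range(100):
    opt.step(lambda ps: [2 * (p - 3.0) for p in ps])
print(opt.params)
```

With a fixed rho the iterate settles into a small band around the minimizer rather than landing exactly on it, because the descent gradient always comes from a point about rho away.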
Browse the Canon
Each category lists the canonical name, venue, paper, the best available
implementation, and the zij class where one exists. Every name links to its
update-rule page.
| Category | Scope |
|---|---|
| First-order | SGD, the Adam family, sign-based and variance-reduced methods |
| Memory-efficient | Adafactor, 8-bit and low-rank state methods |
| Fractional-order | optimizers built on fractional calculus |
| Distributed | communication-efficient and large-batch methods |
| Second-order | curvature, quasi-Newton, and preconditioned methods |
| Zeroth-order | gradient-free and finite-difference methods |
| Privacy-preserving | differentially private optimization |
| Sharpness-aware | SAM and flat-minima methods |
| Quantum | optimizers for variational quantum circuits |
| Learning-rate-free | parameter-free and adaptive step-size methods |
| Schedulers | learning-rate schedules |