Skip to content

zij  زِيج

A canon of deep learning optimization algorithms.

A zij (Arabic: زِيج, pronounced "zeej") is an astronomical handbook from the Islamic golden age: a set of tables and computational methods that astronomers consulted instead of re-deriving the field from scratch. The best known is the Zīj al-Sindhind of Muḥammad ibn Mūsā al-Khwārizmī, whose Latinized name became the word algorithm and whose book al-Jabr gave us algebra. This project takes the name in that spirit: one reference for the optimization algorithms of machine learning — the equation, the paper, and runnable code in one place.

The Canon spans 740 methods across 11 categories, with 100+ implemented as a PyTorch library. Every optimizer's name links to its update-rule page.

Installation

pip install zij

Quick start

import zij

# torch.optim, vendored at tag v2.12.0
opt = zij.AdamW(model.parameters(), lr=3e-4)

# research optimizers, same interface
opt = zij.Muon([p for p in model.parameters() if p.ndim == 2], lr=2e-2)
opt = zij.Prodigy(model.parameters())          # no learning rate to set

# look up by name
opt_cls = zij.load_optimizer("soap")

zij.optim mirrors torch.optim, so zij.optim.AdamW is the same class as zij.AdamW.

Note

A few families use a documented non-standard call protocol. Schedule-Free needs opt.train() and opt.eval(); the SAM family takes a closure or an explicit first_step / second_step pair; Adam-mini and LOMO are built from a model rather than a parameter list. Each optimizer's page notes which.

Browse the Canon

Each category lists the canonical name, venue, paper, the best available implementation, and the zij class where one exists. Every name links to its update-rule page.

Category
First-order SGD, the Adam family, sign-based and variance-reduced methods
Memory-efficient Adafactor, 8-bit and low-rank state methods
Fractional-order optimizers built on fractional calculus
Distributed communication-efficient and large-batch methods
Second-order curvature, quasi-Newton, and preconditioned methods
Zeroth-order gradient-free and finite-difference methods
Privacy-preserving differentially private optimization
Sharpness-aware SAM and flat-minima methods
Quantum optimizers for variational quantum circuits
Learning-rate-free parameter-free and adaptive step-size methods
Schedulers learning-rate schedules