FedIvon
Implements FedIvon, a federated learning algorithm in which each client trains locally with the Improved Variational Online Newton (IVON) optimizer and the server merges the client models by Fisher-weighted averaging.
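The Fisher-weighted merge can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not this library's API: the function name `fisher_weighted_merge` is hypothetical, each client is assumed to upload its mean and diagonal precision as flat arrays together with its dataset size \(N_k\), and the global precision is assumed to be the \(w_k\)-weighted sum of the client precisions, with no \(\delta\) added.

```python
import numpy as np


def fisher_weighted_merge(means, precisions, num_samples):
    """Precision-weighted merge of client posteriors (hypothetical helper).

    means, precisions: sequences of 1-D arrays, one per client, all the same shape.
    num_samples: sequence of client dataset sizes N_k.
    Returns (merged_mean, merged_precision).
    """
    n = np.asarray(num_samples, dtype=float)
    w = n / n.sum()                      # data-proportion weights w_k = N_k / sum_j N_j
    mu = np.stack(means)                 # shape (K, d): one row per client
    h = np.stack(precisions)             # shape (K, d)
    weighted_h = w[:, None] * h          # w_k * h_k for every coordinate
    merged_precision = weighted_h.sum(axis=0)
    # Each coordinate of the mean is weighted by how certain each client is about it.
    merged_mean = (weighted_h * mu).sum(axis=0) / merged_precision
    return merged_mean, merged_precision
```

With two equally sized clients, means [0, 1] and [2, 3], and precisions [1, 3] and [3, 1], the merged mean is [1.5, 1.5] rather than the plain average [1, 2]: in each coordinate it moves toward the client with the higher precision. When every client reports the same precision, the rule reduces to data-weighted FedAvg.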
Each client maintains a diagonal Gaussian posterior over the weights, with mean \(\mu\) and Hessian (precision) estimate \(h\). Locally it runs IVON: weights are sampled as \(\theta \sim \mathcal{N}(\mu, \sigma^2)\) with \(\sigma^2 = 1/(\lambda(h+\delta))\); the gradient at the sample gives a reparameterization estimate of the per-coordinate Hessian; and the momentum-smoothed gradient and curvature drive a Newton-style update of the mean. After local training, the server treats each client's \(h\) as a diagonal Fisher precision and merges the client means by precision-weighted averaging, so in every coordinate the clients that are more certain count for more. The merged model is again a Gaussian posterior, so the global model comes with uncertainty estimates as well as averaged weights.
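Written out per coordinate, one local step at iteration \(t\) and the merge after round \(r\) can be sketched as below. The local step follows the published IVON update (Shen et al., "Variational Learning is Effective for Large Deep Networks", ICML 2024), including bias correction of the momentum; \(\hat\ell\) (the minibatch loss) is a symbol introduced here, and squares and divisions are element-wise. The merge is an assumption: each client \(k\) contributes \(\mu_k^{(r)}\) and \(h_k^{(r)}\), and the global precision is taken to be the \(w_k\)-weighted sum of client precisions, so the implementation's exact aggregation may differ.

\[
\begin{aligned}
\sigma_{t-1}^2 &= \frac{1}{\lambda\,(h_{t-1} + \delta)}, \qquad \theta \sim \mathcal{N}(\mu_{t-1}, \sigma_{t-1}^2) \\
\hat g &= \nabla \hat\ell(\theta), \qquad \hat h = \hat g \odot \frac{\theta - \mu_{t-1}}{\sigma_{t-1}^2} \\
g_t &= \beta_1 g_{t-1} + (1 - \beta_1)\,\hat g \\
h_t &= \beta_2 h_{t-1} + (1 - \beta_2)\,\hat h + \tfrac{1}{2}(1 - \beta_2)^2\,\frac{(h_{t-1} - \hat h)^2}{h_{t-1} + \delta} \\
\mu_t &= \mu_{t-1} - \eta\,\frac{g_t/(1 - \beta_1^t) + \delta\,\mu_{t-1}}{h_t + \delta} \\
h^{(r+1)} &= \sum_k w_k\, h_k^{(r)}, \qquad \mu^{(r+1)} = \frac{\sum_k w_k\, h_k^{(r)} \odot \mu_k^{(r)}}{h^{(r+1)}}
\end{aligned}
\]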
In the local IVON update and the server merge, \(\mu\) is the variational mean, which serves as the deployed weights, \(\sigma^2\) the posterior variance, \(h_t\) the diagonal Hessian (precision) estimate at step \(t\), \(g_t\) the gradient momentum, \(\hat g\) and \(\hat h\) the sampled gradient and the reparameterization Hessian estimate, \(\eta\) the learning rate, \(\beta_1, \beta_2\) the momentum decay rates, \(\delta\) the weight decay (prior precision), \(\lambda\) the effective sample size, \(\odot\) the element-wise product, \(w_k = N_k / \sum_j N_j\) the data-proportion weight of client \(k\), with \(N_k\) its number of training examples, and \(r\) the communication round.
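The local step can be sketched in NumPy and run on a toy quadratic whose true Hessian is known. This is a minimal sketch under stated assumptions, not the implementation's code: `ivon_step`, its argument names, the toy loss, and the hyperparameter values (for example `beta2=0.99` and `ess=100.0`) are illustrative choices, and `grad_fn` stands in for a minibatch gradient.

```python
import numpy as np


def ivon_step(mu, h, g, t, grad_fn, rng, lr=0.1, beta1=0.9, beta2=0.99,
              delta=1e-3, ess=100.0):
    """One IVON step on a diagonal Gaussian posterior (illustrative sketch).

    mu, h, g: variational mean, Hessian (precision) estimate, gradient momentum.
    t: 1-based step counter, used for momentum bias correction.
    grad_fn: callable returning the loss gradient at a weight vector.
    Hyperparameter defaults are chosen for the toy problem below only.
    """
    sigma2 = 1.0 / (ess * (h + delta))                            # posterior variance
    theta = mu + np.sqrt(sigma2) * rng.standard_normal(mu.shape)  # theta ~ N(mu, sigma2)
    g_hat = grad_fn(theta)                                        # gradient at the sample
    h_hat = g_hat * (theta - mu) / sigma2                         # reparameterization Hessian
    g = beta1 * g + (1.0 - beta1) * g_hat                         # gradient momentum
    # Hessian EMA plus IVON's correction term, which keeps h + delta positive.
    h = (beta2 * h + (1.0 - beta2) * h_hat
         + 0.5 * (1.0 - beta2) ** 2 * (h - h_hat) ** 2 / (h + delta))
    g_bar = g / (1.0 - beta1 ** t)                                # bias-corrected momentum
    mu = mu - lr * (g_bar + delta * mu) / (h + delta)             # Newton-style mean update
    return mu, h, g


# Toy problem: loss 0.5 * sum(a * (theta - c) ** 2), whose true diagonal Hessian is a.
a = np.array([1.0, 4.0])
c = np.array([2.0, -1.0])
rng = np.random.default_rng(0)
mu, h, g = np.zeros(2), np.ones(2), np.zeros(2)
for t in range(1, 2001):
    mu, h, g = ivon_step(mu, h, g, t, lambda theta: a * (theta - c), rng)
print("mean:", mu, "hessian estimate:", h)
```

On this problem the mean settles near \(c\), and the Hessian estimate fluctuates around the true curvature \(a = (1, 4)\) even though it starts at 1; that calibrated \(h\) is what makes it usable as a per-coordinate precision in the server merge.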
Reference: Shivam Pal, Aishwarya Gupta, Saqib Sarwar, Piyush Rai, "Federated Learning with Uncertainty and Personalization via Efficient Second-order Optimization", TMLR 2025. https://arxiv.org/abs/2411.18385