GADAM
Implements GADAM (Genetic-Evolutionary Adam), an optimizer that evolves a population of models, each trained with Adam.
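GADAM maintains a population of \(g\) unit models. Within each generation, every model is trained locally with the standard Adam update on a data batch. Models are then ranked by validation fitness, and new offspring are produced by a performance-weighted crossover of parent pairs followed by a fitness-correlated mutation. The best \(g\) models from the union of parents and offspring survive to the next generation.

The local learning step is plain Adam, written here with the usual bias correction:

\[ m_t = \beta_1 m_{t-1} + (1-\beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\, g_t^2, \qquad \theta_t = \theta_{t-1} - \eta_t \, \frac{m_t / (1-\beta_1^t)}{\sqrt{v_t / (1-\beta_2^t)} + \epsilon} \]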
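As a rough illustration of the local learning step, the sketch below applies the standard Adam update in NumPy. The function name `adam_step`, its default hyperparameters, and the toy quadratic objective are assumptions made for this example; they are not part of GADAM's API.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one standard Adam update; t is the 1-based step count."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction compensates for the zero initialization of m and v.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage (hypothetical objective): minimize f(theta) = ||theta||^2.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 201):
    grad = 2 * theta  # gradient of the toy objective
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.05)
```

In GADAM each unit model would carry its own `m` and `v` state, so a population of \(g\) models means \(g\) independent copies of this update.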
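The genetic layer combines two parents \(i,j\) (with validation losses \(\hat{\mathcal{L}}_i,\hat{\mathcal{L}}_j\)) into a child \(k\) by element-wise crossover, then mutates it:

\[ \theta_k = \mathbb{1}(r \le p_{i,j})\,\theta_i + \mathbb{1}(r > p_{i,j})\,\theta_j, \qquad p_{i,j} = \frac{\exp(-\hat{\mathcal{L}}_i)}{\exp(-\hat{\mathcal{L}}_i) + \exp(-\hat{\mathcal{L}}_j)} \]

\[ \theta_k \leftarrow \mathbb{1}(r' \le p_q)\,\xi + \mathbb{1}(r' > p_q)\,\theta_k \]

where \(\theta\) are parameters, \(\eta_t\) the learning rate, \(g_t\) the gradient, \(m_t,v_t\) the first and second moments with decays \(\beta_1,\beta_2\), \(\epsilon\) a stability constant, \(\mathbb{1}(\cdot)\) the indicator, \(r,r'\) uniform random draws, \(\xi\) a freshly sampled replacement value, \(p_{i,j}\) the softmax inheritance probability favoring the lower-loss parent, and \(p_q\) a mutation rate that decreases with parent fitness.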
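The crossover, mutation, and survivor-selection steps described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the reference implementation. The helper names (`inherit_prob`, `crossover`, `mutate`, `select`) are hypothetical, the random draws are taken per parameter, mutated entries are replaced by a Gaussian sample, and `p_q` is passed in as a fixed number rather than derived from parent fitness.

```python
import numpy as np

def inherit_prob(loss_i, loss_j):
    """Softmax over negative losses: chance of inheriting a value from parent i."""
    z = np.array([-loss_i, -loss_j])
    z = z - z.max()  # shift for numerical stability; the ratio is unchanged
    e = np.exp(z)
    return e[0] / e.sum()

def crossover(theta_i, theta_j, loss_i, loss_j, rng):
    """Per-parameter crossover: take theta_i where r <= p_ij, else theta_j."""
    p_ij = inherit_prob(loss_i, loss_j)
    r = rng.uniform(size=theta_i.shape)
    return np.where(r <= p_ij, theta_i, theta_j)

def mutate(theta, p_q, rng, scale=1.0):
    """Replace each parameter with a fresh sample with probability p_q."""
    r_prime = rng.uniform(size=theta.shape)
    # Assumption: replacement values are drawn from N(0, scale^2).
    fresh = rng.normal(0.0, scale, size=theta.shape)
    return np.where(r_prime <= p_q, fresh, theta)

def select(population, losses, g):
    """Keep the g lowest-loss models from parents plus offspring."""
    order = np.argsort(losses)[:g]
    return [population[k] for k in order]

# Toy usage with made-up parameter vectors and validation losses.
rng = np.random.default_rng(0)
theta_i, theta_j = np.zeros(1000), np.ones(1000)
child = crossover(theta_i, theta_j, loss_i=0.2, loss_j=1.0, rng=rng)
mutated = mutate(child, p_q=0.05, rng=rng)
survivors = select([theta_i, theta_j, child], [0.2, 1.0, 0.5], g=2)
```

Because parent \(i\) has the lower loss in this toy call, the child inherits most of its entries from `theta_i`. In a full training loop the losses passed to `select` would come from evaluating each model on validation data.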
Reference: Jiawei Zhang, Fisher B. Gouza, "GADAM: Genetic-Evolutionary ADAM for Deep Neural Network Optimization", arXiv 2018. https://arxiv.org/abs/1805.07500