Evolution Strategies
Implements Evolution Strategies (ES), a black-box optimizer that estimates a gradient from the returns of randomly perturbed copies of the parameters and uses it to update them.
ES treats the objective \(F\) as a black box and optimizes the Gaussian-smoothed objective \(\mathbb{E}_{\epsilon\sim N(0,I)}\,F(\theta+\sigma\epsilon)\). Its gradient with respect to \(\theta\) is given by the score-function (REINFORCE-style) estimator \(\frac{1}{\sigma}\,\mathbb{E}_{\epsilon\sim N(0,I)}\{F(\theta+\sigma\epsilon)\,\epsilon\}\). At each step the algorithm samples a population of \(n\) perturbations, evaluates the return of each perturbed parameter vector, and takes a stochastic gradient ascent step that weights each perturbation by its return. Because workers that share random seeds can regenerate each other's perturbations locally, only the scalar returns need to be communicated, so the method parallelizes across many workers with minimal communication.
The update rule is
\[\theta_{t+1} = \theta_t + \alpha\,\frac{1}{n\sigma}\sum_{i=1}^{n} F_i\,\epsilon_i\]
where \(\theta\) are the parameters, \(\alpha\) is the learning rate, \(\sigma\) is the noise standard deviation, \(n\) is the population size, \(\epsilon_i\) are i.i.d. standard-normal perturbations, and \(F_i = F(\theta_t+\sigma\epsilon_i)\) is the return (fitness) of the \(i\)-th perturbed parameter vector.
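The step described above can be illustrated with a minimal standard-library sketch. It assumes the update \(\theta \leftarrow \theta + \frac{\alpha}{n\sigma}\sum_i F_i\,\epsilon_i\) from the referenced paper and is not this library's implementation. The names `es_step`, `toy_fitness`, and `run`, the quadratic objective, and the default hyperparameters are all hypothetical and chosen only for illustration.

```python
import random


def es_step(theta, fitness, alpha=0.02, sigma=0.5, n=200, rng=random):
    """One ES update: theta <- theta + alpha / (n * sigma) * sum_i F_i * eps_i."""
    dim = len(theta)
    # Sample n i.i.d. standard-normal perturbations, one per population member.
    eps = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]
    # Evaluate the black-box return of each perturbed parameter vector.
    returns = [
        fitness([t + sigma * e for t, e in zip(theta, eps_i)]) for eps_i in eps
    ]
    # The return-weighted sum of perturbations estimates the smoothed gradient.
    grad = [
        sum(f * eps_i[j] for f, eps_i in zip(returns, eps)) / (n * sigma)
        for j in range(dim)
    ]
    # Gradient ascent step on the smoothed objective.
    return [t + alpha * g for t, g in zip(theta, grad)]


def toy_fitness(x):
    # Hypothetical objective (an assumption for this sketch): maximized at (3, -2).
    return -((x[0] - 3.0) ** 2 + (x[1] + 2.0) ** 2)


def run(steps=300, seed=0):
    # Seeded generator so the sketch is reproducible.
    rng = random.Random(seed)
    theta = [0.0, 0.0]
    for _ in range(steps):
        theta = es_step(theta, toy_fitness, rng=rng)
    return theta
```

Calling `run()` should move `theta` close to the maximizer `(3, -2)` of the toy objective. The sketch weights perturbations by raw returns, which gives a high-variance estimate; the referenced paper reduces that variance with techniques such as mirrored (antithetic) sampling and rank-based fitness shaping, which are omitted here for brevity.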
Reference: Tim Salimans, Jonathan Ho, Xi Chen, Szymon Sidor, Ilya Sutskever, "Evolution Strategies as a Scalable Alternative to Reinforcement Learning", arXiv 2017. https://arxiv.org/abs/1703.03864