Fitness Expectation Maximization

  • Daan Wierstra
  • Tom Schaul
  • Jan Peters
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5199)


We present Fitness Expectation Maximization (FEM), a novel method for performing ‘black box’ function optimization. FEM searches the fitness landscape of an objective function using an instantiation of the well-known Expectation Maximization algorithm, generating search points from a sample distribution reweighted toward higher expected fitness. FEM updates both the candidate solution parameters and the search policy, which is represented as a multinormal distribution. Inheriting EM’s stability and strong guarantees, the method is both elegant and competitive with some of the best heuristic search methods in the field, and performs well on a number of unimodal and multimodal benchmark tasks. To illustrate the potential practical applications of the approach, we also show experiments on finding the parameters for a controller of the challenging non-Markovian double pole balancing task.
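The EM-style search loop described in the abstract can be sketched as follows. This is an illustrative simplification, not the authors' exact algorithm: candidates are drawn from a multinormal search distribution, weighted by fitness (here a softmax weighting, chosen for simplicity in place of the paper's fitness shaping), and the distribution is refit to the weighted samples as a maximum-likelihood M-step.

```python
import numpy as np

def fem_sketch(f, dim, batch_size=50, iterations=100, seed=0):
    """Simplified FEM-style black-box maximizer (illustrative sketch).

    Alternates between sampling candidates from a multinormal search
    distribution (E-step analogue) and refitting the distribution's
    mean and covariance to the fitness-weighted samples (M-step).
    """
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)          # mean of the search distribution
    cov = np.eye(dim)           # covariance of the search distribution
    for _ in range(iterations):
        # Draw a batch of candidate solutions and evaluate their fitness.
        x = rng.multivariate_normal(mu, cov, size=batch_size)
        fit = np.array([f(xi) for xi in x])
        # Turn fitnesses into normalized weights (softmax; assumption,
        # the paper uses its own weighting of expected fitness).
        u = np.exp(fit - fit.max())
        w = u / u.sum()
        # M-step: weighted maximum-likelihood refit of mean and covariance.
        mu = w @ x
        d = x - mu
        cov = (w[:, None] * d).T @ d + 1e-8 * np.eye(dim)
    return mu

# Usage: maximize -||x||^2, whose optimum is the origin.
best = fem_sketch(lambda x: -np.sum(x**2), dim=5)
```

The small diagonal term added to the covariance is a numerical safeguard to keep it positive definite as the distribution contracts around a solution.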


Keywords: Reinforcement Learning, Batch Size, Benchmark Function, Search Point, Multimodal Function





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Daan Wierstra (1)
  • Tom Schaul (1)
  • Jan Peters (3)
  • Jürgen Schmidhuber (1, 2)
  1. IDSIA, Manno-Lugano, Switzerland
  2. TU Munich, Garching, München, Germany
  3. Max Planck Institute for Biological Cybernetics, Tübingen, Germany
