Rover Descent: Learning to Optimize by Learning to Navigate on Prototypical Loss Surfaces

  • Louis Faury
  • Flavian Vasile
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11353)


Learning to optimize - the idea that we can learn, from data, algorithms that optimize a numerical criterion - has recently been at the heart of a growing number of research efforts. One of the most challenging issues in this approach is learning a policy that can optimize over classes of functions different from those it was trained on. We propose a novel way of framing learning to optimize as the problem of learning a good navigation policy on a partially observable loss surface. To this end, we develop Rover Descent, a solution that lets us learn a broad optimization policy by training only on a small set of prototypical two-dimensional surfaces covering classically hard cases such as valleys, plateaus, cliffs and saddles, and by using strictly zeroth-order information. We show that, without access to gradient or curvature information, we achieve fast convergence on optimization problems not seen at training time, such as the Rosenbrock function and other hard two-dimensional functions. We extend our framework to optimize over high-dimensional functions and show promising preliminary results.
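To make the zeroth-order setting concrete, the sketch below shows a simple finite-difference descent baseline on the 2-D Rosenbrock function mentioned in the abstract. This is not the paper's learned policy (Rover Descent learns its update rule from prototypical surfaces); it is only a minimal hand-crafted optimizer that, like the paper's setting, uses nothing but function evaluations. All names (`rosenbrock`, `zeroth_order_descent`) and hyperparameters are illustrative choices, not values from the paper.

```python
import numpy as np

def rosenbrock(p):
    """2-D Rosenbrock function; global minimum f(1, 1) = 0."""
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def zeroth_order_descent(f, x0, lr=1e-3, eps=1e-6, steps=20000):
    """Descend f using only function evaluations: the gradient is
    approximated by central finite differences, so no analytic
    gradient or curvature information is ever queried."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        g = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x)
            e[i] = eps
            # two function evaluations per coordinate
            g[i] = (f(x + e) - f(x - e)) / (2 * eps)
        x -= lr * g
    return x

x_star = zeroth_order_descent(rosenbrock, [-1.0, 1.0])
```

Even with a stable step size, this baseline creeps slowly along the Rosenbrock valley; the paper's point is that a learned navigation policy can do much better from the same zeroth-order information.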



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Criteo Research, Paris, France
  2. Ecole Polytechnique Federale de Lausanne, Lausanne, Switzerland
