Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing

Roberts, John W.; Moret, Lionel; Zhang, Jun; Tedrake, Russ

doi:10.1007/978-3-642-05181-4_13

Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing

John W. Roberts⁴,
Lionel Moret⁵,
Jun Zhang⁵ &
…
Russ Tedrake⁴

Chapter

1539 Accesses
11 Citations

Part of the book series: Studies in Computational Intelligence ((SCI,volume 264))

Abstract

This work describes the development of a model-free reinforcement learning-based control methodology for the heaving plate, a laboratory experimental fluid system that serves as a model of flapping flight. Through an optimized policy gradient algorithm, we were able to demonstrate rapid convergence (requiring less than 10 minutes of experiments) to a stroke form which maximized the propulsive efficiency of this very complicated fluid-dynamical system. This success was due in part to an improved sampling distribution and carefully selected policy parameterization, both motivated by a formal analysis of the signal-to-noise ratio of policy gradient algorithms. The resulting optimal policy provides insight into the behavior of the fluid system, and the effectiveness of the learning strategy suggests a number of exciting opportunities for machine learning control of fluid dynamics.

This is a preview of subscription content, log in via an institution.

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 169.00; Price excludes VAT (USA)

Softcover Book: USD 219.99; Price excludes VAT (USA)

Hardcover Book: USD 219.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Alben, S., Shelley, M.: Coherent locomotion as an attracting state for a free flapping body. Proceedings of the National Academy of Science 102, 11163–11166 (2005)
Article Google Scholar
Amari, S.: Natural gradient works efficiently in learning. Neural Computation 10, 251–276 (1998)
Article Google Scholar
Baxter, J., Bartlett, P.: Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15, 319–350 (2001)
Article MATH MathSciNet Google Scholar
Bennis, A., Leeser, M., Tadmor, G., Tedrake, R.: Implementation of a highly parameterized digital PIV system on reconfigurable hardware. In: Proceedings of the Twelfth Annual Workshop on High Performance Embedded Computing (HPEC), Lexington, MA (2008)
Google Scholar
Collins, S.H., Ruina, A., Tedrake, R., Wisse, M.: Efficient bipedal robots based on passive-dynamic walkers. Science 307, 1082–1085 (2005)
Article Google Scholar
Greensmith, E., Bartlett, P.L., Baxter, J.: Variance reduction techniques for gradient estimates in reinforcement learning. Journal of Machine Learning Research 5, 1471–1530 (2004)
MathSciNet Google Scholar
Howard, M., Klanke, S., Gienger, M., Goerick, C., Vijayakumar, S.: Methods for learning control policies from variable-constraint demonstrations. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 253–291. Springer, Heidelberg (2010)
Google Scholar
Jabri, M., Flower, B.: Weight perturbation: An optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks. IEEE Trans. Neural Netw. 3, 154–157 (1992)
Article Google Scholar
Kaelbling, L.P., Littman, M.L., Cassandra, A.R.: Planning and acting in partially observable stochastic domains. Artificial Intelligence, 101 (1998)
Google Scholar
Kober, J., Mohler, B., Peters, J.: Imitation and reinforcement learning for motor primitives with perceptual coupling. In: Sigaud, O., Peters, J. (eds.) From Motor Learning to Interaction Learning in Robots. SCI, vol. 264, pp. 209–225. Springer, Heidelberg (2010)
Google Scholar
Meuleau, N., Peshkin, L., Kaelbling, L.P., Kim, K.-E.: Off-policy policy search. In: NIPS (2000)
Google Scholar
Peters, J., Vijayakumar, S., Schaal, S.: Policy gradient methods for robot control (Technical Report CS-03-787). University of Southern California (2003)
Google Scholar
Roberts, J.W., Tedrake, R.: Signal-to-noise ratio analysis of policy gradient algorithms. In: Advances of Neural Information Processing Systems (NIPS), vol. 21, p. 8 (2009)
Google Scholar
Shelley, M.: Personal Communication (2007)
Google Scholar
Tedrake, R., Zhang, T.W., Seung, H.S.: Stochastic policy gradient reinforcement learning on a simple 3D biped. In: Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS), Sendai, Japan, pp. 2849–2854 (2004)
Google Scholar
Vandenberghe, N., Childress, S., Zhang, J.: On unidirectional flight of a free flapping wing. Physics of Fluids, 18 (2006)
Google Scholar
Vandenberghe, N., Zhang, J., Childress, S.: Symmetry breaking leads to forward flapping flight. Journal of Fluid Mechanics 506, 147–155 (2004)
Article MATH Google Scholar
Williams, J.L., Fisher III, J.W., Willsky, A.S.: Importance sampling actor-critic algorithms. In: Proceedings of the 2006 American Control Conference (2006)
Google Scholar
Williams, R.: Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Massachusetts Institute of Technology, 32 Vassar St. Room 32-380, Cambridge, MA, 02139
John W. Roberts & Russ Tedrake
New York University, 4 Washington Place, New York, NY 10003
Lionel Moret & Jun Zhang

Authors

John W. Roberts
View author publications
You can also search for this author in PubMed Google Scholar
Lionel Moret
View author publications
You can also search for this author in PubMed Google Scholar
Jun Zhang
View author publications
You can also search for this author in PubMed Google Scholar
Russ Tedrake
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Institut des Systèmes Intelligents et de Robotique (CNRS UMR 7222), Université Pierre et Marie Curie Pyramide, Tour 55 Boîte courrier 173, 4 Place Jussieu, 75252, PARIS cedex 05, France
Olivier Sigaud
Dept. Schölkopf, Max-Planck Institute for Biological Cybernetics, Spemannstraße 38,Rm 223, 72076, Tübingen, Germany
Jan Peters

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Roberts, J.W., Moret, L., Zhang, J., Tedrake, R. (2010). Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing. In: Sigaud, O., Peters, J. (eds) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05181-4_13

Download citation

DOI: https://doi.org/10.1007/978-3-642-05181-4_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-05180-7
Online ISBN: 978-3-642-05181-4
eBook Packages: EngineeringEngineering (R0)

Publish with us

Policies and ethics