Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing

Chapter

Part of the book series: Studies in Computational Intelligence (SCI, volume 264)

Abstract

This work describes the development of a model-free reinforcement-learning control methodology for the heaving plate, a laboratory fluid experiment that serves as a model of flapping flight. Using an optimized policy gradient algorithm, we demonstrated rapid convergence (requiring fewer than ten minutes of experiments) to a stroke form that maximized the propulsive efficiency of this complicated fluid-dynamical system. This success was due in part to an improved sampling distribution and a carefully selected policy parameterization, both motivated by a formal analysis of the signal-to-noise ratio of policy gradient algorithms. The resulting optimal policy provides insight into the behavior of the fluid system, and the effectiveness of the learning strategy suggests a number of exciting opportunities for machine-learning control of fluid dynamics.
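
For readers unfamiliar with the family of algorithms the abstract refers to, the sketch below illustrates the basic weight-perturbation form of a model-free policy gradient update: perturb the policy parameters, run one trial, and step along the perturbation scaled by the reward's improvement over a baseline. This is a minimal illustration, not the chapter's actual algorithm; the `run_trial` reward, the step sizes, and the isotropic Gaussian sampling distribution are all placeholder assumptions, whereas the chapter tunes the sampling distribution and policy parameterization using an SNR analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one experimental trial. In the chapter, evaluating the
# policy means driving the physical heaving plate through stroke cycles and
# measuring propulsive efficiency; here a smooth synthetic reward with a
# known optimum at theta = [1, 1, 1, 1] plays that role.
def run_trial(theta):
    return -np.sum((theta - 1.0) ** 2) + 0.01 * rng.standard_normal()

def weight_perturbation_step(theta, baseline, sigma=0.1, alpha=0.2):
    """One weight-perturbation policy-gradient update.

    Sample a Gaussian perturbation of the policy parameters, run a single
    noisy trial, and step along the perturbation scaled by the reward's
    improvement over a running baseline (a standard variance-reduction
    device). The chapter's SNR analysis motivates more careful choices of
    the sampling distribution than the isotropic Gaussian used here.
    """
    z = sigma * rng.standard_normal(theta.shape)  # parameter perturbation
    r = run_trial(theta + z)                      # one noisy rollout
    theta = theta + alpha * (r - baseline) * z    # stochastic gradient ascent
    baseline = 0.9 * baseline + 0.1 * r           # exponential running baseline
    return theta, baseline

theta, baseline = np.zeros(4), 0.0
for _ in range(2000):
    theta, baseline = weight_perturbation_step(theta, baseline)
print(theta)  # settles near the optimum [1, 1, 1, 1]
```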


Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Roberts, J.W., Moret, L., Zhang, J., Tedrake, R. (2010). Motor Learning at Intermediate Reynolds Number: Experiments with Policy Gradient on the Flapping Flight of a Rigid Wing. In: Sigaud, O., Peters, J. (eds) From Motor Learning to Interaction Learning in Robots. Studies in Computational Intelligence, vol 264. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-05181-4_13

  • DOI: https://doi.org/10.1007/978-3-642-05181-4_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05180-7

  • Online ISBN: 978-3-642-05181-4

  • eBook Packages: Engineering, Engineering (R0)
