
Human-AGV Interaction: Real-Time Gesture Detection Using Deep Learning

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11744)

Abstract

In this paper, we present a real-time human body gesture recognition system for controlling an Automated Guided Vehicle (AGV) in a facility. Exploiting the breakthrough of deep convolutional networks in computer vision, we have developed a system that detects human gestures and issues the corresponding commands to the AGV. To avoid interference when multiple people appear in an image, we propose a method to filter out non-operators. In addition, we propose a human gesture interpreter with clear semantic information and build a new human gesture dataset with 8 gestures to train or fine-tune deep neural networks for human gesture detection. To balance accuracy and response speed, we choose MobileNet-SSD as the detection network.
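
The following Python sketch is a minimal illustration of the pipeline the abstract describes, not the authors' implementation: a MobileNet-SSD-style detector run on camera frames, a crude largest-box heuristic standing in for the paper's non-operator filtering, and a mapping from detected gesture classes to AGV commands. The model file names, the 8-gesture label list, and the send_command interface are all assumptions; only OpenCV's cv2.dnn API is real.

import cv2

# Hypothetical 8-gesture label list; the paper's actual label set is not given here.
GESTURES = ["stop", "forward", "backward", "turn_left",
            "turn_right", "speed_up", "slow_down", "follow"]

# Assumed file names for a Caffe-format MobileNet-SSD fine-tuned on the gesture dataset.
net = cv2.dnn.readNetFromCaffe("gesture_ssd.prototxt", "gesture_ssd.caffemodel")

def send_command(cmd):
    # Placeholder for the real AGV interface (serial, Wi-Fi, etc.).
    print("AGV command:", cmd)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # Standard MobileNet-SSD preprocessing: 300x300 input, pixels scaled to roughly [-1, 1].
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=1 / 127.5, size=(300, 300), mean=127.5)
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7): [_, class_id, conf, x1, y1, x2, y2]

    best = None  # (box_area, class_id)
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf < 0.5:
            continue
        class_id = int(detections[0, 0, i, 1])
        x1, y1, x2, y2 = detections[0, 0, i, 3:7] * (w, h, w, h)
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        # Crude stand-in for non-operator filtering: keep only the largest
        # (presumably nearest) detection in the frame.
        if best is None or area > best[0]:
            best = (area, class_id)

    if best is not None and 1 <= best[1] <= len(GESTURES):
        send_command(GESTURES[best[1] - 1])  # class 0 is background in SSD

    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()

The largest-box heuristic is only a placeholder: the paper proposes its own non-operator filtering method, which this single area comparison does not reproduce.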

Acknowledgements

This research was supported by the 111 Project (B12018) and the Jiangsu Planned Projects for Postdoctoral Research Funds (1601085C). We thank our colleagues from Portsmouth University, England, and Jiangnan University, China, who provided insight and expertise that greatly assisted the research.

Author information

Corresponding author

Correspondence to Li Peng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Peng, L., Feng, W., Ju, Z., Liu, H. (2019). Human-AGV Interaction: Real-Time Gesture Detection Using Deep Learning. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol. 11744. Springer, Cham. https://doi.org/10.1007/978-3-030-27541-9_20

  • DOI: https://doi.org/10.1007/978-3-030-27541-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27540-2

  • Online ISBN: 978-3-030-27541-9

  • eBook Packages: Computer Science, Computer Science (R0)
