
Human-AGV Interaction: Real-Time Gesture Detection Using Deep Learning

  • Conference paper
Intelligent Robotics and Applications (ICIRA 2019)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11744)

Abstract

In this paper, we present a real-time human body gesture recognition system for controlling an Automated Guided Vehicle (AGV) in a facility. Exploiting the breakthrough of deep convolutional networks in computer vision, we have developed a system that detects human gestures and issues the corresponding commands to the AGV. To avoid interference when multiple people appear in an image, we propose a method to filter out non-operators. In addition, we propose a human gesture interpreter with clear semantic information and build a new human gesture dataset with 8 gestures to train or fine-tune deep neural networks for human gesture detection. To balance accuracy and response speed, we choose MobileNet-SSD as the detection network.
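
The following Python sketch is a minimal illustration of the pipeline the abstract describes, not the authors' implementation: a MobileNet-SSD-style detector run on camera frames, a crude largest-box heuristic standing in for the paper's non-operator filtering, and a mapping from detected gesture classes to AGV commands. The model file names, the 8-gesture label list, and the send_command interface are all assumptions; only OpenCV's cv2.dnn API is real.

import cv2

# Hypothetical 8-gesture label list; the paper's actual label set is not given here.
GESTURES = ["stop", "forward", "backward", "turn_left",
            "turn_right", "speed_up", "slow_down", "follow"]

# Assumed file names for a Caffe-format MobileNet-SSD fine-tuned on the gesture dataset.
net = cv2.dnn.readNetFromCaffe("gesture_ssd.prototxt", "gesture_ssd.caffemodel")

def send_command(cmd):
    # Placeholder for the real AGV interface (serial, Wi-Fi, etc.).
    print("AGV command:", cmd)

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    h, w = frame.shape[:2]
    # Standard MobileNet-SSD preprocessing: 300x300 input, pixels scaled to roughly [-1, 1].
    blob = cv2.dnn.blobFromImage(cv2.resize(frame, (300, 300)),
                                 scalefactor=1 / 127.5, size=(300, 300), mean=127.5)
    net.setInput(blob)
    detections = net.forward()  # shape (1, 1, N, 7): [_, class_id, conf, x1, y1, x2, y2]

    best = None  # (box_area, class_id)
    for i in range(detections.shape[2]):
        conf = float(detections[0, 0, i, 2])
        if conf < 0.5:
            continue
        class_id = int(detections[0, 0, i, 1])
        x1, y1, x2, y2 = detections[0, 0, i, 3:7] * (w, h, w, h)
        area = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        # Crude stand-in for non-operator filtering: keep only the largest
        # (presumably nearest) detection in the frame.
        if best is None or area > best[0]:
            best = (area, class_id)

    if best is not None and 1 <= best[1] <= len(GESTURES):
        send_command(GESTURES[best[1] - 1])  # class 0 is background in SSD

    if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
        break
cap.release()

The largest-box heuristic is only a placeholder: the paper proposes its own non-operator filtering method, which this single area comparison does not reproduce.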

Acknowledgements

This research was supported by the 111 Project (B12018) and the Jiangsu Planned Projects for Postdoctoral Research Funds (1601085C). We thank our colleagues from Portsmouth University, England, and Jiangnan University, China, who provided insight and expertise that greatly assisted the research.

Author information

Corresponding author

Correspondence to Li Peng.


Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Zhang, J., Peng, L., Feng, W., Ju, Z., Liu, H. (2019). Human-AGV Interaction: Real-Time Gesture Detection Using Deep Learning. In: Yu, H., Liu, J., Liu, L., Ju, Z., Liu, Y., Zhou, D. (eds) Intelligent Robotics and Applications. ICIRA 2019. Lecture Notes in Computer Science, vol. 11744. Springer, Cham. https://doi.org/10.1007/978-3-030-27541-9_20

  • DOI: https://doi.org/10.1007/978-3-030-27541-9_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-27540-2

  • Online ISBN: 978-3-030-27541-9

  • eBook Packages: Computer Science, Computer Science (R0)
