Abstract
Vision-based hand gesture recognition plays an important role in human-robot interaction (HRI) because this non-contact method enables natural and friendly interaction between people and robots. This paper proposes a two-stream CNN framework (2S-CNN) that recognizes American Sign Language (ASL) hand gestures by fusing multimodal (RGB and depth) data. First, the hand gesture data is enhanced to remove the influence of background and noise. Second, RGB and depth features are extracted by CNNs on two separate streams, each of which performs hand gesture recognition. Finally, a fusion layer combines the recognition results of the two streams. Using both modalities increases the recognition accuracy for ASL hand gestures. Experiments show that 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, higher than that of the baseline methods.
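The abstract does not specify how the fusion layer combines the two streams, so the following is only a minimal sketch of one common choice, late fusion by a weighted average of per-class probabilities. The function names (`softmax`, `fuse_scores`, `predict`), the `rgb_weight` parameter, and the example scores are all hypothetical and are not taken from the paper.

```python
import math


def softmax(logits):
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(rgb_logits, depth_logits, rgb_weight=0.5):
    """Late fusion: weighted average of the two streams' class probabilities.

    This is an assumed fusion rule for illustration; the paper's fusion
    layer may be learned or structured differently.
    """
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [rgb_weight * a + (1.0 - rgb_weight) * b
            for a, b in zip(p_rgb, p_depth)]


def predict(rgb_logits, depth_logits, labels, rgb_weight=0.5):
    """Return the label with the highest fused probability and that probability."""
    fused = fuse_scores(rgb_logits, depth_logits, rgb_weight)
    best = max(range(len(fused)), key=fused.__getitem__)
    return labels[best], fused[best]


if __name__ == "__main__":
    labels = ["a", "b", "c"]        # three fingerspelling classes, for illustration
    rgb_scores = [2.0, 1.9, 0.1]    # RGB stream narrowly prefers "a"
    depth_scores = [0.5, 3.0, 0.2]  # depth stream clearly prefers "b"
    print(predict(rgb_scores, depth_scores, labels))
```

In this toy input the RGB stream alone would pick "a", but the depth stream is far more confident in "b", so the fused decision follows the depth stream. This illustrates why combining modalities can correct errors that a single stream makes.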
Acknowledgment
This research was supported in part by the Key Research Program of the Chinese Academy of Sciences under Grant Y4A3210301, in part by the Research Fund of China Manned Space Engineering under Grant 050102, in part by the Natural Science Foundation of China under Grants 51775541, 51575412, 51575338 and 51575407, in part by the EU Seventh Framework Programme (FP7)-ICT under Grant 611391, in part by the Research Project of the State Key Lab of Digital Manufacturing Equipment & Technology of China under Grant DMETKF2017003, and in part by National Key R&D Program Projects under Grant 2018YFB1304600.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Gao, Q., Ogenyi, U.E., Liu, J., Ju, Z., Liu, H. (2020). A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion. In: Ju, Z., Yang, L., Yang, C., Gegov, A., Zhou, D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043. Springer, Cham. https://doi.org/10.1007/978-3-030-29933-0_9
DOI: https://doi.org/10.1007/978-3-030-29933-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29932-3
Online ISBN: 978-3-030-29933-0
eBook Packages: Intelligent Technologies and Robotics (R0)