A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion

Gao, Qing; Ogenyi, Uchenna Emeoha; Liu, Jinguo; Ju, Zhaojie; Liu, Honghai

doi:10.1007/978-3-030-29933-0_9

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1043)

Included in the following conference series:

UK Workshop on Computational Intelligence

Abstract

Vision-based hand gesture recognition plays an important role in human-robot interaction (HRI) because this non-contact method enables natural and friendly interaction between people and robots. This paper proposes a two-stream CNN framework (2S-CNN) that recognizes American Sign Language (ASL) hand gestures by fusing multimodal (RGB and depth) data. First, the hand gesture data is enhanced to remove the influence of background and noise. Second, RGB and depth features are extracted and classified by a separate CNN on each of the two streams. Finally, a fusion layer combines the recognition results of the two streams. Using both modalities increases the recognition accuracy for ASL hand gestures. In experiments, 2S-CNN reaches a recognition accuracy of 92.08% on the ASL fingerspelling database, higher than that of the baseline methods.
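
The abstract does not say how the fusion layer combines the two streams' results, so the following is only a minimal sketch of one common option, score-level (late) fusion by weighted averaging of class probabilities. The function names, the `rgb_weight` parameter, and the toy scores are all assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Convert raw class scores into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_streams(
    rgb_logits: Sequence[float],
    depth_logits: Sequence[float],
    rgb_weight: float = 0.5,  # hypothetical fusion weight, not from the paper
) -> List[float]:
    """Late fusion: weighted average of the per-stream class probabilities."""
    if len(rgb_logits) != len(depth_logits):
        raise ValueError("both streams must score the same set of classes")
    if not 0.0 <= rgb_weight <= 1.0:
        raise ValueError("rgb_weight must lie in [0, 1]")
    p_rgb = softmax(rgb_logits)
    p_depth = softmax(depth_logits)
    return [
        rgb_weight * r + (1.0 - rgb_weight) * d
        for r, d in zip(p_rgb, p_depth)
    ]


def predict(probabilities: Sequence[float]) -> int:
    """Return the index of the most probable class."""
    return max(range(len(probabilities)), key=probabilities.__getitem__)


# Toy scores for three gesture classes (invented values).
# The RGB stream narrowly prefers class 0; the depth stream clearly prefers class 1.
rgb_logits = [2.0, 1.9, 0.1]
depth_logits = [0.2, 2.5, 0.3]

fused = fuse_streams(rgb_logits, depth_logits)
print("RGB-only prediction:", predict(softmax(rgb_logits)))
print("Fused prediction:", predict(fused))
```

In this toy case the depth stream's confident score outweighs the RGB stream's narrow preference, which is the kind of disagreement a fusion step is meant to resolve. A learned fusion layer, as the abstract suggests, would replace the fixed weight with trained parameters.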

Acknowledgment

This research was supported in part by the Key Research Program of the Chinese Academy of Sciences under Grant Y4A3210301, in part by the Research Fund of China Manned Space Engineering under Grant 050102, in part by the Natural Science Foundation of China under Grants 51775541, 51575412, 51575338 and 51575407, in part by the EU Seventh Framework Programme (FP7)-ICT under Grant 611391, in part by the Research Project of the State Key Lab of Digital Manufacturing Equipment & Technology of China under Grant DMETKF2017003, and in part by National Key R&D Program Project 2018YFB1304600.

Author information

Authors and Affiliations

State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China
Qing Gao, Jinguo Liu & Zhaojie Ju
Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, 110169, China
Qing Gao, Jinguo Liu & Zhaojie Ju
University of Chinese Academy of Sciences, Beijing, 100049, China
Qing Gao
School of Computing, University of Portsmouth, Portsmouth, PO1 3HE, UK
Uchenna Emeoha Ogenyi, Zhaojie Ju & Honghai Liu

Corresponding author

Correspondence to Jinguo Liu.

Editor information

Editors and Affiliations

School of Computing, University of Portsmouth, Portsmouth, UK
Zhaojie Ju
Department of Computer and Information Sciences, Northumbria University, Newcastle upon Tyne, UK
Longzhi Yang
Bristol Robotics Laboratory, University of the West of England, Bristol, UK
Chenguang Yang
School of Computing, University of Portsmouth, Portsmouth, Hampshire, UK
Alexander Gegov
School of Computing, University of Portsmouth, Portsmouth, UK
Dalin Zhou

Cite this paper

Gao, Q., Ogenyi, U.E., Liu, J., Ju, Z., Liu, H. (2020). A Two-Stream CNN Framework for American Sign Language Recognition Based on Multimodal Data Fusion. In: Ju, Z., Yang, L., Yang, C., Gegov, A., Zhou, D. (eds) Advances in Computational Intelligence Systems. UKCI 2019. Advances in Intelligent Systems and Computing, vol 1043. Springer, Cham. https://doi.org/10.1007/978-3-030-29933-0_9

DOI: https://doi.org/10.1007/978-3-030-29933-0_9
Published: 30 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-29932-3
Online ISBN: 978-3-030-29933-0
eBook Packages: Intelligent Technologies and Robotics (R0)
