Abstract
Hand gestures are an efficient means of human-computer interaction. However, hand gesture recognition faces many challenges, such as low hand resolution, phase variation, and viewpoint changes. As a result, deployment of hand gestures in practical human-machine interaction applications is still very limited. This work aims to improve the performance of hand gesture recognition by using multi-modal streams. We propose a method that combines depth, RGB, and optical flow in a unified recognition framework. Each stream is first passed to a feature extraction component, which is a deep learning model. We then investigate different fusion techniques to combine features from the multi-modal streams for final classification. The proposed method is validated on a dataset of twelve gestures that we collected from five different viewpoints. Experimental results show that the proposed method using multi-modal streams achieves higher accuracy than methods that use a single stream, particularly for difficult viewpoints.
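The abstract does not say which fusion techniques are compared, so the following is only a minimal sketch of two common options, assuming feature-level concatenation (early fusion) and averaging of per-stream class probabilities (late fusion). The feature dimension, the random linear classifiers, and the function names are hypothetical stand-ins for the deep feature extractors; only the three streams and the twelve gesture classes come from the text.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def early_fusion(features):
    """Concatenate per-stream feature vectors into one joint vector."""
    return np.concatenate(features, axis=-1)

def late_fusion(scores, weights=None):
    """Weighted average of per-stream class-probability vectors."""
    stacked = np.stack(scores, axis=0)            # (num_streams, num_classes)
    if weights is None:
        weights = np.ones(len(scores))            # equal weights by default
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()             # normalise so the result sums to 1
    return np.tensordot(weights, stacked, axes=1)

rng = np.random.default_rng(0)
num_classes = 12                  # twelve gestures, as in the dataset
feat_dim = 8                      # hypothetical feature size per stream
streams = ["rgb", "depth", "flow"]

# Stand-ins for the outputs of the per-stream feature extractors.
features = [rng.normal(size=feat_dim) for _ in streams]

# Early fusion: one classifier on the concatenated features.
fused = early_fusion(features)
W = rng.normal(size=(fused.shape[0], num_classes))
early_probs = softmax(fused @ W)

# Late fusion: one classifier per stream, then average the probabilities.
per_stream_W = [rng.normal(size=(feat_dim, num_classes)) for _ in streams]
per_stream_probs = [softmax(f @ w) for f, w in zip(features, per_stream_W)]
late_probs = late_fusion(per_stream_probs)

print("early fusion prediction:", int(early_probs.argmax()))
print("late fusion prediction:", int(late_probs.argmax()))
```

Early fusion lets a single classifier learn interactions between modalities, while late fusion keeps the streams independent, so a weak stream can be down-weighted through the `weights` argument.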
This material is based upon work supported by the Air Force Office of Scientific Research under award number FA2386-17-1-4056.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Tran, TH., Tran, HN., Doan, HG. (2019). Dynamic Hand Gesture Recognition from Multi-modal Streams Using Deep Neural Network. In: Chamchong, R., Wong, K. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2019. Lecture Notes in Computer Science, vol 11909. Springer, Cham. https://doi.org/10.1007/978-3-030-33709-4_14
DOI: https://doi.org/10.1007/978-3-030-33709-4_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33708-7
Online ISBN: 978-3-030-33709-4
eBook Packages: Computer Science, Computer Science (R0)