Abstract
Sign language is the primary means of communication in the deaf and hearing-impaired community and combines hand movements with facial expressions. Advances in computer vision research in recent years have paved the way for the first automatic sign language recognition systems. However, unresolved challenges, such as cultural differences among the world's sign languages, the lack of representative databases for model training, the relatively small size of the region of interest, and occlusion, keep the reliability of automatic sign language recognition far from human-level performance, especially for Russian Sign Language (RSL). To address this issue, we present a framework and an automatic system for recognizing one-handed RSL gestures. The developed system supports both online and offline modes and recognizes 44 classes of one-handed RSL gestures with almost 70% accuracy. The system is based on the color-depth Kinect v2 sensor and is trained on the TheRuSLan database using a combination of state-of-the-art deep learning approaches. Future research will focus on extracting additional features, expanding the data set, and increasing the number of recognizable gestures by adding two-handed gestures. The developed vision-based RSL recognition system is intended as an auxiliary system for deaf and hearing-impaired people.
Acknowledgements
This research was financially supported by the Ministry of Science and Higher Education of Russia, agreement No. 075-15-2019-1295 (reference RFMEFI61618X0095).
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Ryumin, D., Ivanko, D., Kagirov, I., Axyonov, A., Karpov, A. (2020). Vision-Based Assistive Systems for Deaf and Hearing Impaired People. In: Favorskaya, M., Jain, L. (eds) Computer Vision in Advanced Control Systems-5. Intelligent Systems Reference Library, vol 175. Springer, Cham. https://doi.org/10.1007/978-3-030-33795-7_7
DOI: https://doi.org/10.1007/978-3-030-33795-7_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33794-0
Online ISBN: 978-3-030-33795-7
eBook Packages: Intelligent Technologies and Robotics (R0)