Towards Automatic Recognition of Sign Language Gestures Using Kinect 2.0
We present a prototype of a new computer system for the recognition of manual gestures using the Kinect 2.0 sensor for Windows. The sensor provides a stream of optical images at Full HD resolution and 30 frames per second (fps), along with a depth map of the scene. At present, our system recognizes continuous fingerspelling gestures and sequences of digits in Russian and Kazakh sign languages (SL); the gesture vocabulary contains 52 fingerspelling gestures. We have collected a visual database of SL gestures consisting of Kinect-based recordings of two signers (a man and a woman) demonstrating manual gestures. Five samples of each gesture were used for training the models, and the remaining data were used for tuning and testing the developed recognition system. Each gesture model is represented as a vector of informative visual features computed for the palm and all fingers. Feature vectors are extracted from both the training and test samples of gestures; reference patterns (models) are compared with sequences of test vectors using the Euclidean distance as the local cost, and the sequences are aligned with the dynamic time warping method (dynamic programming). The reference pattern with the minimal distance is selected as the recognition result. In our signer-dependent experiments with the two demonstrators from the visual database, the average gesture recognition accuracy is 87% for the 52 manual signs.
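As a rough illustration of the matching step described above, the sketch below implements nearest-template matching with dynamic time warping (DTW) over feature-vector sequences, using the Euclidean distance as the local frame-to-frame cost. This is a minimal sketch under stated assumptions: the feature dimensionality, the random example templates, and the function names are hypothetical and do not reproduce the paper's actual palm-and-finger feature extraction.

```python
import numpy as np

def dtw_distance(ref, test):
    """DTW distance between two feature-vector sequences, each an array
    of shape [frames, features], with Euclidean local cost per frame pair."""
    n, m = len(ref), len(test)
    # Accumulated-cost matrix; infinity everywhere except the origin.
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames' feature vectors.
            cost = np.linalg.norm(ref[i - 1] - test[j - 1])
            # Standard DTW recursion: best of insertion, deletion, match.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def recognize(test_seq, templates):
    """Return the label of the reference template (model) with the
    minimal DTW distance to the test sequence."""
    return min(templates, key=lambda label: dtw_distance(templates[label], test_seq))

# Hypothetical usage: two reference gestures and a slightly perturbed test
# sequence, each a sequence of 10-dimensional feature vectors.
rng = np.random.default_rng(0)
templates = {"A": rng.random((30, 10)), "B": rng.random((25, 10))}
test = templates["A"] + 0.01 * rng.random((30, 10))
print(recognize(test, templates))  # expected: "A"
```

DTW is a natural fit here because it tolerates variation in gesture duration: the accumulated-cost recursion stretches or compresses the time axis so that slower and faster demonstrations of the same sign can still align frame by frame.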
Keywords: Sign language · Assistive technology · Automatic gesture recognition · Image processing · Kinect sensor
This research is partially supported by the Russian Foundation for Basic Research (project No. 16-37-60100), by the Council for Grants of the President of the Russian Federation (project No. MD-254.2017.8), by the state research project No. 0073-2014-0005, and by the Government of the Russian Federation (grant No. 074-U01).