Sign Language Recognition Using Convolutional Neural Networks

  • Lionel Pigou
  • Sander Dieleman
  • Pieter-Jan Kindermans
  • Benjamin Schrauwen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8925)


There is an undeniable communication problem between the Deaf community and the hearing majority. Innovations in automatic sign language recognition try to tear down this communication barrier. Our contribution considers a recognition system using the Microsoft Kinect, convolutional neural networks (CNNs) and GPU acceleration. Instead of relying on complex handcrafted features, CNNs automate the process of feature construction. We are able to recognize 20 Italian gestures with high accuracy. The predictive model generalizes to users and surroundings not seen during training, with a cross-validation accuracy of 91.7%. Our model achieves a mean Jaccard Index of 0.789 in the ChaLearn 2014 Looking at People gesture spotting competition.
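To make the reported Jaccard Index concrete, here is a minimal sketch of how such a score can be computed for a single gesture. It assumes the usual frame-overlap definition for gesture spotting: the number of frames shared by the ground-truth and predicted intervals, divided by the number of frames covered by either. The function name and the frame ranges are hypothetical and are not taken from the paper or from the competition's evaluation code.

```python
def jaccard_index(truth_frames, predicted_frames):
    """Frame-level Jaccard index: |A intersect B| / |A union B|.

    Both arguments are iterables of frame numbers (an assumption made
    for this sketch, not the competition's official data format).
    """
    truth = set(truth_frames)
    predicted = set(predicted_frames)
    union = truth | predicted
    if not union:
        # No frames in either set: define the score as 0 to avoid dividing by zero.
        return 0.0
    return len(truth & predicted) / len(union)


# Hypothetical example: the gesture truly spans frames 10..29,
# while the model predicts frames 15..34.
truth = range(10, 30)
predicted = range(15, 35)
print(jaccard_index(truth, predicted))  # 15 shared frames / 25 covered frames = 0.6
```

A mean Jaccard Index, such as the 0.789 reported above, would then be an average of per-gesture scores of this kind; the exact averaging scheme is defined by the competition and is not reproduced here.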


Keywords: Convolutional neural network, Deep learning, Gesture recognition, Sign language recognition



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Lionel Pigou (1)
  • Sander Dieleman (1)
  • Pieter-Jan Kindermans (1)
  • Benjamin Schrauwen (1)

  1. ELIS, Ghent University, Ghent, Belgium
