A Recognition Method of Hand Gesture Based on Stacked Denoising Autoencoder

  • Miao Ma (email author)
  • Ziang Gao
  • Jie Wu
  • Yuli Chen
  • Qingqing Zhu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 891)


To avoid complex preprocessing, this paper proposes a hand gesture recognition method based on a stacked denoising autoencoder (SDAE). The network structure and training strategies, including the number of hidden units, the number of hidden layers, the noise level and the regularization, are analyzed for the American Sign Language (ASL) dataset. Specifically, the structure of the SDAE is determined step by step by increasing the number of hidden units and hidden layers, with performance measured by the recognition accuracy on the testing samples. The influences of the noise strength and the regularization method on the performance of the designed SDAE are then analyzed and compared. Finally, an effective SDAE network is suggested for the ASL dataset. Experimental results show that the designed SDAE outperforms a stacked autoencoder (AE), a deep belief network (DBN) and a convolutional neural network (CNN), among others, with an accuracy on the ASL dataset of up to 98.07% while the training time is reduced to 1 h.
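The four design choices named in the abstract (hidden units, hidden layers, noise level, regularization) can be illustrated with a minimal NumPy sketch of greedy layer-wise SDAE pretraining. It is not the authors' implementation: the masking noise, tied weights, squared-error loss, L2 weight decay, and all function names, layer sizes and hyperparameters below are assumptions chosen for illustration, and the supervised fine-tuning stage and ASL data loading are omitted.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def corrupt(x, noise_level, rng):
    """Masking noise (assumed): zero each component with probability noise_level."""
    return x * (rng.random(x.shape) >= noise_level)

def train_dae_layer(x, n_hidden, noise_level=0.3, weight_decay=1e-4,
                    lr=0.5, epochs=300, seed=0):
    """Train one denoising autoencoder layer with tied weights (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = x.shape
    w = rng.normal(0.0, 0.1, size=(n_visible, n_hidden))
    b_h, b_v = np.zeros(n_hidden), np.zeros(n_visible)
    losses = []
    for _ in range(epochs):
        x_tilde = corrupt(x, noise_level, rng)       # encode the corrupted input
        h = sigmoid(x_tilde @ w + b_h)
        x_hat = sigmoid(h @ w.T + b_v)               # decode with the transposed weights
        err = x_hat - x                              # compare against the CLEAN input
        losses.append(0.5 * np.mean(np.sum(err ** 2, axis=1))
                      + 0.5 * weight_decay * np.sum(w ** 2))
        # Backpropagate through decoder and encoder (both share w).
        d_out = err * x_hat * (1.0 - x_hat) / n_samples
        d_hid = (d_out @ w) * h * (1.0 - h)
        grad_w = x_tilde.T @ d_hid + d_out.T @ h + weight_decay * w
        w -= lr * grad_w
        b_h -= lr * d_hid.sum(axis=0)
        b_v -= lr * d_out.sum(axis=0)
    return (w, b_h), losses

def pretrain_sdae(x, hidden_sizes, **kwargs):
    """Greedy layer-wise pretraining: each layer learns from the previous layer's codes."""
    layers, rep = [], x
    for n_hidden in hidden_sizes:
        (w, b_h), _ = train_dae_layer(rep, n_hidden, **kwargs)
        layers.append((w, b_h))
        rep = sigmoid(rep @ w + b_h)                 # uncorrupted encoding feeds the next layer
    return layers, rep

# Toy data standing in for flattened gesture images: noisy copies of 4 binary prototypes.
data_rng = np.random.default_rng(42)
prototypes = (data_rng.random((4, 16)) > 0.5).astype(float)
toy_x = prototypes[data_rng.integers(0, 4, size=64)]

layers, features = pretrain_sdae(toy_x, hidden_sizes=[8, 4], noise_level=0.2)
```

In this sketch, `hidden_sizes` controls both the number of hidden layers and the units per layer, `noise_level` sets the corruption strength, and `weight_decay` is one possible regularizer; varying them one at a time and comparing test accuracy mirrors the kind of structure search the abstract describes.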


Keywords: Gesture recognition, Stacked denoising autoencoder, Network structure, Regularization



This work is supported by the National Natural Science Foundation of China under grants 61501286, 61501287, 61601274 and 61877038, the Natural Science Basic Research Plan in Shaanxi Province of China (2018JM6068), the Fundamental Research Funds for the Central Universities of Shaanxi Normal University (GK201703054 and GK201703058) and the Key Science and Technology Innovation Team in Shaanxi Province of China (2014KTC-18).



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Miao Ma (1, 2), email author
  • Ziang Gao (2)
  • Jie Wu (2)
  • Yuli Chen (2)
  • Qingqing Zhu (2)

  1. Key Laboratory of Modern Teaching Technology, Ministry of Education, Xi’an, China
  2. School of Computer Science, Shaanxi Normal University, Xi’an, China
