DrawInAir: A Lightweight Gestural Interface Based on Fingertip Regression

  • Gaurav Garg
  • Srinidhi Hegde
  • Ramakrishna Perla
  • Varun Jain
  • Lovekesh Vig
  • Ramya Hebbalaguppe
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11134)


Hand gestures are a natural way to interact with Head-Mounted Devices (HMDs) and smartphones. HMDs such as the Microsoft HoloLens, and smartphones that support the ARCore/ARKit platforms, are expensive and rely on powerful processors and sensors, such as multiple cameras and depth and IR sensors, to process hand gestures. To enable mass-market reach via inexpensive Augmented Reality (AR) headsets without built-in depth or IR sensors, we propose DrawInAir, a real-time, in-air gestural framework that works on monocular RGB input. DrawInAir uses the fingertip for writing in the air, analogous to a pen on paper. The major challenge in training egocentric gesture recognition models is obtaining sufficient labeled data for end-to-end learning. We therefore design a cascade of networks: a CNN with a differentiable spatial to numerical transform (DSNT) layer for fingertip regression, followed by a Bidirectional Long Short-Term Memory (Bi-LSTM) network for real-time classification of pointing hand gestures. We highlight how a model trained separately to regress the fingertip, combined with a classifier trained on limited classification data, performs better than end-to-end models. We also propose a dataset of 10 egocentric pointing gestures designed for AR applications for testing our model. We show that the framework takes 1.73 s to run end-to-end and has a low memory footprint of 14 MB while achieving an accuracy of 88.0% on an egocentric video dataset.
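The DSNT layer named in the abstract turns a CNN heatmap into a single coordinate pair while remaining differentiable. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the function name `dsnt`, the softmax normalization, and the [-1, 1] pixel-centre coordinate grid are assumptions chosen for illustration, and the abstract does not state which normalization the paper uses.

```python
import numpy as np


def dsnt(heatmap):
    """Hypothetical helper: map an (H, W) score map to (x, y) in [-1, 1].

    Assumption: the heatmap is normalized with a softmax so that it can be
    read as a probability distribution over pixel locations.
    """
    h, w = heatmap.shape

    # Numerically stable softmax over all pixels; the result sums to 1.
    z = np.exp(heatmap - heatmap.max())
    p = z / z.sum()

    # Pixel-centre coordinates scaled to (-1, 1): (2j - (n + 1)) / n, j = 1..n.
    xs = (2.0 * np.arange(1, w + 1) - (w + 1)) / w
    ys = (2.0 * np.arange(1, h + 1) - (h + 1)) / h

    # Expected coordinate under p: marginalize over rows (for x) and columns (for y).
    x = float((p.sum(axis=0) * xs).sum())
    y = float((p.sum(axis=1) * ys).sum())
    return x, y


# Usage: a sharply peaked heatmap yields the coordinates of its peak.
scores = np.zeros((5, 5))
scores[1, 3] = 50.0  # peak at row 1, column 3 (zero-based)
print(dsnt(scores))
```

Because the output is an expectation rather than an argmax, gradients flow back through every heatmap pixel, which is what lets the fingertip regressor be trained directly on coordinate labels. In a cascade like the one described, the per-frame coordinates would then form the sequence passed to the Bi-LSTM classifier.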


Egocentric gestures, Coordinate regression, Augmented reality



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Gaurav Garg (1, corresponding author)
  • Srinidhi Hegde (1)
  • Ramakrishna Perla (1)
  • Varun Jain (1)
  • Lovekesh Vig (1)
  • Ramya Hebbalaguppe (1)

  1. TCS Research, Gurgaon, India
