Exploration of Ideal Interaction Scheme on Smart TV: Based on User Experience Research of Far-Field Speech and Mid-air Gesture Interaction

  • Xuan Li
  • Daisong Guan
  • Jingya Zhang
  • Xingtong Liu
  • Siqi Li
  • Hui Tong
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11584)


The TV is an important entertainment appliance in the home, and its interaction is typical of screen-equipped devices. Far-field speech and mid-air gesture are new trends in smart TV interaction. Previous studies have explored the characteristics of far-field speech and mid-air gesture interaction in TV usage scenarios, but have rarely examined the user experience of these two modes during interaction or compared them directly. What is the ideal interaction mode for the TV of the future, when both can be realized? We know very little about this. Therefore, in Study 1 we quantitatively compared the user experience of far-field speech and mid-air gesture through an experiment. The results showed no significant overall difference between the two modes, but indicated their respective advantages and disadvantages for different operations. In Study 2, we explored user preferences for the two interaction channels across different operations. The results revealed clear regularities in participants' preferences for the two channels in different situations. Based on these results, we propose design principles for multichannel interaction fusion across different operations.


Keywords: Ideal interaction for smart TV · Mid-air gesture · Far-field speech · User experience study · Multichannel



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

Xuan Li, Daisong Guan, Jingya Zhang, Xingtong Liu, Siqi Li, Hui Tong

Baidu AI Interaction Design Lab, Beijing, China
