Impact of Gaze Analysis on the Design of a Caption Production Software

  • Claude Chapdelaine
  • Samuel Foucher
  • Langis Gagnon
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5616)


Producing captions for the deaf and hearing impaired is a labor-intensive task. We implemented a software tool, SmartCaption, that assists the caption production process with automatic visual detection techniques aimed at reducing the production workload. This paper presents the results of an eye-tracking analysis of facial regions of interest, carried out to understand the nature of the task: not only to measure the quantity of data but also to assess its importance to the end user, the viewer. We also report on two interaction design approaches that were implemented and tested to cope with the inevitable errors of automatic detection, such as false recognitions and false alarms. These approaches were compared using a Keystroke-Level Model (KLM), which showed that the adopted approach allowed a gain of 43% in efficiency.
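To make the Keystroke-Level Model comparison concrete, the following is a minimal sketch of how two interaction designs can be compared by summing operator times. The operator durations are the commonly cited KLM values; the two operator sequences, the function names, and the scenario (correcting a falsely recognized face by typing a name versus picking it from a list) are hypothetical illustrations, not the sequences or results measured in the paper.

```python
# Commonly cited KLM operator times, in seconds.
OPERATOR_SECONDS = {
    "K": 0.28,  # keystroke (average non-secretarial typist)
    "P": 1.10,  # point at a target with the mouse
    "B": 0.10,  # press or release a mouse button
    "H": 0.40,  # move hands between mouse and keyboard
    "M": 1.35,  # mental preparation
}


def klm_time(sequence: str) -> float:
    """Return the predicted execution time of an operator string such as 'MPBB'."""
    return sum(OPERATOR_SECONDS[op] for op in sequence)


def efficiency_gain(baseline: str, alternative: str) -> float:
    """Return the fraction of the baseline time saved by the alternative design."""
    t_base = klm_time(baseline)
    return (t_base - klm_time(alternative)) / t_base


# Hypothetical design A: click the wrong label, then type an 8-character name.
CORRECT_BY_TYPING = "MPBB" + "H" + "M" + "K" * 8
# Hypothetical design B: click the wrong label, then click the right name in a list.
CORRECT_BY_PICKING = "MPBB" + "MPBB"

if __name__ == "__main__":
    print(f"typing : {klm_time(CORRECT_BY_TYPING):.2f} s")
    print(f"picking: {klm_time(CORRECT_BY_PICKING):.2f} s")
    print(f"gain   : {efficiency_gain(CORRECT_BY_TYPING, CORRECT_BY_PICKING):.0%}")
```

The gain printed by this sketch depends entirely on the assumed sequences, so it should not be expected to match the 43% reported for SmartCaption.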


Keywords: caption production, eye-tracking analysis, facial recognition, Keystroke-Level Model (KLM)





Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Claude Chapdelaine (1)
  • Samuel Foucher (1)
  • Langis Gagnon (1)
  1. R&D Department, Computer Research Institute of Montreal (CRIM), Montreal, Canada
