MAM: Transfer Learning for Fully Automatic Video Annotation and Specialized Detector Creation

  • Wolfgang Fuhl
  • Nora Castner
  • Lin Zhuang
  • Markus Holzer
  • Wolfgang Rosenstiel
  • Enkelejda Kasneci
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11133)


Accurate point detection on image data is an important task for many applications, such as robot perception, scene understanding, gaze point regression in eye tracking, head pose estimation, and object outline estimation. It can also benefit object detection tasks in which a minimal bounding box is sought, since the method can be applied to each corner of the box. We propose a novel self-training method, Multiple Annotation Maturation (MAM), that enables fully automatic labeling of large amounts of image data. Moreover, MAM produces detectors that can be used online afterward. We evaluated our algorithm on data from different detection tasks for eye, pupil center (head-mounted and remote), and eyelid outline points, and compared its performance to the state of the art. The evaluation was done on over 300,000 images, and our method shows outstanding adaptability and robustness. In addition, we contribute a new dataset with more than 16,200 accurate, manually labeled images for remote eyelid, pupil center, and pupil outline detection. This dataset was recorded in a prototype car interior equipped with all standard tools, posing various challenges to object detection such as reflections, occlusion from steering wheel movement, and large head movements. The dataset and library are available for download at


Keywords

Automatic annotation, Detector creation, Eyelids, Eye detection, Training set clustering, Pupil detection



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Wolfgang Fuhl (1, corresponding author)
  • Nora Castner (1)
  • Lin Zhuang (2)
  • Markus Holzer (2)
  • Wolfgang Rosenstiel (1)
  • Enkelejda Kasneci (1)

  1. Eberhard Karls University, Tuebingen, Germany
  2. Robert Bosch GmbH, Car Multimedia, Renningen, Germany
