Abstract
Training an object class detector typically requires a large set of images annotated with bounding-boxes, which is expensive and time-consuming to create. We propose a novel approach to annotating object locations that can substantially reduce annotation time. We first track the eye movements of annotators instructed to find the object, and then propose a technique for deriving object bounding-boxes from these fixations. To validate our idea, we collected eye tracking data for the trainval part of 10 object classes of Pascal VOC 2012 (6,270 images, 5 observers). Our technique correctly produces bounding-boxes in 50% of the images, while reducing the total annotation time by a factor of 6.8× compared to drawing bounding-boxes. Any standard object class detector can be trained on the bounding-boxes predicted by our model. Our large-scale eye tracking dataset is available at groups.inf.ed.ac.uk/calvin/eyetrackdataset/.
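The task setup in the abstract can be sketched in a few lines: derive a candidate box from fixation points and score it with the standard PASCAL VOC overlap criterion, where a prediction counts as correct when its intersection-over-union (IoU) with the ground-truth box is at least 0.5. Note that the paper's actual technique is a learned model; the naive baseline below (the tight box around all fixations) and all names in it are illustrative assumptions, not the authors' method.

```python
def box_from_fixations(fixations):
    """Tight axis-aligned box (xmin, ymin, xmax, ymax) around fixation points."""
    xs = [x for x, _ in fixations]
    ys = [y for _, y in fixations]
    return (min(xs), min(ys), max(xs), max(ys))

def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

# Hypothetical fixation coordinates (pixels) from one annotator on one image.
fixations = [(120, 80), (150, 95), (135, 110), (160, 70)]
pred = box_from_fixations(fixations)      # (120, 70, 160, 110)
ground_truth = (115, 65, 165, 115)        # hypothetical manual annotation
correct = iou(pred, ground_truth) >= 0.5  # PASCAL VOC correctness criterion
```

This baseline fails when observers fixate only discriminative object parts (e.g. an animal's head), which is precisely why the paper learns a mapping from fixations to boxes instead of taking their extent directly.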
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Papadopoulos, D.P., Clarke, A.D.F., Keller, F., Ferrari, V. (2014). Training Object Class Detectors from Eye Tracking Data. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds) Computer Vision – ECCV 2014. ECCV 2014. Lecture Notes in Computer Science, vol 8693. Springer, Cham. https://doi.org/10.1007/978-3-319-10602-1_24
Print ISBN: 978-3-319-10601-4
Online ISBN: 978-3-319-10602-1