Multimedia Tools and Applications, Volume 55, Issue 1, pp 27–52

Reliability and effectiveness of clickthrough data for automatic image annotation

  • Theodora Tsikrika
  • Christos Diou
  • Arjen P. de Vries
  • Anastasios Delopoulos
Article

Abstract

Automatic image annotation using supervised learning is performed by concept classifiers trained on labelled example images. This work proposes the use of clickthrough data collected from search logs as a source for the automatic generation of concept training data, thus avoiding the expensive manual annotation effort. We investigate and evaluate this approach using a collection of 97,628 photographic images. The results indicate that the contribution of search-log-based training data is positive despite its inherent noise; in particular, the combination of manual and automatically generated training data outperforms the use of manual data alone. It is therefore possible to use clickthrough data to perform large-scale image annotation with little manual annotation effort or, depending on the performance requirements, with only the automatically generated training data. A sketch of the general idea follows the abstract. An extensive presentation of the experimental results and the accompanying data can be accessed at http://olympus.ee.auth.gr/~diou/civr2009/.
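As a rough illustration of the approach described above, the following minimal sketch derives a (noisy) training set for a single concept from a click log and trains a linear classifier on precomputed image features. It is not the authors' actual pipeline: the click log, the feature vectors, the concept-to-query matching rule, and the use of scikit-learn's LinearSVC are all illustrative assumptions.

```python
# Minimal sketch (illustrative assumptions, not the authors' pipeline):
# images clicked for queries mentioning a concept become noisy positives;
# a random sample of the remaining images becomes negatives.
import random
from collections import defaultdict

import numpy as np
from sklearn.svm import LinearSVC

# Hypothetical click log: (query text, clicked image id) pairs from a search engine.
click_log = [
    ("beach sunset", "img001"),
    ("sunset over sea", "img002"),
    ("football match", "img003"),
    ("beach volleyball", "img004"),
]

# Hypothetical precomputed visual features (e.g. colour/texture descriptors).
image_features = {img_id: np.random.rand(64) for _, img_id in click_log}


def build_training_set(concept, log, features, neg_ratio=1):
    """Treat images clicked for queries containing the concept as (noisy)
    positives and a random sample of the other images as negatives."""
    clicks_per_image = defaultdict(list)
    for query, img_id in log:
        clicks_per_image[img_id].append(query)
    positives = [i for i, queries in clicks_per_image.items()
                 if any(concept in q.lower() for q in queries)]
    candidates = [i for i in features if i not in positives]
    negatives = random.sample(candidates,
                              min(len(candidates), neg_ratio * len(positives)))
    X = np.vstack([features[i] for i in positives + negatives])
    y = np.array([1] * len(positives) + [0] * len(negatives))
    return X, y


X, y = build_training_set("beach", click_log, image_features)
classifier = LinearSVC().fit(X, y)  # concept classifier for "beach"
```

In practice such automatically generated labels are noisy (clicks do not guarantee relevance), which is why the paper evaluates their reliability and combines them with manually annotated examples.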

Keywords

Image annotation · Concepts · Supervised learning · Search logs · Clickthrough data · Collective knowledge · Implicit feedback · Reliability

Notes

Acknowledgements

The authors are grateful to the Belga press agency for providing the images and search logs used in this work and to Marco Palomino from the University of Sunderland for the extraction of the text features used. This work was supported by the EU-funded VITALAS project (FP6-045389). Christos Diou is supported by the Greek State Scholarships Foundation (http://www.iky.gr).


Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Theodora Tsikrika (1)
  • Christos Diou (2, 3)
  • Arjen P. de Vries (1, 4)
  • Anastasios Delopoulos (2, 3)

  1. Centrum Wiskunde & Informatica, Amsterdam, The Netherlands
  2. Multimedia Understanding Group, Electrical and Computer Engineering Department, Aristotle University of Thessaloniki, Thessaloniki, Greece
  3. Informatics and Telematics Institute, Centre for Research and Technology Hellas, Thessaloniki, Greece
  4. Delft University of Technology, Delft, The Netherlands