Enhancing Computer Vision Using the Collective Intelligence of Social Media

  • Chapter
New Directions in Web Data Management 1

Part of the book series: Studies in Computational Intelligence (SCI, volume 331)

Abstract

Teaching the machine has been a great challenge for computer vision scientists since the very first steps of artificial intelligence. Over the decades, remarkable achievements have drastically enhanced the capabilities of machines, both in infrastructure (i.e., computer networks, processing power, storage capabilities) and in the processing and understanding of data. Nevertheless, computer vision scientists are still confronted with the problem of designing techniques and frameworks that facilitate effortless learning and allow analysis methods to scale easily across many different domains and disciplines. State-of-the-art approaches cannot produce highly effective models unless dedicated, and thus costly, human supervision in the learning process dictates the relation between the content and its meaning (i.e., annotation). Recently, we have been witnessing the rapid growth of Social Media, which emerged as the result of users’ willingness to communicate, socialize, collaborate and share content. The outcome of this massive activity was the generation of a tremendous volume of user-contributed data that have been made available on the Web, usually along with an indication of their meaning (i.e., tags). This has motivated the research objective of investigating whether the Collective Intelligence that emerges from the users’ contributions inside a Web 2.0 application can be used to remove the need for dedicated human supervision during the process of learning. In this chapter we deal with a very demanding learning problem in computer vision: detecting and localizing an object within the image content. We present a method that exploits the Collective Intelligence fostered inside an image Social Tagging System in order to facilitate the automatic generation of training data and, therefore, of object detection models. The experimental results show that, although many issues remain to be addressed, computer vision technology can definitely benefit from Social Media.

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Chatzilari, E., Nikolopoulos, S., Patras, I., Kompatsiaris, I. (2011). Enhancing Computer Vision Using the Collective Intelligence of Social Media. In: Vakali, A., Jain, L.C. (eds) New Directions in Web Data Management 1. Studies in Computational Intelligence, vol 331. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-17551-0_9

  • DOI: https://doi.org/10.1007/978-3-642-17551-0_9

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-17550-3

  • Online ISBN: 978-3-642-17551-0

  • eBook Packages: Engineering, Engineering (R0)
