Multimedia Tools and Applications, Volume 73, Issue 3, pp 1323–1359

A rank aggregation framework for video multimodal geocoding

  • Lin Tzy Li (corresponding author)
  • Daniel Carlos Guimarães Pedronette
  • Jurandy Almeida
  • Otávio A. B. Penatti
  • Rodrigo Tripodi Calumby
  • Ricardo da Silva Torres

Abstract

This paper proposes a rank aggregation framework for video multimodal geocoding. Textual and visual descriptions associated with videos are used to define ranked lists. These ranked lists are then combined, and the resulting ranked list is used to define appropriate locations for videos. We also design an architecture that implements the proposed framework. In this architecture, each modality (e.g., textual and visual) has its own module, which can be developed and evolved independently. Another component is a data fusion module responsible for seamlessly combining the ranked lists defined for each modality. We have validated the proposed framework in the context of the MediaEval 2012 Placing Task, whose objective is to automatically assign geographical coordinates to videos. The results show that our multimodal approach improves geocoding results when compared to methods that rely on a single modality (either textual or visual descriptors). We also show that the proposed multimodal approach yields results comparable to the best submissions to the 2012 Placing Task, using no extra information besides the available development/training data. Another contribution of this work is a new effectiveness evaluation measure, based on distance scores, that summarizes how effective a designed/tested approach is, considering its overall result on a test dataset.
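As a rough illustration of the pipeline described above, the Python sketch below fuses a textual and a visual ranked list of development videos for one test video and assigns the test video the coordinates of the top-ranked result. The fusion rule is reciprocal rank fusion with k = 60, chosen only as a well-known example of rank aggregation; it is an assumption, not necessarily the aggregation method evaluated in the paper. The function names (`reciprocal_rank_fusion`, `haversine_km`, `geocode`), the video ids, and the coordinates are all hypothetical. The haversine distance between predicted and true coordinates is the kind of per-video distance that distance-based effectiveness measures summarize; the paper's proposed measure itself is not reproduced here.

```python
import math
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of item ids with reciprocal rank fusion (RRF).

    Each item scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Items absent from a list get nothing from it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, item in enumerate(ranking, start=1):
            scores[item] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by item id for determinism.
    return sorted(scores, key=lambda item: (-scores[item], item))


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 so rounding error cannot push asin out of its domain.
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))


def geocode(textual_ranking, visual_ranking, dev_locations, k=60):
    """Return the coordinates of the top development video after fusion."""
    fused = reciprocal_rank_fusion([textual_ranking, visual_ranking], k=k)
    return dev_locations[fused[0]]


# Hypothetical development videos with known (lat, lon) coordinates.
dev_locations = {
    "v1": (48.8566, 2.3522),   # Paris
    "v2": (41.9028, 12.4964),  # Rome
    "v3": (43.7228, 10.4017),  # Pisa
}
# Hypothetical per-modality ranked lists for one test video, most similar first.
textual = ["v3", "v1", "v2"]
visual = ["v3", "v2", "v1"]

predicted = geocode(textual, visual, dev_locations)
true_location = (43.7696, 11.2558)  # hypothetical ground truth (Florence)
error_km = haversine_km(*predicted, *true_location)
print(predicted, round(error_km, 1))
```

Here `v3` is ranked first by both modalities, so the sketch assigns Pisa's coordinates, and the error against the hypothetical Florence ground truth is on the order of 70 km. Keeping fusion in its own function mirrors the architecture's separation between per-modality modules and the data fusion module: another aggregation rule could be swapped in without touching the textual or visual ranking code.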

Keywords

Video geotagging, Multimodal retrieval, Rank aggregation, Effectiveness measure

Notes

Acknowledgements

The authors thank CAPES (Brazilian Federal Agency for Support and Evaluation of Graduate Education), FAPESP (São Paulo Research Foundation) grants 2011/11171-5 and 2009/10554-8, and CNPq (National Council for Scientific and Technological Development) grants 306580/2012-8 and 484254/2012-0, as well as the CPqD Foundation (Telecommunications Research and Development Center), for their support. We also thank the anonymous reviewers, whose suggestions and questions gave us the chance to improve our paper.

Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Lin Tzy Li (1, 2)
  • Daniel Carlos Guimarães Pedronette (1, 3)
  • Jurandy Almeida (1)
  • Otávio A. B. Penatti (1)
  • Rodrigo Tripodi Calumby (1, 4)
  • Ricardo da Silva Torres (1)

  1. RECOD Lab, Institute of Computing, University of Campinas (UNICAMP), Campinas, Brazil
  2. Telecommunications Research and Development Center, CPqD Foundation, Campinas, Brazil
  3. Department of Statistics, Applied Mathematics and Computing, Universidade Estadual Paulista (UNESP), Rio Claro, Brazil
  4. Department of Exact Sciences, University of Feira de Santana (UEFS), Feira de Santana, Brazil
