Social Media Retrieval Using Image Features and Structured Text

  • Conference paper
Comparative Evaluation of XML Information Retrieval Systems (INEX 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4518)

Abstract

XML offers a structured way to represent information while keeping form and content separate. XML information retrieval differs from standard text retrieval in two respects: the XML structure may itself be of interest as part of the query, and the information does not have to be text. In this paper, we describe an investigation of approaches to retrieving text and images from a large collection of XML documents, carried out as part of our participation in the INEX 2006 Ad Hoc and Multimedia tracks. We evaluate three information retrieval similarity measures: Pivoted Cosine, Okapi BM25 and Dirichlet. We show that, on the INEX 2006 Ad Hoc queries, Okapi BM25 is the most effective of the three similarity measures for retrieving text only, while Dirichlet is more suitable for retrieving heterogeneous (text and image) data.
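
The three similarity measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulations or parameter settings used in the paper, so the code below uses commonly cited textbook forms, and every function name, parameter default (k1, b, mu, s) and the toy collection are assumptions made for illustration only.

```python
import math
from collections import Counter

# Toy collection of tokenised documents (hypothetical data, not from the paper).
DOCS = [
    ["xml", "retrieval", "image"],
    ["xml", "xml", "structure"],
    ["image", "colour", "histogram", "image"],
]


def _avg_len(docs):
    """Average document length in tokens."""
    return sum(len(d) for d in docs) / len(docs)


def _doc_freq(term, docs):
    """Number of documents that contain the term."""
    return sum(1 for d in docs if term in d)


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 in a common form; k1 and b are assumed defaults."""
    tf = Counter(doc)
    n_docs = len(docs)
    length_ratio = len(doc) / _avg_len(docs)
    score = 0.0
    for term in query:
        if tf[term] == 0:
            continue
        df = _doc_freq(term, docs)
        # The "+ 1.0" keeps the idf positive even for very frequent terms.
        idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
        norm = tf[term] + k1 * (1.0 - b + b * length_ratio)
        score += idf * tf[term] * (k1 + 1.0) / norm
    return score


def dirichlet_score(query, doc, docs, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed language model."""
    collection = Counter(t for d in docs for t in d)
    collection_len = sum(collection.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        if collection[term] == 0:
            continue  # term never seen in the collection: ignored here
        p_coll = collection[term] / collection_len
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


def pivoted_cosine_score(query, doc, docs, s=0.2):
    """Pivoted length normalisation in one common form; s is the slope."""
    tf = Counter(doc)
    qtf = Counter(query)
    n_docs = len(docs)
    norm = (1.0 - s) + s * len(doc) / _avg_len(docs)
    score = 0.0
    for term, q_count in qtf.items():
        df = _doc_freq(term, docs)
        if tf[term] == 0 or df == 0:
            continue
        tf_part = 1.0 + math.log(1.0 + math.log(tf[term]))
        score += tf_part / norm * q_count * math.log((n_docs + 1) / df)
    return score


if __name__ == "__main__":
    query = ["image"]
    for name, fn in [("pivoted cosine", pivoted_cosine_score),
                     ("BM25", bm25_score),
                     ("Dirichlet", dirichlet_score)]:
        ranking = sorted(range(len(DOCS)),
                         key=lambda i: fn(query, DOCS[i], DOCS),
                         reverse=True)
        print(name, ranking)
```

All three functions take a tokenised query, a tokenised document and the whole collection, so they can be swapped in a ranking loop. Dirichlet scores are log-probabilities and therefore negative; only their relative order matters.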



Editor information

Norbert Fuhr Mounia Lalmas Andrew Trotman

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Awang Iskandar, D.N.F., Pehcevski, J., Thom, J.A., Tahaghoghi, S.M.M. (2007). Social Media Retrieval Using Image Features and Structured Text. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_35

  • DOI: https://doi.org/10.1007/978-3-540-73888-6_35

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73887-9

  • Online ISBN: 978-3-540-73888-6

  • eBook Packages: Computer Science, Computer Science (R0)
