Abstract
The use of XML offers a structured approach to representing information while maintaining separation of form and content. XML information retrieval differs from standard text retrieval in two respects: the XML structure may itself be of interest as part of the query, and the information need not be text. In this paper, we investigate approaches to retrieving text and images from a large collection of XML documents, carried out as part of our participation in the INEX 2006 Ad Hoc and Multimedia tracks. We evaluate three information retrieval similarity measures: Pivoted Cosine, Okapi BM25 and Dirichlet. We show that, on the INEX 2006 Ad Hoc queries, Okapi BM25 is the most effective of the three similarity measures for retrieving text only, while Dirichlet is more suitable when retrieving heterogeneous (text and image) data.
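To make two of the similarity measures named above concrete, the following is a minimal sketch of Okapi BM25 and Dirichlet-smoothed language-model scoring over tokenised documents. It is not the implementation evaluated in the paper: the function names, the toy collection, the parameter values (`k1`, `b`, `mu`) and the non-negative idf variant are all assumptions chosen for illustration, and Pivoted Cosine is omitted for brevity.

```python
import math
from collections import Counter


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """Okapi BM25 score of one tokenised document for a tokenised query.

    k1 and b are commonly quoted defaults, assumed here for illustration.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)  # document frequency
        if df == 0 or tf[term] == 0:
            continue
        # Non-negative idf variant (the "1 +" inside the log is an assumption).
        idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
        # Term-frequency saturation with document-length normalisation.
        norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
        score += idf * tf[term] * (k1 + 1) / norm
    return score


def dirichlet_score(query, doc, docs, mu=2000.0):
    """Query log-likelihood under a Dirichlet-smoothed unigram language model.

    mu is the smoothing parameter; 2000 is an assumed illustrative value.
    """
    collection = Counter(t for d in docs for t in d)
    coll_len = sum(collection.values())
    tf = Counter(doc)
    score = 0.0
    for term in query:
        p_coll = collection[term] / coll_len
        if p_coll == 0:
            continue  # term never occurs in the collection: skip it
        score += math.log((tf[term] + mu * p_coll) / (len(doc) + mu))
    return score


# Hypothetical toy collection, already tokenised.
docs = [
    ["xml", "retrieval", "image"],
    ["text", "retrieval"],
    ["image", "features", "image"],
]
query = ["image"]

bm25_ranking = sorted(range(len(docs)), key=lambda i: bm25_score(query, docs[i], docs), reverse=True)
dirichlet_ranking = sorted(range(len(docs)), key=lambda i: dirichlet_score(query, docs[i], docs), reverse=True)
```

In this sketch BM25 gives a score of zero to a document that does not contain the query term, whereas the Dirichlet model still assigns it a smoothed, collection-based probability. This difference in behaviour is only an illustration of the two formulas and says nothing about the results reported in the paper.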
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Awang Iskandar, D.N.F., Pehcevski, J., Thom, J.A., Tahaghoghi, S.M.M. (2007). Social Media Retrieval Using Image Features and Structured Text. In: Fuhr, N., Lalmas, M., Trotman, A. (eds) Comparative Evaluation of XML Information Retrieval Systems. INEX 2006. Lecture Notes in Computer Science, vol 4518. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73888-6_35
DOI: https://doi.org/10.1007/978-3-540-73888-6_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73887-9
Online ISBN: 978-3-540-73888-6
eBook Packages: Computer Science, Computer Science (R0)