Parameter Tuning in Pivoted Normalization for XML Retrieval: ISI@INEX09 Adhoc Focused Task

Pal, Sukomal; Mitra, Mandar; Ganguly, Debasis

doi:10.1007/978-3-642-14556-8_13


Conference paper


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6203)

Abstract

This paper describes the work we did at the Indian Statistical Institute on XML retrieval for INEX 2009. Because the INEX corpus grew abruptly (from 4.6 GB with 659,388 articles to 50.7 GB with 2,666,190 articles), retrieval algorithms and systems were put to a ‘stress test’ in the INEX 2009 campaign. We tuned our text retrieval system (SMART), which is based on the Vector Space Model (VSM) and which we have been using since INEX 2006. We submitted two runs for the adhoc focused task. Both runs used VSM-based document-level retrieval with blind feedback: an initial run (indsta_VSMpart) used only a small fraction of the INEX 2009 corpus, and the other (indsta_VSMfb) used the full corpus. We considered Content-Only (CO) retrieval, using the Title and Description fields of the INEX 2009 adhoc queries (2009001-2009115). Our official runs, however, used incorrect topic numbers, which led to very poor performance. After submission, corrected versions of both the baseline and the feedback document-level runs achieved competitive scores. We performed a set of experiments to tune our pivoted normalization-based term-weighting scheme for XML retrieval. The scores of our best document-level runs, both with and without blind feedback, seemed to improve substantially after tuning the normalization parameters. We also ran element-level retrieval on a subset of the document-level runs; the new parameter settings seemed to yield competitive results in this case as well. On the evaluation front, we observed an anomaly in how the evaluation scripts compute interpolated precision. We raise the issue because an XML retrievable unit (passage or element) can be partially relevant, containing some non-relevant text, unlike the document retrieval paradigm, where a document is considered either completely relevant or completely non-relevant.
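Pivoted normalization controls document length normalization through two parameters, a pivot and a slope, and these are the parameters the abstract says were tuned. The Python sketch below illustrates pivoted unique ("Lnu") document weighting as described by Singhal, Buckley and Mitra (SIGIR '96), paired with an ltc-weighted query and inner-product scoring. It is an illustrative assumption, not the authors' SMART configuration: the function names (lnu_weights, ltc_query_weights, score), the toy corpus, the choice of ltc for the query side and the slope value 0.2 are all hypothetical, and the tuned values from the paper are not reproduced here.

```python
import math
from collections import Counter


def lnu_weights(doc_tokens, slope, pivot):
    """Lnu weights: log-average tf, no idf, pivoted unique normalization."""
    tf = Counter(doc_tokens)
    if not tf:
        return {}
    # "L": (1 + log tf) damped by (1 + log of the document's average tf)
    avg_tf = sum(tf.values()) / len(tf)
    tf_denom = 1.0 + math.log(avg_tf)
    # "u": pivoted normalizer, interpolating between the pivot (slope 0)
    # and the number of unique terms in the document (slope 1)
    norm = (1.0 - slope) * pivot + slope * len(tf)
    return {t: (1.0 + math.log(c)) / tf_denom / norm for t, c in tf.items()}


def ltc_query_weights(query_tokens, df, n_docs):
    """ltc weights (assumed here): (1 + log tf) * idf, then cosine normalization."""
    tf = Counter(query_tokens)
    w = {t: (1.0 + math.log(c)) * math.log(n_docs / df[t])
         for t, c in tf.items() if df.get(t, 0) > 0}
    length = math.sqrt(sum(v * v for v in w.values()))
    return {t: v / length for t, v in w.items()} if length > 0 else {}


def score(query_w, doc_w):
    """Inner product of query and document weight vectors."""
    return sum(qv * doc_w.get(t, 0.0) for t, qv in query_w.items())


# Hypothetical toy corpus, for illustration only.
docs = {
    "d1": "xml retrieval of xml elements".split(),
    "d2": "vector space model for document retrieval".split(),
    "d3": "pivoted length normalization".split(),
}
df = Counter(t for toks in docs.values() for t in set(toks))
# Pivot: average number of unique terms per document in the collection.
pivot = sum(len(set(toks)) for toks in docs.values()) / len(docs)
doc_w = {d: lnu_weights(toks, slope=0.2, pivot=pivot) for d, toks in docs.items()}
query_w = ltc_query_weights("xml retrieval".split(), df, len(docs))
ranking = sorted(docs, key=lambda d: score(query_w, doc_w[d]), reverse=True)
print(ranking)  # ['d1', 'd2', 'd3']
```

Setting the slope to 1 normalizes purely by the number of unique terms, while a slope of 0 removes length normalization altogether (every document is divided by the same pivot). Intermediate values trade these two behaviours off, and searching over them against relevance judgments is one plausible reading of the parameter tuning the abstract describes.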

Author information

Authors and Affiliations

Information Retrieval Lab, CVPR Unit, Indian Statistical Institute, Kolkata, India
Sukomal Pal & Mandar Mitra
Synopsys, Bangalore, India
Debasis Ganguly

Editor information

Editors and Affiliations

Faculty of Science and Technology, Queensland University of Technology, GPO Box 2434, 4001, Brisbane, Qld, Australia
Shlomo Geva
Archives and Information Studies/Humanities, University of Amsterdam, Turfdraagsterpad 9, 1012 XT, Amsterdam, The Netherlands
Jaap Kamps
Department of Computer Science, University of Otago, P.O. Box 56, 9054, Dunedin, New Zealand
Andrew Trotman

About this paper

Cite this paper

Pal, S., Mitra, M., Ganguly, D. (2010). Parameter Tuning in Pivoted Normalization for XML Retrieval: ISI@INEX09 Adhoc Focused Task. In: Geva, S., Kamps, J., Trotman, A. (eds) Focused Retrieval and Evaluation. INEX 2009. Lecture Notes in Computer Science, vol 6203. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14556-8_13

DOI: https://doi.org/10.1007/978-3-642-14556-8_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-14555-1
Online ISBN: 978-3-642-14556-8
eBook Packages: Computer Science, Computer Science (R0)