Abstract
This paper describes the successful adaptation of our methodology for the dynamic retrieval of XML elements to a semi-structured environment. Working with text that contains both tagged and untagged elements presents particular challenges in this context. Our system is based on the Vector Space Model; basic functions are performed using the Smart experimental retrieval system. Dynamic element retrieval requires only a single indexing of the document collection at the level of the basic indexing node (i.e., the paragraph). It returns a rank-ordered list of elements identical to that produced by the same query against an all-element index of the collection. Experimental results are reported for both the 2006 and 2007 Ad-hoc tasks.
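The idea in the abstract can be sketched in a few lines: index only the paragraphs, then build each higher-level element's vector on demand by combining its children's vectors. The sketch below is illustrative only and is not the authors' Smart-based implementation. It assumes raw term-frequency weights and cosine similarity (a simplification chosen so that child vectors add exactly), and the names `TREE`, `PARAGRAPHS`, `build_dynamic`, `element_text`, and `rank` are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy document: element -> children, and paragraph -> text.
TREE = {"article": ["sec1", "sec2"], "sec1": ["p1", "p2"], "sec2": ["p3"]}
PARAGRAPHS = {
    "p1": "xml element retrieval",
    "p2": "vector space model",
    "p3": "wikipedia collection retrieval",
}


def tf_vector(text):
    """Raw term-frequency vector over whitespace tokens (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b[t] for t, w in a.items())
    norm = sqrt(sum(w * w for w in a.values())) * sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def build_dynamic(tree, paragraph_index, root):
    """Build a vector for every element bottom-up, using only the paragraph index."""
    vectors = {}

    def visit(node):
        if node in paragraph_index:
            vec = Counter(paragraph_index[node])
        else:
            vec = Counter()
            for child in tree[node]:
                vec.update(visit(child))  # add the child's term counts
        vectors[node] = vec
        return vec

    visit(root)
    return vectors


def element_text(tree, paragraphs, node):
    """Full text of an element, as an all-element index would see it."""
    if node in paragraphs:
        return paragraphs[node]
    return " ".join(element_text(tree, paragraphs, c) for c in tree[node])


def rank(vectors, query):
    """Rank elements by cosine similarity to the query, ties broken by name."""
    q = tf_vector(query)
    scored = [(cosine(vec, q), name) for name, vec in vectors.items()]
    return sorted(scored, key=lambda s: (-s[0], s[1]))


# Single indexing pass at the paragraph level.
paragraph_index = {pid: tf_vector(text) for pid, text in PARAGRAPHS.items()}

# Dynamic element vectors versus a separately built all-element index.
dynamic = build_dynamic(TREE, paragraph_index, "article")
all_element = {n: tf_vector(element_text(TREE, PARAGRAPHS, n)) for n in dynamic}

for score, name in rank(dynamic, "xml retrieval"):
    print(f"{name}\t{score:.3f}")
```

Under these assumptions the dynamically built vectors equal those of the all-element index, so both produce the same ranked list for a given query. Weighting schemes that depend on element length or collection statistics would need extra bookkeeping that this sketch does not attempt.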
Copyright information
© 2008 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Crouch, C.J., Crouch, D.B., Kamat, N., Malik, V., Mone, A. (2008). Dynamic Element Retrieval in the Wikipedia Collection. In: Fuhr, N., Kamps, J., Lalmas, M., Trotman, A. (eds) Focused Access to XML Documents. INEX 2007. Lecture Notes in Computer Science, vol 4862. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85902-4_6
DOI: https://doi.org/10.1007/978-3-540-85902-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-85901-7
Online ISBN: 978-3-540-85902-4
eBook Packages: Computer Science (R0)