Abstract
Semi-structured document retrieval is becoming more popular as the quantity of data available in XML format increases. In this paper, we describe a search engine model that exploits document structure and uses language modelling, with smoothing at the document and collection levels, to calculate the relevance of each element in the collection to a user query. The ranking procedure also uses element priors, CAS query constraint filtering, and the +/- operators. We also present the results of our participation in the INEX 2008 Ad Hoc Track.
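The abstract does not give the scoring formula, so the following is only a minimal sketch of how element-level language modelling with smoothing at the document and collection levels could look. It assumes a linear interpolation of three maximum-likelihood estimates (element, containing document, whole collection) and a query-likelihood ranking. The function names `ml_prob` and `score_element` and the weights `lambda_e`, `lambda_d`, `lambda_c` are hypothetical and not taken from the paper. Element priors, CAS constraint filtering, and the +/- operators are not modelled here.

```python
import math
from collections import Counter


def ml_prob(term, counts, total):
    """Maximum-likelihood estimate of a term in a bag of words."""
    return counts[term] / total if total else 0.0


def score_element(query_terms, element_tokens, document_tokens,
                  collection_counts, collection_total,
                  lambda_e=0.5, lambda_d=0.3, lambda_c=0.2):
    """Log query likelihood of one element.

    Assumption: the element model is smoothed by linear interpolation
    with the document model and the collection model; the three weights
    are illustrative values that sum to 1.
    """
    elem_counts = Counter(element_tokens)
    doc_counts = Counter(document_tokens)
    log_score = 0.0
    for term in query_terms:
        p = (lambda_e * ml_prob(term, elem_counts, len(element_tokens))
             + lambda_d * ml_prob(term, doc_counts, len(document_tokens))
             + lambda_c * ml_prob(term, collection_counts, collection_total))
        if p == 0.0:
            # The term occurs nowhere in the collection.
            return float("-inf")
        log_score += math.log(p)
    return log_score


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    document = ["xml", "retrieval", "language", "model", "smoothing", "query"]
    collection = document + ["legal", "text", "xml"]
    counts = Counter(collection)
    elements = {"sec[1]": document[:2], "sec[2]": document[4:]}
    query = ["xml", "retrieval"]
    ranked = sorted(
        elements,
        key=lambda name: score_element(query, elements[name], document,
                                       counts, len(collection)),
        reverse=True,
    )
    print(ranked)
```

With this kind of smoothing, an element that lacks a query term still receives a non-zero score as long as the term appears in its document or elsewhere in the collection, which is what allows every element of every document to be ranked against the query.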
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mijić, J., Moens, MF., Dalbelo Bašić, B. (2009). CADIAL Search Engine at INEX. In: Geva, S., Kamps, J., Trotman, A. (eds) Advances in Focused Retrieval. INEX 2008. Lecture Notes in Computer Science, vol 5631. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03761-0_8
DOI: https://doi.org/10.1007/978-3-642-03761-0_8
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03760-3
Online ISBN: 978-3-642-03761-0
eBook Packages: Computer Science, Computer Science (R0)