Abstract
In this paper, we introduce the variability of syntactic phrases and propose a new retrieval approach that reflects the variability of syntactic phrase representation. With a variability measure for a phrase, we can estimate how likely a phrase in a given query is to appear in relevant documents, and control the impact of syntactic phrases in a retrieval model accordingly. Experimental results over different types of queries and document collections show that our retrieval model based on the variability of syntactic phrases is effective in terms of retrieval performance, especially for long natural language queries.
Song, YI., Han, KS., Kim, SB. et al. A novel retrieval approach reflecting variability of syntactic phrase representation. J Intell Inf Syst 31, 265–286 (2008). https://doi.org/10.1007/s10844-007-0045-0
DOI: https://doi.org/10.1007/s10844-007-0045-0