Abstract
In information retrieval applications, query expansion is an important procedure for improving retrieval precision. This chapter discusses an N-gram Thesaurus that is generated from the content of web documents and used for expanding queries. The TAGs of HTML pages are parsed, and the text within each TAG is assigned a weight based on the nature of the TAG. The total weight of a text is calculated as the sum of its TAG weight and its frequency of occurrence. The Unigram Thesaurus is updated with single terms, and similarly the N-gram Thesaurus is updated with N-term texts, each stored along with its total weight. Given a query, its term(s) are looked up in the corresponding Thesaurus to obtain a set of predicted queries. The set is ordered by total weight, and the user selects any of the term(s) as a preference. The benchmark datasets Clueweb09B, WT10g and GOV2 are used for experiments. A threshold value is fixed as the baseline. The proposed approach gains 8, 19 and 30% on Clueweb09B, WT10g and GOV2, respectively. In addition, KLDCo and BoCo are used as benchmarks for evaluating the performance of the presented approach in terms of query refinement. MAP and MRR are both higher than the baseline.
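The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the chapter's implementation: the tag weights in `TAG_WEIGHTS`, the maximum N-gram order `MAX_N`, the helper names `build_thesaurus` and `suggest`, and the rule that a k-term query is matched as a prefix of entries in the (k+1)-gram Thesaurus are all assumptions made for illustration. The total weight follows the abstract's description: each occurrence contributes its TAG weight plus one count of frequency.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical tag weights; the abstract does not give the actual values.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "b": 2.0, "p": 1.0}
MAX_N = 3  # assumed highest N-gram order kept in the thesaurus


class TagTextParser(HTMLParser):
    """Collect (tag, text) pairs for text found inside weighted tags."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open weighted tags
        self.segments = []  # (tag, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop back to (and including) the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            self.segments.append((self.stack[-1], text))


def tokenize(text):
    """Lowercase and split into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_thesaurus(pages):
    """Return {n: {n-gram tuple: total weight}} for n = 1..MAX_N.

    Assumed reading of the abstract: every occurrence adds its tag
    weight plus 1 (its contribution to the frequency of occurrence).
    """
    thesaurus = {n: defaultdict(float) for n in range(1, MAX_N + 1)}
    for html in pages:
        parser = TagTextParser()
        parser.feed(html)
        for tag, text in parser.segments:
            tokens = tokenize(text)
            for n in range(1, MAX_N + 1):
                for i in range(len(tokens) - n + 1):
                    gram = tuple(tokens[i:i + n])
                    thesaurus[n][gram] += TAG_WEIGHTS[tag] + 1.0
    return thesaurus


def suggest(query, thesaurus, top_k=5):
    """Rank (k+1)-gram entries that start with the k query terms."""
    terms = tuple(tokenize(query))
    n = len(terms) + 1
    if not terms or n not in thesaurus:
        return []
    matches = [
        (" ".join(gram), weight)
        for gram, weight in thesaurus[n].items()
        if gram[:len(terms)] == terms
    ]
    # Highest total weight first; ties broken alphabetically.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return matches[:top_k]
```

Calling `suggest("query", build_thesaurus(pages))` on a small list of HTML strings returns candidate expansions ordered by total weight, from which a user could pick a preferred term. Summing per-occurrence tag weights is only one way to read "sum of TAG weight and frequency"; taking the maximum tag weight per term would be an equally plausible variant.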
Copyright information
© 2018 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Shaila, S.G., Vadivel, A. (2018). Constructing Thesaurus Using TAG Term Weight for Query Expansion in Information Retrieval Application. In: Textual and Visual Information Retrieval using Query Refinement and Pattern Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-13-2559-5_3
DOI: https://doi.org/10.1007/978-981-13-2559-5_3
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2558-8
Online ISBN: 978-981-13-2559-5
eBook Packages: Computer Science, Computer Science (R0)