Constructing Thesaurus Using TAG Term Weight for Query Expansion in Information Retrieval Application

Shaila, S. G.; Vadivel, A.

doi:10.1007/978-981-13-2559-5_3

Chapter
First Online: 30 September 2018

Abstract

In information retrieval applications, query expansion is an important procedure for improving retrieval precision. This chapter discusses an N-gram Thesaurus that is generated from the content of web documents and used to expand queries. The TAGs of HTML pages are parsed, and the text within each TAG is assigned a weight based on the nature of the TAG. The total weight of a text is calculated as the sum of its TAG weight and its frequency of occurrence. The Unigram Thesaurus is updated with each single term or text; similarly, the N-gram Thesaurus is updated with each N-term text along with its total weight. Given a query, its term(s) are looked up in the corresponding Thesaurus to obtain a set of predicted queries. The set is ordered by total weight, and the user selects any of the predicted term(s) as the preferred one. The benchmark datasets Clueweb09B, WT10g and GOV2 are used for the experiments, with a fixed threshold value as the baseline. The proposed approach gains 8%, 19% and 30% on Clueweb09B, WT10g and GOV2, respectively. In addition, KLDCo and BoCo are used as benchmarks for evaluating the presented approach in terms of query refinement. The MAP and MRR are higher than those of the baseline.
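
The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the TAG weight values, the choice to sum TAG weights over all occurrences of an N-gram, the prefix-based lookup and all function names are assumptions made only for illustration, since the abstract does not specify them.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical TAG weights; the abstract does not give the actual values.
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "b": 2.0, "p": 1.0}


class TagTextParser(HTMLParser):
    """Collects (tag, text) pairs for text found inside weighted TAGs."""

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.segments = []  # (innermost weighted tag, text) pairs

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # Attribute the text to the innermost enclosing weighted TAG.
        for tag in reversed(self.stack):
            if tag in TAG_WEIGHTS:
                self.segments.append((tag, text))
                break


def build_thesaurus(html_pages, n):
    """Map each N-gram to total weight = TAG weight + frequency of occurrence.

    Assumption: TAG weights are summed over every occurrence of the N-gram.
    """
    tag_weight = defaultdict(float)
    frequency = defaultdict(int)
    for page in html_pages:
        parser = TagTextParser()
        parser.feed(page)
        parser.close()
        for tag, text in parser.segments:
            terms = text.lower().split()
            for i in range(len(terms) - n + 1):
                gram = tuple(terms[i:i + n])
                tag_weight[gram] += TAG_WEIGHTS[tag]
                frequency[gram] += 1
    return {gram: tag_weight[gram] + frequency[gram] for gram in frequency}


def suggest(query, thesaurus, top_k=5):
    """Return N-grams that begin with the query terms, ordered by total weight.

    Assumption: "looking up" a query means prefix matching on the N-gram.
    """
    q = tuple(query.lower().split())
    matches = [(g, w) for g, w in thesaurus.items() if g[:len(q)] == q]
    matches.sort(key=lambda gw: (-gw[1], gw[0]))
    return [(" ".join(g), w) for g, w in matches[:top_k]]


pages = [
    "<html><head><title>query expansion</title></head>"
    "<body><h1>query refinement</h1>"
    "<p>query expansion improves retrieval</p></body></html>"
]
bigram_thesaurus = build_thesaurus(pages, n=2)
print(suggest("query", bigram_thesaurus))
# [('query expansion', 8.0), ('query refinement', 5.0)]
```

In this toy page, "query expansion" occurs once in the title and once in a paragraph, so under the assumed weights its total weight is (5 + 1) + 2 = 8 and it is ranked ahead of "query refinement".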

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Dayananda Sagar University, Bangalore, India
S. G. Shaila
Department of Computer Science and Engineering, SRM University AP, Amaravati, Andhra Pradesh, India
A. Vadivel

Corresponding author

Correspondence to S. G. Shaila.

About this chapter

Cite this chapter

Shaila, S.G., Vadivel, A. (2018). Constructing Thesaurus Using TAG Term Weight for Query Expansion in Information Retrieval Application. In: Textual and Visual Information Retrieval using Query Refinement and Pattern Analysis. Springer, Singapore. https://doi.org/10.1007/978-981-13-2559-5_3

DOI: https://doi.org/10.1007/978-981-13-2559-5_3
Published: 30 September 2018
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-2558-8
Online ISBN: 978-981-13-2559-5
eBook Packages: Computer Science, Computer Science (R0)
