Abstract
Open-domain question answering systems are an emerging class of information retrieval systems on the World Wide Web, increasingly popular because they return succinct, relevant answers to users' questions. In this paper, we address a rough set based method for document ranking, a major task in presenting retrieved results and one that directly affects the accuracy of a retrieval system. Rough sets are widely used for document categorization, vocabulary reduction, and other information retrieval problems. We propose a computationally efficient rough set based method for ranking documents. Its distinctive feature is that it emphasizes the presence and position of the concept combination rather than term frequencies. We experimented on a set of standard questions collected from TREC, World Book, and the CIA World Factbook, using Google and our proposed method, and found a 16% improvement in document ranking performance. We also compared our method with the online question answering system AnswerBus and observed a 38% improvement in placing relevant documents at the top ranks. Further experiments to judge the effectiveness of the information retrieval system gave satisfactory performance results.
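The abstract does not give the algorithm itself, so the following is only a minimal sketch of the two ideas it names, under stated assumptions. The function `approximations` computes the standard rough set lower and upper approximations of a target set, here with documents treated as indiscernible when they contain the same subset of query concepts. The function `rank` is a hypothetical scoring rule, not the authors' method: it orders documents by how many query concepts are present and then by how early the full combination is completed, ignoring term frequency. The names `docs`, `concepts`, `signature`, and the sample texts are all illustrative assumptions.

```python
from collections import defaultdict


def approximations(universe, signature, target):
    """Return the (lower, upper) rough set approximations of `target`.

    Items with the same `signature` value are indiscernible and form
    one equivalence class.
    """
    classes = defaultdict(set)
    for item in universe:
        classes[signature(item)].add(item)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


def rank(docs, concepts):
    """Hypothetical ranking: more concepts present first, then earlier position.

    Term frequency is deliberately ignored; only the first occurrence
    of each concept is used.
    """
    def key(doc_id):
        tokens = docs[doc_id].lower().split()
        positions = [tokens.index(c) for c in concepts if c in tokens]
        # Position at which the last of the found concepts first appears.
        span_end = max(positions) if positions else len(tokens)
        return (-len(positions), span_end, doc_id)
    return sorted(docs, key=key)


# Illustrative data (assumed, not from the paper).
docs = {
    "d1": "the capital of france is paris",
    "d2": "paris paris paris hotels and paris tours",
    "d3": "france has many cities and its capital is large",
}
concepts = ["capital", "france"]


def signature(doc_id):
    """Set of query concepts that occur in the document."""
    tokens = docs[doc_id].lower().split()
    return frozenset(c for c in concepts if c in tokens)


ranking = rank(docs, concepts)
lower, upper = approximations(set(docs), signature, {"d1"})
```

In this toy data, `d2` repeats one word many times but contains neither concept, so a presence-and-position rule places it last regardless of frequency. The approximations show the rough set view of the same data: `d1` and `d3` share a signature, so a target set containing only `d1` has an empty lower approximation and an upper approximation containing both.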
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Singh, S., Ray, S.K., Joshi, B.P. (2010). Rough Set Based Concept Extraction Paradigm for Document Ranking. In: Snášel, V., Szczepaniak, P.S., Abraham, A., Kacprzyk, J. (eds) Advances in Intelligent Web Mastering - 2. Advances in Intelligent and Soft Computing, vol 67. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10687-3_18
DOI: https://doi.org/10.1007/978-3-642-10687-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10686-6
Online ISBN: 978-3-642-10687-3
eBook Packages: Engineering, Engineering (R0)