Rough Set Based Concept Extraction Paradigm for Document Ranking

Singh, Shailendra; Ray, Santosh Kumar; Joshi, Bhagwati P.

doi:10.1007/978-3-642-10687-3_18

Conference paper

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 67)

Abstract

On the World Wide Web, open-domain question answering systems are among the emerging information retrieval systems. They are becoming more popular day by day because they return succinct, relevant answers to users' questions. In this paper, we present a rough set based method for document ranking, which is one of the major tasks in presenting retrieved results and contributes directly to the accuracy of a retrieval system. Rough sets are widely used for document categorization, vocabulary reduction, and other information retrieval problems. We propose a computationally efficient rough set based method for ranking documents. The distinctive feature of the proposed algorithm is that it emphasizes the presence and position of concept combinations rather than term frequencies. We experimented with a set of standard questions collected from TREC, World Book, and the World Factbook, using both Google and our proposed method, and found a 16% improvement in document ranking performance. We also compared our method with the online question answering system AnswerBus and observed a 38% improvement in placing relevant documents at the top ranks. Further experiments to judge the effectiveness of the information retrieval system showed satisfactory performance.
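The abstract does not spell out the algorithm, so the following is only an illustrative sketch, not the authors' method. It shows the standard Pawlak lower and upper approximations computed over documents described by two condition attributes, plus a ranking that uses the presence and position of a concept combination while ignoring term frequency. The attribute definitions (`present`, `early`), the `window=10` threshold, all function names, the toy documents, and the relevance judgments are hypothetical assumptions introduced for illustration.

```python
from collections import defaultdict


def indiscernibility_classes(table, attributes):
    """Partition objects by identical values on `attributes` (Pawlak's IND(B))."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(obj)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) rough set approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(table, attributes):
        if cls <= target:   # the whole class lies inside target
            lower |= cls
        if cls & target:    # the class overlaps target
            upper |= cls
    return lower, upper


def concept_attributes(text, concepts, window=10):
    """Hypothetical condition attributes: do all query concepts occur
    (presence), and does each first occur within the first `window` tokens
    (position)? Term frequencies are deliberately ignored."""
    tokens = text.lower().split()
    present = all(c in tokens for c in concepts)
    # `and` short-circuits, so index() is only called when every concept exists
    early = present and all(tokens.index(c) < window for c in concepts)
    return {"present": present, "early": early}


def rank(docs, concepts):
    """Hypothetical ranking: early co-occurrence first, then any co-occurrence,
    then the rest; sorted() is stable, so ties keep their input order."""
    attrs = {d: concept_attributes(text, concepts) for d, text in docs.items()}
    return sorted(docs, key=lambda d: (not attrs[d]["early"], not attrs[d]["present"]))


if __name__ == "__main__":
    concepts = ["capital", "oman"]
    docs = {  # toy documents, invented for illustration
        "d1": "the capital of oman is muscat",
        "d2": "muscat " + "filler " * 20 + "oman capital",
        "d3": "oman oman oman trade report",
        "d4": "oman " + "filler " * 15 + "capital",
    }
    print(rank(docs, concepts))  # ['d1', 'd2', 'd4', 'd3']

    table = {d: concept_attributes(t, concepts) for d, t in docs.items()}
    relevant = {"d1", "d2"}  # hypothetical relevance judgments
    lower, upper = approximations(table, ["present", "early"], relevant)
    print(sorted(lower), sorted(upper))  # ['d1'] ['d1', 'd2', 'd4']
```

In this toy run, d3 repeats "oman" three times but still ranks last because the concept combination is incomplete, which is the contrast with frequency-based scoring that the abstract draws. Documents d2 and d4 are indiscernible on the two attributes, so they fall into the boundary region between the lower and upper approximations.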

Author information

Authors and Affiliations

Samsung India Software Centre, Noida, India
Shailendra Singh
Birla Institute of Technology, Muscat, Oman
Santosh Kumar Ray
Birla Institute of Technology, Noida, India
Bhagwati P. Joshi

Editor information

Editors and Affiliations

Dept. Computer Science, Technical University Ostrava, Tr. 17. Listopadu 15, 708 33, Ostrava, Czech Republic
Vaclav Snášel
Inst. Computer Science, Technical University of Łódź, ul. Wólczańska 215, 93-005, Łódź, Poland
Piotr S. Szczepaniak
Machine Intelligence Research Labs (MIR), Scientific Network for Innovation & Research Excellence, P.O.Box 2259, 98071-2259, Auburn, WA, USA
Ajith Abraham
PAN Warszawa, Systems Research Institute, Newelska 6, 01-447, Warszawa, Poland
Janusz Kacprzyk

About this paper

Cite this paper

Singh, S., Ray, S.K., Joshi, B.P. (2010). Rough Set Based Concept Extraction Paradigm for Document Ranking. In: Snášel, V., Szczepaniak, P.S., Abraham, A., Kacprzyk, J. (eds) Advances in Intelligent Web Mastering - 2. Advances in Intelligent and Soft Computing, vol 67. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10687-3_18

DOI: https://doi.org/10.1007/978-3-642-10687-3_18
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10686-6
Online ISBN: 978-3-642-10687-3
eBook Packages: Engineering, Engineering (R0)
