Rough Set Based Concept Extraction Paradigm for Document Ranking

  • Conference paper

Part of the book series: Advances in Intelligent and Soft Computing (AINSC, volume 67)

Abstract

Open-domain question answering systems are an emerging class of information retrieval systems on the World Wide Web, increasingly popular for returning succinct, relevant answers to users' questions. In this paper, we address a rough set based method for document ranking, which is one of the major tasks in presenting retrieved results and directly contributes to the accuracy of a retrieval system. Rough sets are widely used for document categorization, vocabulary reduction, and other information retrieval problems. We propose a computationally efficient rough set based method for ranking documents. The distinctive feature of the proposed algorithm is that it places more emphasis on the presence and position of concept combinations than on term frequencies. We experimented with a set of standard questions collected from TREC, World Book, and the World Factbook, using Google and our proposed method, and found a 16% improvement in document ranking performance. Further, we compared our method with the online question answering system AnswerBus and observed a 38% improvement in placing relevant documents at the top ranks. We conducted further experiments to judge the effectiveness of the information retrieval system and found satisfactory performance results.




Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Singh, S., Ray, S.K., Joshi, B.P. (2010). Rough Set Based Concept Extraction Paradigm for Document Ranking. In: Snášel, V., Szczepaniak, P.S., Abraham, A., Kacprzyk, J. (eds) Advances in Intelligent Web Mastering - 2. Advances in Intelligent and Soft Computing, vol 67. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10687-3_18


  • DOI: https://doi.org/10.1007/978-3-642-10687-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-10686-6

  • Online ISBN: 978-3-642-10687-3

  • eBook Packages: Engineering, Engineering (R0)
