Sentence Ranking for Document Indexing

  • Saptaditya Maiti
  • Deba P. Mandal
  • Pabitra Mitra
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6744)

Abstract

This article discusses a new document indexing scheme for information retrieval. For structured documents (e.g., scientific articles), Pasi et al. proposed assigning different weights to different sections according to their importance in the document. This concept is extended here to unstructured documents. Each sentence in a document is first assigned a weight, reflecting its significance in the document, with the help of a summarization technique. The term frequency of a term is then computed as the sum of the weights of the sentences in which the term occurs. The method is evaluated on a real-life dataset using leading existing information retrieval models, and its performance is found to be superior to that of conventional indexing schemes.
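The indexing step described above can be sketched as follows. For a document d split into sentences s with weights w(s), the weighted term frequency is tf_w(t, d) = Σ_{s ∈ d, t ∈ s} w(s). This is a minimal Python sketch under stated assumptions, not the authors' implementation: the hypothetical function `sentence_weights` stands in for the summarization step and uses a simple relative-entropy score, exp(-KL(P_s ‖ P_d)), where the sentence model is smoothed toward the document model by an assumed mixing parameter `mu`. The tokenizer is deliberately naive, and every occurrence of a term adds its sentence's weight, so with all weights equal to 1 the result reduces to ordinary term frequency; counting each sentence only once is another possible reading of the abstract.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphanumeric tokens (a deliberately simple tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def sentence_weights(sentences, mu=0.5):
    """Hypothetical sentence scorer: w(s) = exp(-KL(P_s || P_d)).

    Assumption: stands in for the paper's summarization step. P_s is the
    sentence's unigram distribution mixed with the document distribution P_d
    (mixing weight mu, which must lie in (0, 1]); the mixing keeps every
    probability positive so the KL divergence is finite. Weights lie in
    (0, 1]; sentences whose word distribution is closer to the document's
    receive higher weights.
    """
    doc_counts = Counter(t for s in sentences for t in tokenize(s))
    doc_len = sum(doc_counts.values())
    p_doc = {t: c / doc_len for t, c in doc_counts.items()}

    weights = []
    for s in sentences:
        counts = Counter(tokenize(s))
        n = sum(counts.values())
        if n == 0:
            weights.append(0.0)  # an empty sentence contributes nothing
            continue
        kl = 0.0
        for t, pd in p_doc.items():
            ps = (1 - mu) * counts[t] / n + mu * pd
            kl += ps * math.log(ps / pd)
        weights.append(math.exp(-kl))
    return weights


def weighted_tf(sentences, weights):
    """tf_w(t) = sum of w(s) over every occurrence of t in sentence s."""
    tf = Counter()
    for s, w in zip(sentences, weights):
        for t in tokenize(s):
            tf[t] += w
    return dict(tf)


if __name__ == "__main__":
    doc = [
        "Sentence ranking helps indexing.",
        "Indexing uses term frequency.",
        "The weather was nice.",
    ]
    w = sentence_weights(doc)
    for sentence, weight in zip(doc, w):
        print(f"{weight:.3f}  {sentence}")
    print(weighted_tf(doc, w))
```

In this toy document the off-topic third sentence shares no words with the others and so receives a lower weight than the first two, which both mention "indexing". The resulting tf_w values could then be used in place of raw term counts inside a retrieval model such as BM25 [8]; how the paper itself feeds them into its models is not specified here.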

Keywords

information retrieval; document indexing; sentence ranking; relative entropy

References

  1. Manning, C.D., Raghavan, P., Schütze, H.: Introduction to Information Retrieval. Cambridge University Press, New York (2008)
  2. Bordogna, G., Pasi, G.: Controlling retrieval through a user-adaptive representation of documents. International Journal of Approximate Reasoning 12, 317–339 (1995)
  3. Pasi, G.: Fuzzy Sets in Information Retrieval: State of the Art and Research Trends. In: Bustince, H., Herrera, F., Montero, J. (eds.) Fuzzy Sets and Their Extensions: Representation, Aggregation and Models, pp. 517–535. Springer, Heidelberg (2008)
  4. Das, D., Martins, A.: A Survey on Automatic Text Summarization. Literature Survey for the Language and Statistics II Course at CMU (2007)
  5. Kumar, C., Pingali, P., Varma, V.: A light-weight summarizer based on language model with relative entropy. In: Proc. of the ACM Symposium on Applied Computing (SAC 2009). ACM, New York (2009)
  6. Forum for Information Retrieval Evaluation (FIRE), http://www.isical.ac.in/~fire/
  7. Ounis, I., Lioma, C., Macdonald, C., Plachouras, V.: Research Directions in Terrier: a Search Engine for Advanced Retrieval on the Web. Novatica/UPGRADE, Special Issue on Next Generation Web Search 8(1), 49–56 (2007)
  8. Robertson, S.E., Walker, S.: Some simple effective approximations to the 2-Poisson model for probabilistic weighted retrieval. In: Proc. of the 17th Annual Int. ACM SIGIR Conf. on Research and Development in Information Retrieval, Dublin, pp. 232–241 (1994)
  9. Amati, G.: Probability models for information retrieval based on divergence from randomness. PhD thesis, University of Glasgow (2003)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Saptaditya Maiti (1)
  • Deba P. Mandal (1)
  • Pabitra Mitra (2)
  1. Machine Intelligence Unit, Indian Statistical Institute, Kolkata, India
  2. Dept. of Computer Science and Engineering, Indian Institute of Technology, Kharagpur, India
