Term-Document Representation

  • Murugan Anandarajan
  • Chelsey Hill
  • Thomas Nolan
Part of the Advances in Analytics and Data Science book series (AADS, volume 2)


This chapter details the process of converting documents into an analysis-ready term-document representation. Preprocessed text documents are first transformed into an inverted index for demonstrative purposes. Then, the inverted index is manipulated into a term-document or document-term matrix. The chapter concludes with descriptions of different weighting schemas for analysis-ready term-document representation.
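The pipeline summarized above can be sketched in a few lines of Python. This is a minimal illustration and not the chapter's own code: the toy documents, the function names, and the choice of raw counts weighted by log10(N / df) are all assumptions made for the example, and other tf-idf variants are equally common.

```python
import math
from collections import defaultdict

# Hypothetical, already-preprocessed documents (tokenized and lowercased).
docs = {
    "d1": ["brown", "dog", "fox"],
    "d2": ["brown", "fox", "fox"],
    "d3": ["lazy", "dog"],
}

def build_inverted_index(docs):
    """Map each term to a {doc_id: raw term frequency} dictionary."""
    index = defaultdict(dict)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token][doc_id] = index[token].get(doc_id, 0) + 1
    return dict(index)

def to_term_document_matrix(index, doc_ids):
    """Rows are terms (sorted), columns are documents, cells are raw counts."""
    terms = sorted(index)
    matrix = [[index[t].get(d, 0) for d in doc_ids] for t in terms]
    return terms, matrix

def tfidf(matrix):
    """Scale each count by log10(N / df); df = number of documents containing the term."""
    n_docs = len(matrix[0])
    weighted = []
    for row in matrix:
        df = sum(1 for count in row if count > 0)
        idf = math.log10(n_docs / df)
        weighted.append([count * idf for count in row])
    return weighted

doc_ids = sorted(docs)
index = build_inverted_index(docs)
terms, tdm = to_term_document_matrix(index, doc_ids)
weights = tfidf(tdm)

# The document-term matrix is simply the transpose of the term-document matrix.
dtm = [list(column) for column in zip(*tdm)]
```

Under these assumptions, a term that appears in every document would receive a weight of zero, while a term confined to one document (here "lazy") receives the largest inverse document frequency.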


Keywords: Inverted index; Term-document matrix; Document-term matrix; Term frequency; Document frequency; Term frequency-inverse document frequency; Inverse document frequency; Weighting; Term weighting; Document weighting; Log frequency


  1. Berry, M. W., Drmac, Z., & Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Review, 41(2), 335–362.
  2. Dumais, S. T. (1991). Improving the retrieval of information from external sources. Behavior Research Methods, Instruments, & Computers, 23(2), 229–236.
  3. Jessup, E. R., & Martin, J. H. (2001). Taking a new look at the latent semantic analysis approach to information retrieval. Computational Information Retrieval, 2001, 121–144.
  4. Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press.

Further Reading

  1. For more about the term-document representation of text data, see Berry et al. (1999) and Manning et al. (2008).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Murugan Anandarajan (1)
  • Chelsey Hill (2)
  • Thomas Nolan (3)
  1. LeBow College of Business, Drexel University, Philadelphia, USA
  2. Feliciano School of Business, Montclair State University, Montclair, USA
  3. Mercury Data Science, Houston, USA
