Vector Space Model for Texts and the tf-idf Measure

Syntactic n-grams in Computational Linguistics

Part of the book series: SpringerBriefs in Computer Science (BRIEFSCOMPUTER)

Abstract

In this chapter, we discuss the features, such as words or n-grams, that are used to represent texts when they are compared in the vector space model. We also present the possible values of these features: tf, idf, and tf-idf.
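
As a rough illustration of the three values named above, the following Python sketch computes tf, idf, and tf-idf for a toy corpus. It assumes one common textbook definition (tf as the raw count of a term in a document, idf as the natural logarithm of N/df); the chapter itself may use a different variant, such as a normalized tf or another logarithm base. The function name `tf_idf` and the sample documents are hypothetical and serve only as an example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return (weights, idf) for a list of tokenized documents.

    Assumed definitions (not taken from the chapter):
      tf(t, d) = raw count of term t in document d
      idf(t)   = ln(N / df(t)), where N is the number of documents
                 and df(t) is the number of documents containing t
    """
    n_docs = len(docs)
    # Document frequency: count each term once per document.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}

    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts in this document
        weights.append({term: freq * idf[term] for term, freq in tf.items()})
    return weights, idf


# Hypothetical toy corpus; the features here are words (unigrams).
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
weights, idf = tf_idf(docs)
```

Under these assumptions, a term that occurs in every document (here "the") receives an idf of zero and therefore a tf-idf weight of zero, while a term confined to a single document (here "mat") receives the largest idf. The same function would work with n-grams as features if each document were given as a list of n-grams instead of words.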


Copyright information

© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Sidorov, G. (2019). Vector Space Model for Texts and the tf-idf Measure. In: Syntactic n-grams in Computational Linguistics. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-14771-6_3

  • DOI: https://doi.org/10.1007/978-3-030-14771-6_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-14770-9

  • Online ISBN: 978-3-030-14771-6

  • eBook Packages: Computer Science (R0)
