Abstract
In this chapter, we discuss the features, such as words or n-grams, that are used to represent texts so they can be compared in the vector space model. We also present the possible values of these features: tf, idf, and tf-idf.
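The sketch below illustrates these ideas. It rests on several assumptions: whitespace tokenization, word n-grams as features, raw occurrence counts for tf, idf computed as log(N / df), and tf-idf as their product. This idf formula is one common variant, and the chapter may define these values differently. Cosine similarity is also an assumption; the abstract does not name a comparison measure. All function names are hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def ngrams(text, n=1):
    """Return word n-grams as tuples (assumes simple lowercase whitespace tokenization)."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf(features):
    """Raw term frequency: how many times each feature occurs in one document."""
    return Counter(features)


def idf(corpus_features):
    """Inverse document frequency, log(N / df); df = number of documents containing the feature."""
    n_docs = len(corpus_features)
    df = Counter()
    for feats in corpus_features:
        df.update(set(feats))  # count each feature at most once per document
    return {f: math.log(n_docs / count) for f, count in df.items()}


def tf_idf(corpus_features):
    """One sparse vector (dict: feature -> weight) per document, weight = tf * idf."""
    weights = idf(corpus_features)
    return [{f: c * weights[f] for f, c in tf(feats).items()} for feats in corpus_features]


def cosine(u, v):
    """Cosine of the angle between two sparse vectors; 0.0 if either vector is all zeros."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs",
]
features = [ngrams(d, n=1) for d in docs]  # use n=2 for word bigrams, and so on
vectors = tf_idf(features)
print(cosine(vectors[0], vectors[1]))  # ≈ 0.29
```

In this variant, a feature that occurs in every document gets idf = log(1) = 0. Such a feature therefore contributes nothing to tf-idf, which is why very frequent words carry little weight when texts are compared.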
Copyright information
© 2019 The Author(s), under exclusive licence to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Sidorov, G. (2019). Vector Space Model for Texts and the tf-idf Measure. In: Syntactic n-grams in Computational Linguistics. SpringerBriefs in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-030-14771-6_3
DOI: https://doi.org/10.1007/978-3-030-14771-6_3
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-14770-9
Online ISBN: 978-3-030-14771-6
eBook Packages: Computer Science, Computer Science (R0)