Abstract
This chapter moves readers from the analysis of one or two texts to a larger corpus. Machine clustering is introduced in the context of an authorship attribution problem, and we reuse some functions developed in previous chapters.
Notes
1. Enter ?regex at the prompt to learn more about regex in R.
2. For a brief overview of how this work is conducted, see Jockers (2013), pp. 63–67.
3. Using this formula, the file names become the first column in the resulting matrix and the tokens become the column headers. If, instead, we wanted to “transpose the matrix” and have the file names as the column headers, we would just change the formula to read token + file instead of file + token.
4. All these books are by Irish or Irish-American authors. They were digitized and encoded into TEI by Matthew Jockers as part of his work on the Irish-American West project at Stanford back in the early 2000s.
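The formula swap described in note 3 can be illustrated with a small sketch. It assumes the formula is passed to base R's xtabs() and uses a hypothetical long-format data frame (the names freqs, file, token, and count are invented for illustration and are not taken from the chapter).

```r
# Hypothetical long-format data: one row per (file, token) pair with a count.
# These names and values are assumptions made for this illustration only.
freqs <- data.frame(
  file  = c("a.xml", "a.xml", "b.xml", "b.xml"),
  token = c("the", "of", "the", "of"),
  count = c(10, 4, 7, 5)
)

# file + token: one row per file, one column per token.
by_file <- xtabs(count ~ file + token, data = freqs)

# token + file: the same counts with rows and columns swapped.
by_token <- xtabs(count ~ token + file, data = freqs)

print(by_file)
print(by_token)
```

Under these assumptions, by_token holds the same counts as t(by_file), so reordering the terms of the formula has the same effect as transposing the result afterwards.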
References
Jockers ML (2013) Macroanalysis: Digital Methods and Literary History, 1st edn. University of Illinois Press, Urbana
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jockers, M.L., Thalken, R. (2020). Clustering. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_15
DOI: https://doi.org/10.1007/978-3-030-39643-5_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39642-8
Online ISBN: 978-3-030-39643-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)