Clustering

Matthew L. Jockers and Rosamond Thalken

doi:10.1007/978-3-030-39643-5_15

Part of the book series: Quantitative Methods in the Humanities and Social Sciences (QMHSS)

Abstract

This chapter moves readers from the analysis of one or two texts to a larger corpus. Machine clustering is introduced in the context of an authorship attribution problem, and we reuse some functions developed in previous chapters.

Notes

1. Enter ?regex at the prompt to learn more about regex in R.
2. For a brief overview of how this work is conducted, see Jockers (2013), pp. 63–67.
3. Using this formula, the file names become the first column in the resulting matrix and the tokens become the column headers. If, instead, we wanted to “transpose the matrix” and have the file names as the column headers, we would just change the formula to read token + file instead of file + token.
4. All these books are by Irish or Irish-American authors. They were digitized and encoded into TEI by Matthew Jockers as part of his work on the Irish-American West project at Stanford back in the early 2000s.

References

Jockers ML (2013) Macroanalysis: Digital Methods and Literary History, 1st edn. University of Illinois Press, Urbana

Author information

Authors and Affiliations

College of Arts and Sciences, Washington State University, Pullman, WA, USA
Matthew L. Jockers
Digital Technology and Culture Program, Washington State University, Pullman, WA, USA
Rosamond Thalken

About this chapter

Cite this chapter

Jockers, M.L., Thalken, R. (2020). Clustering. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_15

DOI: https://doi.org/10.1007/978-3-030-39643-5_15
Published: 31 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39642-8
Online ISBN: 978-3-030-39643-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
