Abstract
This chapter introduces concepts and techniques for using unstructured text as a data source. We first review examples of the kinds of existing text data that you may encounter. We then discuss the process of turning that text into a form more amenable to the quantitative analysis that we are likely to perform.
Electronic Supplementary Material: The online version of this chapter (https://doi.org/10.1007/978-3-030-36826-5_14) contains supplementary material, which is available to authorized users.
Notes
- 2. Part of the reason they can be a bad signal is that URL shorteners obfuscate the final destination.
- 7. If you wish to replicate this study exactly, the CSV version of the dataset is included with the book at https://dataverse.harvard.edu/dataverse/python-book.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Cutler, J., Dickenson, M. (2020). Case Study: Natural Language Processing. In: Computational Frameworks for Political and Social Research with Python. Textbooks on Political Analysis. Springer, Cham. https://doi.org/10.1007/978-3-030-36826-5_14
DOI: https://doi.org/10.1007/978-3-030-36826-5_14
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-36825-8
Online ISBN: 978-3-030-36826-5
eBook Packages: Mathematics and Statistics (R0)