Cluster Analysis: Modeling Groups in Text

  • Murugan Anandarajan
  • Chelsey Hill
  • Thomas Nolan
Part of the Advances in Analytics and Data Science book series (AADS, volume 2)


This chapter explains cluster analysis, an unsupervised learning method for grouping data. It shows how hierarchical and k-means clustering can place text or documents into meaningful groups to improve understanding of the data. Clustering is a valuable tool for finding naturally occurring similarities.
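As a rough illustration of the two families of methods named above, the following minimal sketch groups four toy documents with agglomerative (hierarchical) clustering and with k-means. Everything in it is an assumption made for illustration and is not taken from the chapter: the example documents, the term-frequency representation, the Euclidean distance, the farthest-first initialisation, and the helper names `tf_vectors`, `agglomerative`, and `kmeans`.

```python
import math
from collections import Counter

# Hypothetical toy corpus: two documents about pets, two about markets.
docs = [
    "cat dog pet",
    "dog cat animal pet",
    "stock market trade",
    "market stock price trade",
]

def tf_vectors(texts):
    """Turn each text into a term-frequency vector over a shared vocabulary."""
    vocab = sorted({w for t in texts for w in t.split()})
    return [[Counter(t.split())[w] for w in vocab] for t in texts]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerative(vectors, k, linkage=min):
    """Merge the two closest clusters until k clusters remain.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    clusters = [[i] for i in range(len(vectors))]

    def dist(c1, c2):
        return linkage(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)

    while len(clusters) > k:
        pairs = ((p, q) for p in range(len(clusters))
                 for q in range(p + 1, len(clusters)))
        a, b = min(pairs, key=lambda pq: dist(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(clusters)

def kmeans(vectors, k, iters=20):
    """Lloyd's algorithm with a deterministic farthest-first start."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(euclidean(v, c) for c in centroids))
        )
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda c: euclidean(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

vectors = tf_vectors(docs)
print(agglomerative(vectors, 2))               # single linkage: [[0, 1], [2, 3]]
print(agglomerative(vectors, 2, linkage=max))  # complete linkage: [[0, 1], [2, 3]]
print(kmeans(vectors, 2))                      # cluster label per document: [0, 0, 1, 1]
```

On this toy corpus both approaches recover the same two groups. On real text the result typically depends on the document representation, the distance measure, the linkage rule, and the choice of k.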


Keywords: Cluster analysis; Hierarchical cluster analysis; k-means cluster analysis; k-means; Single linkage; Complete linkage; Centroid; Ward’s method


References

  1. Aggarwal, C. C., & Zhai, C. X. (2012). Mining text data. New York: Springer Verlag.
  2. Berry, M. J., & Linoff, G. S. (2011). Data mining techniques: For marketing, sales, and customer relationship management. Chichester: Wiley-Interscience.
  3. Halkidi, M., Batistakis, Y., & Vazirgiannis, M. (2002). Cluster validity methods: Part II. ACM SIGMOD Record, 31(2), 40–45.
  4. Jain, A. K., & Dubes, R. C. (1988). Algorithms for clustering data. Upper Saddle River: Prentice-Hall.
  5. Jain, A. K., Murty, M. N., & Flynn, P. J. (1999). Data clustering: A review. ACM Computing Surveys (CSUR), 31(3), 264–323.
  6. Jain, A. K., Duin, R. P. W., & Mao, J. (2000). Statistical pattern recognition: A review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(1), 4–37.
  7. Manning, C., Raghavan, P., & Schütze, H. (2008). Introduction to information retrieval. Cambridge: Cambridge University Press. doi: 10.1017/CBO9780511809071.
  8. Murtagh, F. (1983). A survey of recent advances in hierarchical clustering algorithms. The Computer Journal, 26(4), 354–359.
  9. Nagy, G. (1968). State of the art in pattern recognition. Proceedings of the IEEE, 56(5), 836–863.
  10. Rousseeuw, P. J., & Kaufman, L. (1990). Finding groups in data. Hoboken: Wiley Online Library.
  11. Steinley, D. (2006). K-means clustering: A half-century synthesis. British Journal of Mathematical and Statistical Psychology, 59(1), 1–34.
  12. Voorhees, E. M. (1986). Implementing agglomerative hierarchic clustering algorithms for use in document retrieval. Information Processing & Management, 22(6), 465–476.
  13. Ward, J. H., Jr. (1963). Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301), 236–244.
  14. Xu, R., & Wunsch, D. (2005). Survey of clustering algorithms. IEEE Transactions on Neural Networks, 16(3), 645–678.
  15. Zaki, M. J., & Meira, W., Jr. (2014). Data mining and analysis: Fundamental concepts and algorithms. Cambridge: Cambridge University Press.
  16. Zhao, Y., Karypis, G., & Fayyad, U. (2005). Hierarchical clustering algorithms for document datasets. Data Mining and Knowledge Discovery, 10(2), 141–168.

Further Reading

  1. For more about clustering, see Berkhin (2006), Jain and Dubes (1988) and Jain et al. (1999).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Murugan Anandarajan (1)
  • Chelsey Hill (2)
  • Thomas Nolan (3)

  1. LeBow College of Business, Drexel University, Philadelphia, USA
  2. Feliciano School of Business, Montclair State University, Montclair, USA
  3. Mercury Data Science, Houston, USA
