Cluster Analysis

Mooi, Erik; Sarstedt, Marko; Mooi-Reci, Irma

doi:10.1007/978-981-10-5218-7_9

Part of the book series: Springer Texts in Business and Economics (STBE)

Abstract

We provide comprehensive and advanced knowledge of cluster analysis. We first introduce the principles of cluster analysis and outline the steps and decisions involved. We discuss how to select appropriate clustering variables and subsequently introduce modern hierarchical and partitioning methods for cluster analysis, using simple examples to illustrate how they work. We also discuss the key measures of similarity and dissimilarity, and offer guidance on how to decide the number of clusters to extract from the data. Each step in a cluster analysis is subsequently linked to its execution in Stata (using menus and code), thus enabling readers to analyze, chart, and validate the results. Interpretation of Stata output can be difficult, but we make this easier by means of an annotated case study. We conclude with suggestions for further reading on the use, application, and interpretation of cluster analysis.

Notes

1. Tonks (2009) provides a discussion of segment design and the choice of clustering variables in consumer markets.
2. See Arabie and Hubert (1994), Sheppard (1996), and Dolnicar and Grün (2009).
3. Whereas agglomerative methods have the large task of checking N·(N−1)/2 possible first combinations of observations (where N is the number of observations in the dataset), divisive methods have the almost impossible task of checking 2^(N−1)−1 combinations.
4. There are many other matching coefficients, such as Yule’s Q, Kulczynski, or Ochiai, which are also menu-accessible in Stata. However, since most applications of cluster analysis rely on metric or ordinal data, we will not discuss these. See Wedel and Kamakura (2000) for more information on alternative matching coefficients.
5. For details on the implementation of these stopping rules in Stata, see Halpin (2016).
6. In the Web Appendix (→Downloads), we offer a Stata .ado file, chomega.ado, to calculate ω_k. We also offer an Excel sheet (VRC.xlsx) to calculate ω_k manually.
7. See Punj and Stewart (1983) for additional information on this sequential approach.
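
To give a sense of the scale described in note 3, the following minimal Python sketch evaluates both expressions for a few sample sizes. The function names and the chosen values of N are illustrative assumptions and are not taken from the chapter or its Web Appendix.

```python
def agglomerative_first_merges(n: int) -> int:
    """Candidate pairs at the first agglomerative step: N(N-1)/2."""
    return n * (n - 1) // 2


def divisive_first_splits(n: int) -> int:
    """Ways to split N observations into two non-empty groups: 2^(N-1) - 1."""
    return 2 ** (n - 1) - 1


if __name__ == "__main__":
    # Illustrative sample sizes only (assumption, not from the source).
    for n in (5, 10, 20, 50):
        print(n, agglomerative_first_merges(n), divisive_first_splits(n))
```

For N = 10, the first agglomerative step involves 45 candidate pairs, whereas a divisive method would face 511 possible two-way splits; the gap widens exponentially as N grows.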

References

Anderberg, M. R. (1973). Cluster analysis for applications. New York: Academic.
Arabie, P., & Hubert, L. (1994). Cluster analysis in marketing research. In R. P. Bagozzi (Ed.), Advanced methods in marketing research (pp. 160–189). Cambridge: Basil Blackwell & Mott, Ltd.
Arthur, D., & Vassilvitskii, S. (2007). k-means++: The advantages of careful seeding. In Proceedings of the eighteenth annual ACM-SIAM symposium on discrete algorithms (pp. 1027–1035). Philadelphia: Society for Industrial and Applied Mathematics.
Becker, J.-M., Ringle, C. M., Sarstedt, M., & Völckner, F. (2015). How collinearity affects mixture regression results. Marketing Letters, 26(4), 643–659.
Caliński, T., & Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics—Theory and Methods, 3(1), 1–27.
Dolnicar, S. (2003). Using cluster analysis for market segmentation—typical misconceptions, established methodological weaknesses and some recommendations for improvement. Australasian Journal of Market Research, 11(2), 5–12.
Dolnicar, S., & Grün, B. (2009). Challenging “factor-cluster segmentation”. Journal of Travel Research, 47(1), 63–71.
Dolnicar, S., & Lazarevski, K. (2009). Methodological reasons for the theory/practice divide in market segmentation. Journal of Marketing Management, 25(3–4), 357–373.
Dolnicar, S., Grün, B., Leisch, F., & Schmidt, F. (2014). Required sample sizes for data-driven market segmentation analyses in tourism. Journal of Travel Research, 53(3), 296–306.
Dolnicar, S., Grün, B., & Leisch, F. (2016). Increasing sample size compensates for data problems in segmentation studies. Journal of Business Research, 69(2), 992–999.
Duda, R. O., & Hart, P. E. (1973). Pattern classification. Hoboken: Wiley.
Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern classification (2nd ed.). Hoboken: Wiley.
Everitt, B. S., & Rabe-Hesketh, S. (2006). Handbook of statistical analyses using Stata (4th ed.). Boca Raton: Chapman & Hall/CRC.
Formann, A. K. (1984). Die Latent-Class-Analyse: Einführung in die Theorie und Anwendung. Weinheim: Beltz.
Gower, J. C. (1971). A general coefficient of similarity and some of its properties. Biometrics, 27(4), 857–871.
Halpin, B. (2016). Cluster analysis stopping rules in Stata. University of Limerick. Department of Sociology Working Paper Series, WP2016-01. http://ulsites.ul.ie/sociology/sites/default/files/wp2016-01.pdf
Kaufman, L., & Rousseeuw, P. J. (2005). Finding groups in data. An introduction to cluster analysis. Hoboken: Wiley.
Kotler, P., & Keller, K. L. (2015). Marketing management (15th ed.). Upper Saddle River: Prentice Hall.
Milligan, G. W., & Cooper, M. (1985). An examination of procedures for determining the number of clusters in a data set. Psychometrika, 50(2), 159–179.
Milligan, G. W., & Cooper, M. (1988). A study of variable standardization. Journal of Classification, 5(2), 181–204.
Park, H.-S., & Jun, C.-H. (2009). A simple and fast algorithm for K-medoids clustering. Expert Systems with Applications, 36(2), 3336–3341.
Punj, G., & Stewart, D. W. (1983). Cluster analysis in marketing research: Review and suggestions for application. Journal of Marketing Research, 20(2), 134–148.
Qiu, W., & Joe, H. (2009). Cluster generation: Random cluster generation (with specified degree of separation). R package version 1.2.7.
Sheppard, A. (1996). The sequence of factor analysis and cluster analysis: Differences in segmentation and dimensionality through the use of raw and factor scores. Tourism Analysis, 1(1), 49–57.
Tonks, D. G. (2009). Validity and the design of market segments. Journal of Marketing Management, 25(3/4), 341–356.
Wedel, M., & Kamakura, W. A. (2000). Market segmentation: Conceptual and methodological foundations (2nd ed.). Boston: Kluwer Academic.
van der Kloot, W. A., Spaans, A. M. J., & Heiser, W. J. (2005). Instability of hierarchical cluster analysis due to input order of the data: The PermuCLUSTER solution. Psychological Methods, 10(4), 468–476.
Lilien, G. L., & Rangaswamy, A. (2004). Marketing engineering. Computer-assisted marketing analysis and planning (2nd ed.). Bloomington: Trafford Publishing.
Roberts, J. H., Kayande, U., & Stremersch, S. (2014). From academic research to marketing practice: Exploring the marketing science value chain. International Journal of Research in Marketing, 31(2), 127–140.

Author information

Authors and Affiliations

Department of Management and Marketing, University of Melbourne, Parkville, Victoria, Australia
Erik Mooi
Chair of Marketing, Otto-von-Guericke-University, Magdeburg, Sachsen-Anhalt, Germany
Marko Sarstedt
School of Social and Political Sciences, University of Melbourne, Parkville, Victoria, Australia
Irma Mooi-Reci

About this chapter

Cite this chapter

Mooi, E., Sarstedt, M., Mooi-Reci, I. (2018). Cluster Analysis. In: Market Research. Springer Texts in Business and Economics. Springer, Singapore. https://doi.org/10.1007/978-981-10-5218-7_9

DOI: https://doi.org/10.1007/978-981-10-5218-7_9
Published: 02 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-5217-0
Online ISBN: 978-981-10-5218-7
eBook Packages: Business and Management, Business and Management (R0)
