Abstract
Multivariate classification of variables into mathematically definable, homogeneous subsets is often a useful first step in pattern recognition prior to formal statistical analysis of a data set. One such methodology, cluster analysis, aims to group entities that share common characteristics and data structure. One goal of such an analysis is to gain insight into the variables that determine group membership so that new data can be easily classified; another is to develop subsets of data that share certain characteristics, which facilitates statistical analysis of variables hypothesized to be related to the clustered entities. Cluster analysis can therefore be useful when applied to neuropsychological variables, particularly when an empirical statistical approach to classification is desirable or when significant interindividual differences in neuropsychological function exist within clinical populations. This chapter reviews clustering methods, including hierarchical agglomerative methods and iterative partitioning methods. It also offers recommendations for determining the appropriate number of clusters and for comparing clustering methods, and it addresses validation techniques. The chapter concludes with a discussion of data issues commonly encountered in neuropsychological research, such as non-normality of data and incomplete patient records, and techniques for handling these situations. Finally, a table outlining various statistical packages is provided for readers interested in the availability of software for computation.
The views presented in this chapter are those of the author(s) and do not necessarily represent the views of the US Department of Veterans Affairs.
© 2013 Springer Science+Business Media New York
About this chapter
Cite this chapter
Cross, C.L. (2013). Statistical and Methodological Considerations When Using Cluster Analysis in Neuropsychological Research. In: Allen, D., Goldstein, G. (eds) Cluster Analysis in Neuropsychological Research. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6744-1_2
DOI: https://doi.org/10.1007/978-1-4614-6744-1_2
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-6743-4
Online ISBN: 978-1-4614-6744-1
eBook Packages: Behavioral Science, Behavioral Science and Psychology (R0)