DESPOTA: An Algorithm to Detect the Partition in the Extended Hierarchy of a Dendrogram

Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 227)

Abstract

DESPOTA is a method for finding the best partition among those hosted in a dendrogram. The algorithm visits the nodes from the root of the tree toward the leaves. At each node, it uses a permutation test to test the null hypothesis that the two descending branches support only one cluster of units. At the end of the procedure, it returns a partition of the data into clusters. This paper focuses on the interpretation of the test statistic through a data-driven approach, using a real dataset to show the details of the test statistic and the algorithm in action. The working principle of DESPOTA is illustrated in light of the Lance–Williams recurrence formula, which embeds all types of agglomeration methods.
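The root-to-leaves visit described in the abstract can be sketched as follows. This is only an illustration of the traversal, not the authors' implementation: the `Node` class, the `slice_top_down` function, and the `one_cluster` callback are hypothetical names introduced here, and the callback stands in for the permutation test, which the abstract does not specify and which is not reproduced. A node for which the callback returns `True` (the null hypothesis is not rejected) is kept whole as one cluster; otherwise its two branches are visited in turn.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Node:
    """Hypothetical dendrogram node: leaves carry a label, internal nodes two children."""
    height: float = 0.0
    label: Optional[str] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None and self.right is None

    def leaves(self) -> List[str]:
        # Collect the labels of all units below this node, left to right.
        if self.is_leaf():
            return [self.label]
        return self.left.leaves() + self.right.leaves()


def slice_top_down(root: Node, one_cluster: Callable[[Node], bool]) -> List[List[str]]:
    """Visit nodes from the root toward the leaves and return a partition.

    `one_cluster(node)` is an assumed stand-in for the test at each node:
    True means "the two descending branches form a single cluster".
    """
    clusters: List[List[str]] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.is_leaf() or one_cluster(node):
            # Stop descending: everything below this node is one cluster.
            clusters.append(node.leaves())
        else:
            # Split: visit both branches (left branch is popped first).
            stack.append(node.right)
            stack.append(node.left)
    return clusters


# Toy dendrogram on five units; heights are made up for illustration.
a, b, c, d, e = (Node(label=x) for x in "abcde")
ab = Node(height=1.0, left=a, right=b)
cd = Node(height=1.5, left=c, right=d)
cde = Node(height=4.0, left=cd, right=e)
root = Node(height=9.0, left=ab, right=cde)

# Placeholder decision rule (NOT the permutation test): merge below height 3.
clusters = slice_top_down(root, lambda n: n.height < 3.0)
print(clusters)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Because the decision is taken node by node, replacing the placeholder rule with a genuine per-node test lets different branches stop at different heights, so the result need not coincide with a single horizontal cut of the tree.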

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Dip.to di Economia e Giurisprudenza, Università degli Studi di Cassino e del Lazio Meridionale, Cassino (FR), Italy
