Building a Classification Tree

Lerman, Israël César

doi:10.1007/978-1-4471-6793-8_10

Israël César Lerman¹³

Part of the book series: Advanced Information and Knowledge Processing ((AI&KP))

1621 Accesses

Abstract

The basic data consists of a finite set E provided with a dissimilarity or a similarity function. The elements of E are of the same nature. As seen in Chap. 3, E can be a set of attributes, a set of objects or a set of categories. Tables 3.4 and 3.5 of Chap. 3 give a precise idea of the possible different versions of E. On the other hand, the set E may be weighted by a positive numerical measure \(\mu _{E}\), assigning to each of its elements x (\(x \in E\)) a weight \(\mu _{x}\). \(\mu _{x}\) defines the “importance” with which x has to be considered. The dissimilarity or similarity function on E is mostly numerical. However, it might be ordinal. In Chaps. 4–7, facets of building a similarity index or association coefficient on E have been minutely examined in relation to the descriptive nature of E.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 129.00; Price excludes VAT (USA)

Softcover Book: USD 169.99; Price excludes VAT (USA)

Hardcover Book: USD 169.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
Two total preorders \(\omega \) and \(\omega '\) on a given set are comparable if the graph of one of both total preorders is included in the other one.

References

Abbassi, N.: Identification de familles protéiques. Master 2, Research report, IRISA-INRIA, June 2013
Google Scholar
Aczel, J., Forte, B., Ng, C.T.: Why the Shannon and Hartley entropies are “natural”. Adv. Appl. Probab. 6, 131–146, Printed in Israel (1974)
Google Scholar
Bachar, K., Lerman, I.-C.: Statistical conditions for an algorithm of hierarchical classification under constraint of contiguiuty. In: Rizzi, A., Bock, H.-H., Vichi, M. (eds.) Advances in Data Science and Classification, pp. 131–136. Springer, Berlin (1998)
Google Scholar
Bachar, K., Lerman, I.-C.: Fixing parameters in the constrained hierarchical classification method: application to digital image segmentation. In: Banks, D., et al. (eds.) Classification Clustering and Data Mining Applications, pp. 85–94. Springer, New York (2004)
Google Scholar
Benzécri, J.-P.: Construction d’une classification ascendante hiérarchique par la recherche en chaîne des voisins réciproques. Les Cahiers de l’Analyse des Données 7, 209–218 (1982)
Google Scholar
Benzécri, J.-P.: L’analyse des données, tomes I et II. Dunod (1973)
Google Scholar
Bock, H.H.: Automatische Klassifikation Vandenhoeck und Rupprecht. Gottingen (1974)
Google Scholar
Bruynooghe, M.: Classification ascendante hiérarchique des grands ensembles de données; un algorithme rapide fondé sur la construction des voisinages réductibles. Cahiers de l’Analyse des Données III(1), 7–33 (1978)
Google Scholar
Bruynooghe, M.: Nouveaux algorithmes en classification automatique applicables aux très grands ensembles de données, rencontrés en traitement d’images et en reconnaissance de formes. Ph.D. thesis, Thèse d’Etat, Université de Paris 6, Jan 1989
Google Scholar
Bruynooghe, M.: Recent results in hierarchical clustering. I—the reducible neighborhoods clustering algorithm. Int. J. Pattern Recognit. Artif. Intell. 7(3), 541–571 (1993)
Google Scholar
Costa Nicolau, F., Bacelar-Nicolau, H.: Some trends in the classification of variables.In: Hayashi, C., et al. (eds.) Data Science, Classification and Related Methods, pp. 89–98.Springer, New York (1998)
Google Scholar
Cutting, D., Karger, D., Pederson, J., Tukey, J.: Scatter/gather: a cluster-based approach to browsing large document collections. In: Belkin, N., et al. (eds.) International ACM SIGIR Conference on Research and Development in Information Retrevial, pp. 318–339. ACM Press (1992)
Google Scholar
de Rham, C.: La classification hiérarchique ascendante selon la méthode des voisins réciproques. Les Cahiers de l’Analyse des Données V(2), 135–144 (1980)
Google Scholar
Diday, E.: Inversions en classification hiérarchique: application à la construction adaptative d’indices d’agrégation. Revue de Statistique Appliquée 31(1), 45–62 (1983)
Google Scholar
Edwards, W.F., Cavalli-Sforza, L.L.: A method for cluster analysis. Biometrics 21, 363–375 (1965)
Article Google Scholar
Feller, W.: An Introduction to Probability Theory and Its Applications, vol. II, 2nd edn. Wiley, New York (1971)
MATH Google Scholar
Florek, K.J., Lukaszewick, J., Perkal, J., Steinhaus, H., Zubrzycki, S.: Sur la liaison et la division des points d’un ensemble fini. In: Colloquium Mathematics, vol. 2, pp. 282–285 (1951)
Google Scholar
Florek, K.J., Lukaszewick, J., Perkal, J., Steinhaus, H., Zubrzycki, S.: Taksonomia wroclawska (in polish with english summary). Przegl. antrop. 17, 193–217 (1951)
Google Scholar
Ghazzali, N.: Comparaison et réduction d’arbres de classification, en relation avec des problèmes de quantification en imagerie numérique. Ph.D. thesis, Université de Rennes 1, May 1992
Google Scholar
Ghazzali, N., Léger, A., Lerman, I.C.: Rôle de la classification statistique dans la compression du signal d’image: Panoram et étude spécifique de cas. La Revue de Modulad 14, 51–91 (1994)
Google Scholar
Gordon, A.D.: A review of hierarchical classification. J. R. Stat. Soc. 150(2), 119–137 (1987)
MathSciNet MATH Google Scholar
Govaert, G.: La classification croisée. La Revue de Modulad 4, 9–36 (1989)
Google Scholar
Gras, R., Larher, A.: L’implication statistique, une nouvelle méthode d’analyse des données. Mathématiques, Informatique et Sciences Humaines 120, 5–31 (1993)
Google Scholar
Jambu, M.: Classification Automatique pour l’Analyse des Données. Dunod (1978)
Google Scholar
Jardine, N., Sibson, R.: Mathematical Taxonomy. Wiley, New York (1971)
MATH Google Scholar
Kojadinovic, I.: Hierarchical clustering of continuous variables based on the empirical copula process and permutation linkages. Comput. Stat. Data Anal. 54, 90–108 (2010)
Article MathSciNet MATH Google Scholar
Lance, G.N., Williams, W.T.: A general theory of classificatory sorting strategies. I. Hierarchical systems. Comput. J. 9, 373–380 (1967)
Article Google Scholar
Lee, A., Willcox, B.: Minkowski generalizations of Ward’s method in hierarchical clustering. J. Classif. 31, 194–218 (2014)
Article MathSciNet Google Scholar
Leredde, H.: La méthode des pôles d’attraction; La méthode des pôles d’agrégation : deux nouvelles familles d’algorithmes en classification automatique et sériation. Ph.D. thesis, Université de Paris 6, Oct 1979
Google Scholar
Lerman, I.-C., Bachar, K.: Contruction et justification d’une méthode de classification ascendante hiérarchique accélérée fondée fondée sur le critère de la vraisemblance du lien en cas de données de contiguïté. application en imagerie numérique. Publication Interne 1616, IRISA-INRIA, April 2004
Google Scholar
Lerman, I.-C., Bachar, K.: Comparaison de deux critères en classification ascendante hiérarchique sous contrainte de contiguïté. application en imagerie numérique. Journal de la Société Française de Statistique 149(2), 45–74 (2008)
Google Scholar
Lerman, I.-C., Peter, Ph.: Analyse d’un algorithme de classification hiérarchique en parallèle pour le traitement de gros ensembles, aspects méthodologiques et programmation. Publication Interne IRISA et Rapport de Recherche INRIA 232, IRISA-INRIA, Aug 1984
Google Scholar
Lerman, I.C.: Les bases de la classification automatique. Gauthier-Villars (1970)
Google Scholar
Lerman, I.C.: Analyse du phénomène de la “sériation”. Revue mathématique et Sciences Humaines 38 (1972)
Google Scholar
Lerman, I.C.: Classification et analyse ordinale des données. Dunod. http://www.brclasssoc.org.uk/books/index.html (1981)
Lerman, I.C.: Formules de réactualisation en cas d’agrégations multiples. Recherche Opérationnele, Operations Research 23(2), 151–163 (1989)
Google Scholar
Lerman, I.C.: Foundations of the likelihood linkage analysis (LLA) classification method. Appl. Stoch. Models Data Anal. 7, 63–76 (1991)
Article MathSciNet MATH Google Scholar
Lerman, I.C.: Likelihood linkage analysis (LLA) classification method: an example treated by hand. Biochimie 75, 379–397 (1993)
Article Google Scholar
Lerman, I.C.: Analyse logique, combinatoire et statistique de la construction d’une hiérarchie binaire; niveaux et noeuds significatifs. Mathématiques et Sciences Humaines, Mathematics and Social Sciences 184, 47–103 (2008)
Google Scholar
Lerman, I.C., Kuntz, P.: Directed binary hierarchies and directed ultrametrics. J. Classif. 28, 272–296 (2011)
Article MathSciNet Google Scholar
Lerman, I.C., Leredde, H.: La méthode des pôles d’attraction. In: Diday, E., et al. (eds.) Journées Analyse des Données et Informatique. IRIA (1977)
Google Scholar
Lerman, I.C., Peter, Ph.: Organisation et consultation d ’ une banque de petites annonces à partir d ’ une méthode de classification hiérarchique en parallèle. In: Data Analysis and Informatics IV, pp. 121–136. North Holland (1986)
Google Scholar
Loughin, T.M.: A systematic comparison of methods for combining \(p\)-values from independent tests. Comput. Stat. Data Anal. 47, 467–485 (2004)
Article MathSciNet MATH Google Scholar
McQuitty, L.L.: Similarity analysis by reciprocal pairs for discrete and continuous data. Educ. Psychol. Meas. 26, 825–831 (1966)
Google Scholar
McQuitty, L.L.: Elementary linkage analysis for isolating orthogonal and oblique types and typal relevancies. Educ. Psychol. Meas. 17, 207–229 (1957)
Google Scholar
Murtagh, F.: A survey of recent advances in hierarchical clustering. Comput. J. 26, 354–359 (1983)
Article MATH Google Scholar
Murtagh, F.: A survey of algorithms for contiguity constrained clustering and related problems. Comput. J. 28, 82–88 (1985)
Article Google Scholar
Murtagh, F.: Clustering massive data sets. In: Handbook of Massive Data Sets, pp. 501–543. Kluwer Academic Publishers, Norwell (2002)
Google Scholar
Nicolau, F.: Criterios de análise classificatoria hierarquica baseados na funçao de distribuiçao. Ph.D. thesis, Faculty of Science of Lisboa University, Feb 1981
Google Scholar
Olson, C.F.: Parallel algorithms for hierarchical clustering. Parallel Comput. 21, 1313–1325 (1995)
Article MathSciNet MATH Google Scholar
Orloci, L.: Information theory models for hierarchic and non-hierarchic classifications. In: Cole, A.J. (ed.) Numerical Taxonomy, pp. 148–164. Academic Press, New York (1968)
Google Scholar
Peter, Ph.: Méthodes de classification hiérarchique et problèmes de structuration et de recherche d ’ informations assistée par ordinateur. Ph.D. thesis, Université de Rennes 1, mars 1987
Google Scholar
Roux, M.: Deux algorithmes récents en classification automatique. Revue de Statistique Appliquée 18(4), 35–40 (1970)
Google Scholar
Scott, A.J., Symons, M.J.: On the Edwards and Cavalli Sforza method of cluster analysis. Biometrics 27, 217–219 (1971)
Article Google Scholar
Sneath, P.H.A.: The application of computers to taxonomy. J. Gen. Microbiol. 17, 201–226 (1957)
Article Google Scholar
Sneath, P.H.A., Sokal, R.R.: Numerical Taxonomy. Freeman, San Francisco (1973)
MATH Google Scholar
Sokal, R.R., Michener, C.D.: A statistical method for evaluating systematic relationships. Kansas Univ. Sci. Bull. 38, 1409–1438 (1958)
Google Scholar
Sorensen, T.: A method of establishing groups of equal amplitude in plant sociology based on similarity of species content. K. danske Vidensk. Selsk. Skr. (biol) 5, 1–34 (1948)
Google Scholar
Ward, J.H.: Hierarchical grouping to optimise an objective function. J. Am. Stat. Assoc. 58, 238–244 (1963)
Article Google Scholar

Download references

Author information

Authors and Affiliations

Department of Data Knowledge and Management, University of Rennes 1, IRISA, Rennes, Ille-et-Vilaine, France
Israël César Lerman

Authors

Israël César Lerman
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Israël César Lerman .

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Lerman, I.C. (2016). Building a Classification Tree. In: Foundations and Methods in Combinatorial and Statistical Data Analysis and Clustering. Advanced Information and Knowledge Processing. Springer, London. https://doi.org/10.1007/978-1-4471-6793-8_10

Download citation

DOI: https://doi.org/10.1007/978-1-4471-6793-8_10
Published: 25 March 2016
Publisher Name: Springer, London
Print ISBN: 978-1-4471-6791-4
Online ISBN: 978-1-4471-6793-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics