Abstract
Since most of the classifications constructed over the long period from Plato and Aristotle to Linnaeus (and even into the nineteenth century) were hierarchical, we devote this entire chapter to mathematical models of hierarchies. After a historical overview (Sect. 3.2), we introduce (Sect. 3.3) the basic notions of partition, partition lattice, and chain of partitions; the last of these is the exact mathematical model of a hierarchical classification, classically represented by a tree diagram. In Sect. 3.4, we give the structure of the set of all hierarchical classifications on a finite set, which makes it possible to enumerate exhaustively all the hierarchies that can be built on it. Then, in Sect. 3.5, we present the exact correspondence between a tree diagram, the hierarchical structure it represents, and the distance that can be defined on it, which is an ultrametric. These ideal models and their algebraic representations (Sect. 3.6) are those that taxonomists generally want to obtain, but the real world, as it appears, is in general quite far from such neat orders. We therefore explain (Sect. 3.7) how quasi-chaotic empirical data can be replaced with proper mathematical taxonomies. Although in many cases this can be done, and numerical models can easily be fitted to empirical data, the operation raises some problems (Sect. 3.8), either because of mathematical limits within the models (intrinsic instability) or because human perception of the world changes over time (extrinsic instability). Finally (Sect. 3.9), we list some possible answers to these important questions.
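The correspondence between a chain of partitions and an ultrametric can be illustrated with a minimal sketch. The four-element set, the particular chain, and the function names below are hypothetical choices made for illustration, not taken from the chapter; the distance between two elements is assumed to be the index of the first level of the chain at which they fall into the same block.

```python
# Hypothetical chain of partitions on {a, b, c, d}, ordered from finest to coarsest.
chain = [
    [{"a"}, {"b"}, {"c"}, {"d"}],
    [{"a", "b"}, {"c"}, {"d"}],
    [{"a", "b"}, {"c", "d"}],
    [{"a", "b", "c", "d"}],
]


def refines(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    return all(any(block <= big for big in coarse) for block in fine)


def is_chain(partitions):
    """True if each partition refines the next one in the list."""
    return all(refines(p, q) for p, q in zip(partitions, partitions[1:]))


def ultrametric(partitions):
    """Map each pair (x, y) to the first level at which x and y share a block."""
    elements = sorted(set().union(*partitions[0]))
    d = {}
    for x in elements:
        for y in elements:
            d[x, y] = next(
                level
                for level, partition in enumerate(partitions)
                if any(x in block and y in block for block in partition)
            )
    return elements, d


def satisfies_ultrametric_inequality(elements, d):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples."""
    return all(
        d[x, z] <= max(d[x, y], d[y, z])
        for x in elements
        for y in elements
        for z in elements
    )


elements, d = ultrametric(chain)
```

In this sketch, reading the chain from bottom to top corresponds to reading the tree diagram from leaves to root, and the level at which two leaves first join plays the role of their ultrametric distance.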
Notes
- 1.
- 2.
Prof. Stéphanie Ruphy, a former student of D. Parrochia in Toulouse, has shown that stellar kinds will not please the essentialist monist, because of the existence of continuous parameters, transitory properties, and several kinds to which a given star may belong, so that stellar kinds cannot be called “natural kinds”. See [432], 1109–1120: “Not only does the stellar world not come prepackaged with a unique set of objective, privileged (in an essentialist sense) divisions, but also it does not come prepackaged with objective divisions, tout court” (1118). On the other hand, stellar kinds will not please the pluralist who embraces promiscuous realism either: there do exist objective properties connected to the main aims of scientific research in astrophysics. “Astrophysicists want to know how stars form, evolve, and disappear. Their theoretical understanding of the behavior of gaseous spheres tells them that parameters such as temperature, density, or mass loss are determinant parameters in stellar evolutionary processes, whereas proper motion or distance from the earth are not; hence, we have their choice of the former, and not the latter, as taxonomic parameters. In short, kind membership is conferred by structural properties central for explaining a large variety of stellar behaviors (and those properties translate into spectral features that are directly observable)” (1111).
- 3.
By contrast, for a similarity coefficient s, the larger the value of s(e,e′), the more similar the objects e and e′. On the definition of a dissimilarity coefficient, see Sect. 3.5.1.
- 4.
- 5.
- 6.
In a strict sense, a similarity relation S is a reflexive, symmetric, but not transitive relation.
- 7.
In fact, this assumption of empirical clustering may be questioned. In particular, Russell (see [436]) has shown that Keynes’ principle of limited variety (A Treatise on Probability, Chap. XXII, 258) was not valid, so that classification constraints cannot be explained by empirical observations but must be founded on mathematical regularities.
- 8.
The question raised by Russell’s paradox led, beyond the problematic answer that Russell himself proposed with the theory of types, to the first theories of universes with classes and sets. After the classical solution of Zermelo and Fraenkel (the ZF axiomatics) came non-classical theories such as those of Finsler, Kelley-Morse, Quine, von Neumann, etc. Some interesting attempts by the French mathematician Claude Frasnay at new definitions of classes and sets must also be mentioned here (see [180]).
- 9.
Plato (Protagoras 331de) already said that “it is not fair to describe things as like which have some point alike, however small, or as unlike that have some point unlike”. Nelson Goodman goes further at the beginning of a well-known paper: “Similarity, I submit, is insidious. And if the association here with invidious comparison is itself invidious, so much the better. Similarity, ever ready to solve philosophical problems and overcome obstacles, is a pretender, an impostor, a quack. It has, indeed, its place and its uses, but is more often found where it does not belong, professing powers it does not possess.” (See [200], 437.) Indeed, resemblance alone is not enough for representation; it may be superfluous in the case of descriptions of replicas of inscriptions or events; it does not explain metaphors; and it does not account for our predictive or, more generally, our inductive practice. If defined between particulars, it does not suffice to determine qualities and can hardly be measured in terms of possession of common characteristics: the “seven strictures” constitute a relentless criticism. Restated in a more recent book (see [201]), Goodman’s opinion received many comments and surely made a deep impression on philosophers and psychologists of the late twentieth century. However, in the early 2000s, Hahn and Ramscar tried to couch categorization in terms of more sophisticated and precise notions of similarity (see [222], and the review by Bradley C. Love [321]). Moreover, although we readily share some of Goodman’s criticisms, we support the idea that similarity must be founded on “good” mathematical structures. This idea is already present, as we have seen, in the Russellian rejection of the doctrine of natural kinds ([436], 461) and in Quine’s criticism of perceptual similarity in favor of a more intellectual way of conceptualizing category membership (see his paper on “natural kinds” in [406] and also the comments of [463]).
- 10.
- 11.
A “key” is the operator that gives access to some set of documents, according to a particular aspect of the requirement. Since the aspects of the requirement may be totally ordered, so that, for instance, the more general or more basic ones are processed before the others, one can also define a total order on the set of keys.
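The remarks in Notes 3 and 6 on similarity coefficients and similarity relations can be made concrete with a small sketch. It assumes, purely for illustration, that objects are described by sets of attributes, that s is the Jaccard coefficient, and that two objects are declared similar when s reaches a threshold of 0.5; none of these choices comes from the chapter.

```python
def similarity(e, f):
    """Jaccard coefficient (an assumed choice of s): larger means more similar."""
    e, f = set(e), set(f)
    union = e | f
    return len(e & f) / len(union) if union else 1.0


def dissimilarity(e, f):
    """Complement of the similarity: larger means less similar."""
    return 1.0 - similarity(e, f)


def similar(e, f, threshold=0.5):
    """Similarity relation S obtained by thresholding s (threshold is hypothetical)."""
    return similarity(e, f) >= threshold


# Three hypothetical objects described by attribute sets.
objects = {"A": {1, 2, 3}, "B": {2, 3, 4}, "C": {3, 4, 5}}
names = list(objects)

reflexive = all(similar(objects[x], objects[x]) for x in names)
symmetric = all(
    similar(objects[x], objects[y]) == similar(objects[y], objects[x])
    for x in names
    for y in names
)
transitive = all(
    similar(objects[x], objects[z])
    for x in names
    for y in names
    for z in names
    if similar(objects[x], objects[y]) and similar(objects[y], objects[z])
)
```

Here A is similar to B and B is similar to C, yet A is not similar to C, so the relation is reflexive and symmetric without being transitive. This is why such a relation does not, by itself, partition the objects into classes.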
References
Andrews, G.E.: The Theory of Partitions. Cambridge University Press, Cambridge (1998). New ed. 2006
Apostel, L.: Le problème formel des classifications empiriques. In: La Classification dans les Sciences, pp. 157–230. Duculot, Bruxelles (1963)
Baeza-Yates, R.A.: Fringe analysis revisited. ACM Comput. Surv. 27(1), 111–119 (1993)
Barbut, M.: Mathématiques et sciences humaines. P.U.F., Paris (1969). 2 vols
Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 1. Librairie Hachette, Paris (1970)
Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 2. Librairie Hachette, Paris (1970)
Barthélemy, J.-P., Guénoche, A.: Les arbres et les représentations des proximités. Masson, Paris (1988)
Benzécri, J.-P., et al.: L’Analyse des données, tome 1, taxinomie. Dunod, Paris (1973)
Benzécri, J.-P., et al.: L’Analyse des données, tome 2, correspondances. Dunod, Paris (1973)
Birkhoff, G.: On the structure of abstract algebras. Proc. Camb. Philos. Soc. 31, 433–454 (1935)
Birkhoff, G.: Théorie et application des treillis. Ann. Inst. Henri Poincaré 11(5), 227–240 (1949)
Birkhoff, G.: Lattice Theory, 3rd edn. AMS, Providence (1967)
Boorman, S.A., Olivier, D.C.: Metrics on spaces of finite trees. J. Math. Psychol. 10, 26–59 (1973)
Boruvka, O.: Décomposition dans les ensembles et théorie des groupoïdes. Séminaire Dubreil-Pisot, 14e année, 1960/61. Fascicule 2, exposé 22bis (not paged)
Bourbaki, N.: Théorie des Ensembles. Hermann, Paris (1966)
Caspard, N., Leclerc, B., Monjardet, B.: Ensembles ordonnés finis: concepts, résultats et usages. Springer, Berlin (2007)
Dagognet, F.: Le catalogue de la vie. P.U.F., Paris (1970)
Drobisch, M.W.: Neue Darstellung der Logik, nach ihren einfachsten Verhältnissen mit Rücksicht auf Mathematik und Naturwissenschaften. Voss, Leipzig (1968). New edn. Olms, Hildesheim (1968)
Dubreil, P., Jacotin, M.-L.: Théorie algébrique des relations d’équivalence. J. Math. 18, 63–95 (1939)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience, New York (2000)
Eisenbarth, B., Ziviani, N., Gonnet, G.H., Mehlhorn, K., Wood, D.: The theory of fringe analysis and its application to 2–3 trees and B-trees. Inf. Control 55(1), 125–174 (1982)
Fairthorne, R.A.: The patterns of retrieval. Am. Doc. 7, 65–70 (1956)
Fairthorne, R.A.: The mathematics of classification. In: Towards Information Retrieval, pp. 1–10. Butterworths, London (1961)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)
Frasnay, C.: Notes to the C.R. Acad. Sci. Paris, 1962–1963–1964: a) t. 255, 2878–2879; b) t. 256, 2507–2510; c) t. 257, 1825–1828; d) t. 257, 2944–2947; e) t. 258, 1373–1376; f) t. 259, 3910–3913
Gondran, M.: La structure algébrique des classifications hiérarchiques. Ann. INSEE 22–23, 181–190 (1976)
Gondran, M.: Valeurs propres et vecteurs propres en classification hiérarchique. RAIRO Inform. Théor. 10(3), 39–46 (1976)
Gondran, M.: Graphes, dioïdes et semi-anneaux, nouveaux modèles et algorithmes. Tec et Doc, Paris (2002)
Goodman, N.: Seven strictures on similarity. In: Problems and Projects. The Bobbs-Merrill Company, Indianapolis (1972)
Goodman, N., Douglas, M., Hull, D.L.: How Classification Works: Nelson Goodman Among the Social Sciences. Edinburgh University Press, Edinburgh (1992)
Gordon, A.D.: Hierarchical classification. In: Arabie, Ph., Hubert, L.J., de Soete, G. (eds.) Clustering and Classification, pp. 65–121. World Scientific, River Edge (1996)
Greene, D., Knuth, D.: Mathematics for the Analysis of Algorithms, 2nd edn. Birkhäuser, Boston (1981)
Gregg, J.: The Language of Taxonomy—An Application of Symbolic Logic to the Study of Classificatory Systems. Columbia University Press, New York (1954)
Hahn, U., Ramscar, M. (eds.) Similarity and Categorization. Oxford University Press, Oxford (2001)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
Hempel, C.G., Oppenheim, P.: Der Typusbegriff im Lichte der neuen Logik. Sijthoff, Leiden (1936)
Hempel, C.G.: Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. Free Press, New York (1965)
Hillman, D.: Mathematical classification techniques for nonstatic document collections, with particular reference to the problem of relevance. In: Classification Research, Elsinore Conference Proceedings, Munksgaard, Copenhagen, pp. 177–209 (1965)
Jambu, M.: Classification automatique pour l’analyse des données, 2 tomes. Bordas-Dunod, Paris (1978)
Jaschek, C., Jaschek, M.: The Classification of Stars. Cambridge University Press, Cambridge (1987). Reprinted with corrections, 1990
Kaufmann, A.: Introduction à la théorie des sous-ensembles flous, tome 3. Application à la classification et à la reconnaissance des formes aux automates et aux choix des critères. Masson, Paris (1975)
Kaufmann, A., Pichat, E.: Méthodes mathématiques non numériques et leurs algorithmes, tome 1, Algorithmes de recherche des éléments maximaux. Masson, Paris (1977)
Knuth, D.E.: The Art of Computer Programming, vol. 3: Sorting and Searching. Addison-Wesley, Reading (1973)
Krasner, M.: Espaces ultramétriques et ultramatroïdes. Séminaire, Faculté des Sciences de Paris, 1953–1954
Křivánek, M., Morávek, J.: On NP-hardness in hierarchical clustering. In: Havránek, T., Šidák, Z., Novák, M. (eds.) COMPSTAT 1984, Proceedings. Physica-Verlag, Heidelberg (1984)
Křivánek, M.: A note on the computational complexity of hierarchical overlapping clustering. Appl. Math. 30(6), 453–460 (1985)
Lambert, J.: Classer vaut pour retrouver, coder vaut pour inventer. In: Anatomie d’un épistémologue, F. Dagognet. Vrin, Paris (1984)
Lance, G.N., Williams, W.T.: A generalised sorting strategy for computer classification. Nature 212, 218 (1966)
Lance, G.N., Williams, W.T.: A general theory of classification sorting. Comput. J. 9, 373–380 (1967)
Larsen, J.A., Walden, W.E.: Comparing insertion schemes used to update 3-2 trees. Inf. Syst. 4, 127–136 (1979)
Leclerc, B.: Semi-modularité des treillis d’ultramétriques. C.R. Acad. Sci. Paris, A 288, 575–577 (1979)
Leclerc, B.: Description combinatoire des ultramétriques. Math. Sci. Hum. 73, 5–37 (1981)
Leclerc, B.: Arbres minimums communs et compatibilité de données de types variés. Math. Sci. Hum. 98, 41–67 (1987)
Lemin, A.-J.: The category of ultrametric spaces is isomorphic to the category of complete, atomic, tree-like and real graduated lattices Lat*. Algebra Univers. 50(1), 35–49 (2003)
Lerman, I.C.: La classification automatique. Paris (1970)
Lerman, I.C.: Classification automatique et analyse ordinale des données. Dunod, Paris (1981)
Love, B.C.: Similarity and categorization, a review. AI Magazine, Summer, 102–105 (2002)
Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory I. Stud. Log., tom XVI, 53–70 (1965)
Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory II. Stud. Log., tom XVII, 7–30 (1965)
Mahmoud, H.: Evolution of Random Search Trees. Wiley, New York (1992)
Mooers, C.N.: From a point of view of mathematical techniques. In: Fairthorne, R.A. (ed.) Towards Information Retrieval. Butterworths, London (1961)
Ore, O.: Theory of equivalence relations. Duke Math. J. 9, 573–627 (1942)
Ore, O.: Some studies on closure relations. Duke Math. J. 10, 761–785 (1943)
Quine, W.V.O.: Ontological Relativity and Other Essays. Columbia University Press, New York (1969)
Rasiowa, H., Sikorski, R.: The Mathematics of Metamathematics, 3rd edn. (1970). Varsovie 1963
Rasiowa, H.: An Algebraic Approach to Non-Classical Logics. North Holland, Amsterdam (1974)
Riordan, J.: Introduction to Combinatorial Analysis. Wiley, New York (1958)
Riordan, J.: Combinatorial Identities. Wiley, New York (1968)
Roux, M.: Algorithmes de Classification. Masson, Paris (1985)
Running, S.W., Loveland, T.R., Pierce, L.L., Nemani, R.R., Hunt Jr., E.R.: A remote sensing based vegetation classification logic for global land cover analysis. Remote Sens. Environ. 51, 39–48 (1995)
Ruphy, S.: Are stellar kinds natural kinds? A challenging newcomer in the monism/pluralism and realism/antirealism debates. Philos. Sci. 77, 1109–1120 (2010)
Ruspini, E.H.: A new approach to clustering. Inf. Control 15, 33–37 (1969)
Ruspini, E.H.: Numerical method for fuzzy clustering. Inf. Sci. 2, 319–350 (1970)
Russell, B.: Human Knowledge, Its Scope and Limits. Routledge, London (1992)
Salton, G.: Manipulation of trees in information retrieval. Commun. ACM 5, 103–114 (1962)
Salton, G.: Automatic Information Organization and Retrieval. McGraw-Hill, New York (1968)
Skrasek, J.: Zaklady vyssi matematiky. Nase vojsko, Praha (1966)
Smith, L.B.: The concept of same. Adv. Child Development Behav. 24, 216–253 (1993)
Soergel, D.: Mathematical analysis of documentation systems, an attempt to a theory of classification and search request formulation. Inf. Storage Retrieval 3(3), 129–173 (1967)
Tarski, A., Jonsson, B.: Ordinal Algebras. North-Holland, Amsterdam (1956). (Appendix by Bjarni Jonsson)
Van Rijsbergen, C.J.: The Geometry of Information Retrieval. Cambridge University Press, Cambridge (2004)
Vitter, J.S., Flajolet, P.: Average-case analysis of algorithms and data structures. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. A, pp. 431–524. Elsevier/MIT Press, Amsterdam/Cambridge (1990). Chap. 9
Wang, X., Syrmos, V.L.: Optimal cluster selection based on Fisher class separability measure. In: American Control Conference, 2005, vol. 3, pp. 1929–1934 (2005). Proceedings of the 2005 Volume
Woodger, J.H.: Problems arising from the application of mathematical logic to biology. In: Applications Scientifiques de la Logique mathématique, pp. 133–139. Gauthier-Villars, Paris (1954)
Yao, A.C.-C.: On random 3-2 trees. Technical Report UIUDCS-R-74679, Department of Computer Science, Urbana, University of Illinois, Oct. 1974
Yao, A.C.-C.: On random 2-3 trees. Acta Inform. 9(2), 159–170 (1977/78)
Zadeh, L.A.: Similarity relations and fuzzy ordering. E.R.L. Report no M277. Elect. Res. Lab., Univ. of California, Berkeley, July 1970
Copyright information
© 2013 Springer Basel
About this chapter
Cite this chapter
Parrochia, D., Neuville, P. (2013). Empirical Clustering and Classic Hierarchies. In: Towards a General Theory of Classifications. Studies in Universal Logic. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-0609-1_3
DOI: https://doi.org/10.1007/978-3-0348-0609-1_3
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-0608-4
Online ISBN: 978-3-0348-0609-1
eBook Packages: Mathematics and Statistics (R0)