Empirical Clustering and Classic Hierarchies

Parrochia, Daniel; Neuville, Pierre

doi:10.1007/978-3-0348-0609-1_3

Part of the book series: Studies in Universal Logic (SUL)

Abstract

Because most classifications constructed from Plato and Aristotle to Linnaeus (and even into the nineteenth century) were hierarchical, we devote this entire chapter to mathematical models of hierarchies. After some historical remarks (Sect. 3.2), we introduce (Sect. 3.3) the basic notions of partition, partition lattice, and chain of partitions, the last being the exact mathematical model of a hierarchical classification, classically represented by a tree diagram. In Sect. 3.4, we give the structure of the set of all hierarchical classifications on a finite set, which makes it possible to enumerate exhaustively all the hierarchies that can be built. Then, in Sect. 3.5, we present the exact correspondence between a tree diagram, a hierarchical structure, and the distance that can be defined on it, which is an ultrametric. These ideal models and their algebraic representations (Sect. 3.6) are what taxonomists generally want to obtain, but the real world usually appears quite far from such neat orders. We therefore explain (Sect. 3.7) how empirical, quasi-chaotic data can be replaced with suitable mathematical taxonomies. Although in many cases this can be done, and numerical models can easily be adapted to empirical data, some problems arise in the operation (Sect. 3.8), either because of mathematical limits within the models (intrinsic instability) or because human perception of the world changes over time (extrinsic instability). Finally (Sect. 3.9), we list some possible answers to these important questions.
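As an illustration of the correspondence between a chain of partitions and an ultrametric mentioned in the abstract, here is a minimal Python sketch. The four-element set, the chain `CHAIN`, and the choice of taking the level index as the distance are assumptions made for the example; they are not taken from the chapter, which may use a different indexing of levels.

```python
from itertools import product

# Hypothetical chain of partitions on {a, b, c, d}, from finest to coarsest.
# Each partition refines the next one.
CHAIN = [
    [{"a"}, {"b"}, {"c"}, {"d"}],   # level 0: singletons
    [{"a", "b"}, {"c"}, {"d"}],     # level 1
    [{"a", "b"}, {"c", "d"}],       # level 2
    [{"a", "b", "c", "d"}],         # level 3: a single block
]

def is_chain(chain):
    """Check that every block of a level lies inside a block of the next level."""
    return all(
        any(block <= coarser for coarser in chain[i + 1])
        for i in range(len(chain) - 1)
        for block in chain[i]
    )

def distance(chain, x, y):
    """Lowest level at which x and y fall in the same block."""
    for level, partition in enumerate(chain):
        if any(x in block and y in block for block in partition):
            return level
    raise ValueError("x and y are never merged; the chain must end with one block")

def is_ultrametric(chain, elements):
    """Check d(x, z) <= max(d(x, y), d(y, z)) for all triples of elements."""
    return all(
        distance(chain, x, z) <= max(distance(chain, x, y), distance(chain, y, z))
        for x, y, z in product(elements, repeat=3)
    )
```

With this chain, `distance(CHAIN, "a", "b")` is 1 and `distance(CHAIN, "a", "c")` is 3: two elements are closer the lower in the tree they are merged, and `is_ultrametric(CHAIN, "abcd")` confirms the strengthened triangle inequality on this example.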

Notes

1.
Finding the number of partitions of a set is different from finding the number of partitions of a number n, but the two problems are connected (see [5, 307] and our Sect. 3.4.2).
2.
Prof. Stéphanie Ruphy, a former student of D. Parrochia in Toulouse, has shown that stellar kinds will not satisfy the essentialist monist: because of continuous parameters, transitory properties, and the several kinds to which a given star may belong, stellar kinds cannot be called “natural kinds”. See [432], 1109–1120: “Not only does the stellar world not come prepackaged with a unique set of objective, privileged (in an essentialist sense) divisions, but also it does not come prepackaged with objective divisions, tout court” (1118). Conversely, stellar kinds will not satisfy the pluralist who embraces promiscuous realism either, for there exist objective properties connected to the main aims of scientific research in astrophysics. “Astrophysicists want to know how stars form, evolve, and disappear. Their theoretical understanding of the behavior of gaseous spheres tells them that parameters such as temperature, density, or mass loss are determinant parameters in stellar evolutionary processes, whereas proper motion or distance from the earth are not; hence, we have their choice of the former, and not the latter, as taxonomic parameters. In short, kind membership is conferred by structural properties central for explaining a large variety of stellar behaviors (and those properties translate into spectral features that are directly observable)” (1111).
3.
By contrast, in the case of a similarity coefficient s, the larger the value of s(e,e′), the more similar the objects e and e′ are. On the definition of a dissimilarity coefficient, see Sect. 3.5.1.
4.
We summarize here the presentation of Boorman and Olivier [58]. For another presentation, see B. Leclerc [299].
5.
The same type of generalization has been presented in the form of fuzzy equivalences in the case of certain similarities and ultrasimilarities proposed by some other authors (see, for instance, [433, 434]).
6.
In a strict sense, a similarity relation S is a reflexive, symmetric, but not transitive relation.
7.
In fact, this assumption of empirical clustering may be questioned. In particular, Russell (see [436]) has shown that Keynes’ principle of limited variety (A Treatise on Probability, Chap. XXII, 258) is not valid, so that classification constraints cannot be explained by empirical observations but must be founded on mathematical regularities.
8.
The question raised by Russell’s paradox leads, beyond the problematic answer Russell himself proposed with the theory of types, to the beginning of theories of universes with classes and sets. After the classical solution of Zermelo and Fraenkel (the ZF axiomatics) came non-classical theories such as those of Finsler, Kelley-Morse, Quine, von Neumann, etc. Some interesting attempts by the French mathematician Claude Frasnay at new definitions of classes and sets must also be mentioned here (see [180]).
9.
Plato (Protagoras 331de) already said that “it is not fair to describe things as like which have some point alike, however small, or as unlike that have some point unlike”. Nelson Goodman goes further at the beginning of a well-known paper: “Similarity, I submit, is insidious. And if the association here with invidious comparison is itself invidious, so much the better. Similarity, ever ready to solve philosophical problems and overcome obstacles, is a pretender, an impostor, a quack. It has, indeed, its place and its uses, but is more often found where it does not belong, professing powers it does not possess.” (See [200], 437.) Of course, resemblance alone is not enough for representation; it may be superfluous in the case of descriptions of replicas of inscriptions or events; it does not explain metaphors and does not account for our predictive or, more generally, our inductive practice. If defined between particulars, it does not suffice to determine qualities and can hardly be measured in terms of possession of common characteristics: the “seven strictures” constitute a relentless criticism. Stressed again in a more recent book (see [201]), Goodman’s opinion received many comments and surely made a deep impression on philosophers and psychologists of the late twentieth century. However, at the beginning of the 2000s, Hahn and Ramscar tried to couch categorization in terms of more sophisticated and precise notions of similarity (see [222], and the review by Bradley C. Love [321]). Moreover, although we readily share some of Goodman’s criticisms, we support the idea that similarity must be founded on “good” mathematical structures, an idea already present, as we have seen, in Russell’s rejection of the doctrine of natural kinds ([436], 461) and in Quine’s criticism of perceptual similarity in favor of a more intellectual way of conceptualizing category membership (see his paper on “natural kinds” in [406] and also the comments of [463]).
10.
In fact, it has been well known since the mid-1980s that such a problem is NP-hard in the case of hierarchical clustering (see [284]) and NP-complete in the case of overlapping clustering (see [285]).
11.
A “key” is the operator that gives access to some set of documents according to a particular aspect of the requirement. Since the aspects of the requirement may be totally ordered, so that, for instance, the more general or more basic ones are processed before the others, one can also define a total order over the set of keys.
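Note 1 distinguishes partitions of a set from partitions of a number n. The following Python sketch illustrates one standard link between the two counts, which is not necessarily the one developed in [5, 307] or in Sect. 3.4.2: every partition of an n-element set has a "type" (the multiset of its block sizes), which is a partition of the integer n, and summing the number of set partitions over all types gives the Bell number. All function names are hypothetical and chosen for this example.

```python
from collections import Counter
from math import factorial

def bell_number(n):
    """Number of partitions of an n-element set, computed with the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]          # each row starts with the last entry of the previous one
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

def partitions_of_integer(n, largest=None):
    """Yield the partitions of the integer n as non-increasing tuples of parts."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for part in range(min(n, largest), 0, -1):
        for rest in partitions_of_integer(n - part, part):
            yield (part,) + rest

def set_partitions_of_type(parts):
    """Number of partitions of a set whose block sizes are exactly `parts`."""
    denominator = 1
    for size, multiplicity in Counter(parts).items():
        denominator *= factorial(size) ** multiplicity * factorial(multiplicity)
    return factorial(sum(parts)) // denominator
```

For n = 4 there are 5 partitions of the integer but 15 partitions of the set, and `sum(set_partitions_of_type(p) for p in partitions_of_integer(4))` recovers the 15.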

References

Andrews, G.E.: The Theory of Partitions. Cambridge University Press, Cambridge (1998). New ed. 2006
Apostel, L.: Le problème formel des classifications empiriques. In: La Classification dans les Sciences, pp. 157–230. Duculot, Bruxelles (1963)
Baeza-Yates, R.A.: Fringe analysis revisited. ACM Comput. Surv. 27(1), 111–119 (1993)
Barbut, M.: Mathématiques et sciences humaines. P.U.F., Paris (1969). 2 vols
Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 1. Librairie Hachette, Paris (1970)
Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 2. Librairie Hachette, Paris (1970)
Barthélemy, J.-P., Guénoche, A.: Les arbres et les représentations des proximités. Masson, Paris (1988)
Benzécri, J.-P., et al.: L’Analyse des données, tome 1, taxinomie. Dunod, Paris (1973)
Benzécri, J.-P., et al.: L’Analyse des données, tome 2, correspondances. Dunod, Paris (1973)
Birkhoff, G.: On the structure of abstract algebras. Proc. Camb. Philos. Soc. 31, 433–454 (1935)
Birkhoff, G.: Théorie et application des treillis. Ann. Inst. Henri Poincaré 11(5), 227–240 (1949)
Birkhoff, G.: Lattice Theory, 3rd edn. AMS, Providence (1967)
Boorman, S.A., Olivier, D.C.: Metric on spaces of finite trees. J. Math. Psychol. 10, 26–59 (1973)
Boruvka, O.: Décomposition dans les ensembles et théorie des groupoïdes. Séminaire Dubreil-Pisot, 14e année, 1960/61. Fascicule 2, exposé 22bis (not paged)
Bourbaki, N.: Théorie des Ensembles. Hermann, Paris (1966)
Caspard, N., Leclerc, B., Monjardet, B.: Ensembles ordonnés finis: concepts, résultats et usages. Springer, Berlin (2007)
Dagognet, F.: Le catalogue de la vie. P.U.F., Paris (1970)
Drobisch, M.W.: Neue Darstellung der Logik, nach ihren einfachsten Verhältnissen mit Rücksicht auf Mathematik und Naturwissenschaften. Voss, Leipzig (1968). New edn. Olms, Hildesheim (1968)
Dubreil, P., Jacotin, M.-L.: Théorie algébrique des relations d’équivalence. J. Math. 18, 63–95 (1939)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience, New York (2000)
Eisenbarth, B., Ziviani, N., Gonnet, G.H., Mehlhorn, K., Wood, D.: The theory of fringe analysis and its application to 2–3 trees and B-trees. Inf. Control 55(1), 125–174 (1982)
Fairthorne, R.A.: The patterns of retrieval. Am. Doc. 7, 65–70 (1956)
Fairthorne, R.A.: The mathematics of classification. In: Towards Information Retrieval, pp. 1–10. Butterworths, London (1961)
Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)
Frasnay, C.: Notes to the C.R. Acad. Sci. Paris, 1962–1963–1964: a) t. 255, 2878–2879; b) t. 256, 2507–2510; c) t. 257, 1825–1828; d) t. 257, 2944–2947; e) t. 258, 1373–1376; f) t. 259, 3910–3913
Gondran, M.: La structure algébrique des classifications hiérarchiques. Ann. INSEE 22–23, 181–190 (1976)
Gondran, M.: Valeurs propres et vecteurs propres en classification hiérarchique. RAIRO Inform. Théor. 10(3), 39–46 (1976)
Gondran, M.: Graphes, dioïdes et semi-anneaux, nouveaux modèles et algorithmes. Tec et Doc, Paris (2002)
Goodman, N.: Seven strictures on similarity. In: Problems and Projects. The Bobbs-Merrill Company, Indianapolis (1972)
Goodman, N., Douglas, M., Hull, D.L.: How Classification Works: Nelson Goodman Among the Social Sciences. Edinburgh University Press, Edinburgh (1992)
Gordon, A.D.: Hierarchical classification. In: Arabie, Ph., Hubert, L.J., de Soete, G. (eds.) Clustering and Classification, pp. 65–121. World Scientific, River Edge (1996)
Greene, D., Knuth, D.: Mathematics for the Analysis of Algorithms, 2nd edn. Birkhäuser, Boston (1981)
Gregg, J.: The Language of Taxonomy—An Application of Symbolic Logic to the Study of Classificatory Systems. Columbia University Press, New York (1954)
Hahn, U., Ramscar, M. (eds.) Similarity and Categorization. Oxford University Press, Oxford (2001)
Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)
Hempel, C.G., Oppenheim, P.: Der Typusbegriff im Lichte der neuen Logik. Sijthoff, Leiden (1936)
Hempel, C.G.: Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. Free Press, New York (1965)
Hillman, D.: Mathematical classification techniques for nonstatic document collections, with particular reference to the problem of relevance. In: Classification Research, Elsinore Conference Proceedings, Munksgaard, Copenhagen, pp. 177–209 (1965)
Jambu, M.: Classification automatique pour l’analyse des données, 2 tomes. Bordas-Dunod, Paris (1978)
Jaschek, C., Jaschek, M.: The Classification of Stars. Cambridge University Press, Cambridge (1987). Reprinted with corrections, 1990
Kaufmann, A.: Introduction à la théorie des sous-ensembles flous, tome 3. Application à la classification et à la reconnaissance des formes aux automates et aux choix des critères. Masson, Paris (1975)
Kaufmann, A., Pichat, E.: Méthodes mathématiques non numériques et leurs algorithmes, tome 1, Algorithmes de recherche des éléments maximaux. Masson, Paris (1977)
Knuth, D.E.: The Art of Computer Programming, vol. 3: Sorting and Searching. Addison-Wesley, Reading (1973)
Krasner, M.: Espaces ultramétriques et ultramatroïdes. Séminaire, Faculté des Sciences de Paris, 1953–1954
Kr̆ivánec, M., Morávec, J.: On NP-hardness in hierarchical clustering. In: Havránek, T., S̆idák, Z., Novák, M. (eds.) COMPSTAT 1984, Proceedings. Physica-Verlag, Heidelberg (1984)
Kr̆ivánec, M.: A note on the computational complexity of hierarchical overlapping clustering. Appl. Math. 30(6), 453–460 (1985)
Lambert, J.: Classer vaut pour retrouver, coder vaut pour inventer. In: Anatomie d’un épistémologue, F. Dagognet. Vrin, Paris (1984)
Lance, G.C., Williams, W.T.: A generalised sorting strategy for computer classification. Nature 212, 218 (1966)
Lance, G.C., Williams, W.T.: A general theory of classification sorting. Comput. J. 9, 373–380 (1967)
Larsen, J.A., Walden, W.E.: Comparing insertion schemes used to update 3-2 trees. Inf. Syst. 4, 127–136 (1979)
Leclerc, B.: Semi-modularité des treillis d’ultramétriques. C.R. Acad. Sci. Paris, A 288, 575–577 (1979)
Leclerc, B.: Description combinatoire des ultramétriques. Math. Sci. Hum. 73, 5–37 (1981)
Leclerc, B.: Arbres minimums communs et compatibilité de données de types variés. Math. Sci. Hum. 98, 41–67 (1987)
Lemin, A.-J.: The category of ultrametric spaces is isomorphic to the category of complete, atomic, tree-like and real graduated lattices Lat*. Algebra Univers. 50(1), 35–49 (2003)
Lerman, I.C.: La classification automatique (1970). Paris
Lerman, I.C.: Classification automatique et analyse ordinale des données. Dunod, Paris (1981)
Love, B.C.: Similarity and categorization, a review. AI Magazine, Summer, 102–105 (2002)
Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory I. Stud. Log., tom XVI, 53–70 (1965)
Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory II. Stud. Log., tom XVII, 7–30 (1965)
Mahmoud, H.: Evolution of Random Search Trees. Wiley, New York (1992)
Mooers, C.N.: From a point of view of mathematical techniques. In: Fairthorne, R.A. (ed.) Towards Information Retrieval. Butterworths, London (1961)
Ore, O.: Theory of equivalence relations. Duke Math. J. 9, 573–627 (1942)
Ore, O.: Some studies on closer relations. Duke Math. J. 10, 761–785 (1943)
Quine, W.V.O.: Ontological Relativity and Other Essays. Columbia University Press, New York (1969)
Rasiowa, H., Sikorski, R.: The Mathematics of Metamathematics, 3rd edn. (1970). Varsovie 1963
Rasiowa, H.: An Algebraic Approach to Non-Classical Logics. North Holland, Amsterdam (1974)
Riordan, J.: Introduction to Combinatorial Analysis. Wiley, New York (1958)
Riordan, J.: Combinatorial Identities. Wiley, New York (1968)
Roux, M.: Algorithmes de Classification. Masson, Paris (1985)
Running, S.W., Loveland, T.R., Pierce, L.L., Nemani, R.R., Hunt Jr., E.R.: A remote sensing based vegetation classification logic for global land cover analysis. Remote Sens. Environ. 51, 39–48 (1995)
Ruphy, S.: Are stellar kinds natural kinds? A challenging newcomer in the monism/pluralism and realism/antirealism debates. Philos. Sci. 77, 1109–1120 (2010)
Ruspini, E.H.: A new approach to clustering. Inf. Control 15, 33–37 (1969)
Ruspini, E.H.: Numerical method for fuzzy clustering. Inf. Sci. 2, 319–350 (1970)
Russell, B.: Human Knowledge, Its Scope and Limits. Routledge, London (1992)
Salton, G.: Manipulation of trees in information retrieval. Commun. ACM 5, 103–114 (1962)
Salton, G.: Automatic Information Organization and Retrieval. McGraw-Hill, New York (1968)
Skrasek, J.: Zaklady vyssi matematiky. Nase vosko, Praha (1966)
Smith, L.B.: The concept of same. Adv. Child Development Behav. 24, 216–253 (1993)
Soergel, D.: Mathematical analysis of documentation systems, an attempt to a theory of classification and search request formulation. Inf. Storage Retrieval 3(3), 129–173 (1967)
Tarski, A., Jonsson, B.: Ordinal Algebras. North-Holland, Amsterdam (1956). (Appendix by Bjarni Jonsson)
Van Rijsbergen, C.J.: The Geometry of Information Retrieval. Cambridge University Press, Cambridge (2004)
Vitter, J.S., Flajolet, P.: Average-case analysis of algorithms and data structures. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. A, pp. 431–524. Elsevier/MIT Press, Amsterdam/Cambridge (1990). Chap. 9
Wang, X., Syrmos, V.L.: Optimal cluster selection based on Fisher class separability measure. In: Proceedings of the 2005 American Control Conference, vol. 3, pp. 1929–1934 (2005)
Woodger, J.H.: Problems arising from the application of mathematical logic to biology. In: Applications Scientifiques de la Logique mathématique, pp. 133–139. Gauthier-Villars, Paris (1954)
Yao, A.C.-C.: On random 3-2 trees. Technical Report UIUDCS-R-74679, Department of Computer Science, Urbana, University of Illinois, Oct. 1974
Yao, A.C.-C.: On random 2-3 trees. Acta Inform. 9(2), 159–170 (1977/78)
Zadeh, L.A.: Similarity relations and fuzzy ordering. E.R.L. Report no. M277. Elect. Res. Lab., Univ. of California, Berkeley, July 1970

Author information

Authors and Affiliations

Department of Philosophy, Université Jean Moulin – Lyon III, Lyon, France
Daniel Parrochia
Ecole Nationale Supérieure des Sciences de l’Information et de la Bibliothèque, Villeurbanne, France
Pierre Neuville

About this chapter

Cite this chapter

Parrochia, D., Neuville, P. (2013). Empirical Clustering and Classic Hierarchies. In: Towards a General Theory of Classifications. Studies in Universal Logic. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-0609-1_3

DOI: https://doi.org/10.1007/978-3-0348-0609-1_3
Publisher Name: Birkhäuser, Basel
Print ISBN: 978-3-0348-0608-4
Online ISBN: 978-3-0348-0609-1
eBook Packages: Mathematics and Statistics (R0)
