Empirical Clustering and Classic Hierarchies

Chapter in: Towards a General Theory of Classifications

Part of the book series: Studies in Universal Logic (SUL)

Abstract

As most of the classifications humans constructed over the long period from Plato and Aristotle to Linnaeus (and even into the nineteenth century) were hierarchical, we devote this entire chapter to mathematical models of hierarchies. After a historical overview (Sect. 3.2), we introduce (Sect. 3.3) the basic notions of partition, partition lattice, and chain of partitions. A chain of partitions is the exact mathematical model of a hierarchical classification, which is classically represented by a tree diagram. In Sect. 3.4, we describe the structure of the set of all hierarchical classifications on a finite set, which lets us know exhaustively all the hierarchies that can be built. Then, in Sect. 3.5, we present the exact correspondence between tree diagrams, hierarchical structures, and the distance that can be defined on them, which is an ultrametric. These ideal models and their algebraic representations (Sect. 3.6) are what taxonomists generally want to obtain, but the real world, as it appears to us, is usually far from such neat orders. We therefore explain (Sect. 3.7) how quasi-chaotic empirical data can be replaced by suitable mathematical taxonomies. Although in many cases we can do this and easily fit our numerical models to empirical data, some problems arise in the process (Sect. 3.8): either because the models themselves have mathematical limits (intrinsic instability) or because human perception of the world changes over time (extrinsic instability). Finally (Sect. 3.9), we list some possible answers to these important questions.
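The correspondence between a chain of partitions and an ultrametric, announced for Sect. 3.5, can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that a hierarchy is given as a list of partitions from finest to coarsest, each tagged with a numeric level, and that the coarsest partition has a single block. The toy data and all function names are hypothetical and do not follow the chapter's notation. The distance between two objects is taken to be the lowest level at which they share a block. The sketch checks the ultrametric (strong triangle) inequality and recovers each partition by thresholding the distance.

```python
from itertools import combinations

# Hypothetical hierarchy: (level, partition) pairs, finest first, coarsest last.
CHAIN = [
    (0.0, [{"a"}, {"b"}, {"c"}, {"d"}]),
    (1.0, [{"a", "b"}, {"c"}, {"d"}]),
    (2.0, [{"a", "b"}, {"c", "d"}]),
    (3.0, [{"a", "b", "c", "d"}]),
]

def refines(p, q):
    """True if every block of partition p lies inside some block of q."""
    return all(any(block <= other for other in q) for block in p)

def is_chain(chain):
    """Levels strictly increase and each partition refines the next one."""
    return all(
        h1 < h2 and refines(p1, p2)
        for (h1, p1), (h2, p2) in zip(chain, chain[1:])
    )

def ultrametric_from_chain(chain):
    """d(x, y) = lowest level at which x and y fall in the same block.

    Assumes the last (coarsest) partition has a single block, so every
    pair of objects is eventually merged.
    """
    points = sorted(set().union(*chain[0][1]))
    d = {}
    for x, y in combinations(points, 2):
        d[frozenset((x, y))] = min(
            h for h, part in chain if any({x, y} <= block for block in part)
        )
    return points, d

def dist(d, x, y):
    return 0.0 if x == y else d[frozenset((x, y))]

def is_ultrametric(points, d):
    """Strong triangle inequality: d(x, z) <= max(d(x, y), d(y, z))."""
    return all(
        dist(d, x, z) <= max(dist(d, x, y), dist(d, y, z))
        for x in points for y in points for z in points
    )

def partition_at(points, d, h):
    """Partition at level h: x ~ y iff d(x, y) <= h.

    For an ultrametric this relation is an equivalence, so comparing
    each object with one representative of a block is enough.
    """
    blocks = []
    for x in points:
        for block in blocks:
            if dist(d, x, next(iter(block))) <= h:
                block.add(x)
                break
        else:
            blocks.append({x})
    return blocks

if __name__ == "__main__":
    assert is_chain(CHAIN)
    points, d = ultrametric_from_chain(CHAIN)
    print(dist(d, "a", "c"))                # 3.0
    print(is_ultrametric(points, d))        # True
    print(sorted(sorted(b) for b in partition_at(points, d, 2.0)))  # [['a', 'b'], ['c', 'd']]
```

Thresholding at each level of the chain gives back the original partitions, which is the sense in which the hierarchy and its ultrametric carry the same information.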

Notes

  1.

    Finding the number of partitions of a set is a different problem from finding the number of partitions of an integer n, but the two are connected (see [5, 307] and our Sect. 3.4.2).
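The connection mentioned in this note can be made concrete. The block sizes of a partition of an n-element set form a partition of the integer n. Grouping set partitions by this block-size type and summing the group sizes gives the Bell number B_n. The Python sketch below checks this identity numerically. The function names are ours, chosen for illustration, and this is not presented as the method of the cited works.

```python
from collections import Counter
from math import factorial

def integer_partitions(n, max_part=None):
    """Yield the partitions of the integer n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for first in range(min(n, max_part), 0, -1):
        for rest in integer_partitions(n - first, first):
            yield (first,) + rest

def set_partitions_of_type(parts):
    """Number of partitions of an n-set whose block sizes are `parts`.

    n! / (prod of block-size factorials * prod of multiplicity factorials):
    the second product removes the order among blocks of equal size.
    """
    denominator = 1
    for size in parts:
        denominator *= factorial(size)
    for multiplicity in Counter(parts).values():
        denominator *= factorial(multiplicity)
    return factorial(sum(parts)) // denominator

def bell(n):
    """Bell number B_n (number of partitions of an n-set) via the Bell triangle."""
    row = [1]
    for _ in range(n):
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]

if __name__ == "__main__":
    n = 5
    types = list(integer_partitions(n))
    print(len(types))                                               # 7
    print(sum(set_partitions_of_type(t) for t in types), bell(n))   # 52 52
```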

  2.

    Prof. Stéphanie Ruphy, a former student of D. Parrochia in Toulouse, has shown that stellar kinds do not suit essentialist monism. The reasons are the existence of continuous parameters, of transitory properties, and of several kinds to which a given star may belong. Stellar kinds therefore cannot be called “natural kinds”. See [432], 1109–1120: “Not only does the stellar world not come prepackaged with a unique set of objective, privileged (in an essentialist sense) divisions, but also it does not come prepackaged with objective divisions, tout court” (1118). Conversely, stellar kinds do not suit the pluralist who embraces promiscuous realism either, because there are in fact objective properties connected to the main aims of scientific research in astrophysics. “Astrophysicists want to know how stars form, evolve, and disappear. Their theoretical understanding of the behavior of gaseous spheres tells them that parameters such as temperature, density, or mass loss are determinant parameters in stellar evolutionary processes, whereas proper motion or distance from the earth are not; hence, we have their choice of the former, and not the latter, as taxonomic parameters. In short, kind membership is conferred by structural properties central for explaining a large variety of stellar behaviors (and those properties translate into spectral features that are directly observable)” (1111).

  3.

    By contrast, with a similarity coefficient s, the objects e and e′ are more similar the larger the value of s(e, e′). For the definition of a dissimilarity coefficient, see Sect. 3.5.1.
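A small Python sketch makes the difference in orientation concrete. The note does not name any particular coefficient. The Jaccard index on sets of attributes and its complement d = 1 − s are our choices for illustration, and the attribute sets are hypothetical.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """Similarity: share of attributes held in common (1.0 for two empty sets)."""
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)

def jaccard_dissimilarity(a: set, b: set) -> float:
    """Dissimilarity as the complement of the similarity: larger means less alike."""
    return 1.0 - jaccard_similarity(a, b)

if __name__ == "__main__":
    # Hypothetical objects described by sets of attributes.
    e1 = {"feathers", "beak", "flies"}
    e2 = {"feathers", "beak", "swims"}
    e3 = {"fur", "swims"}
    print(jaccard_similarity(e1, e2), jaccard_dissimilarity(e1, e2))  # 0.5 0.5
    print(jaccard_similarity(e1, e3), jaccard_dissimilarity(e1, e3))  # 0.0 1.0
```

The two functions rank the pairs in opposite orders: e1 and e2 get the larger similarity and the smaller dissimilarity.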

  4.

    Here we follow the presentation of Boorman and Olivier [58]. For another presentation, see B. Leclerc [299].

  5.

    The same type of generalization has been presented, in the form of fuzzy equivalences, for certain similarities and ultrasimilarities proposed by other authors (see, for instance, [433, 434]).
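A Python sketch can illustrate how a fuzzy equivalence encodes a hierarchy. It assumes the usual definition in Zadeh's sense: a fuzzy relation that is reflexive, symmetric, and max-min transitive. Each α-cut of such a relation is an ordinary equivalence, and the cuts are nested as α decreases. The membership matrix is hypothetical, and the code is not taken from the works cited in this note.

```python
# Hypothetical membership degrees mu(x, y) on four objects.
OBJECTS = ["a", "b", "c", "d"]
MU = [
    [1.0, 0.8, 0.4, 0.4],
    [0.8, 1.0, 0.4, 0.4],
    [0.4, 0.4, 1.0, 0.6],
    [0.4, 0.4, 0.6, 1.0],
]

def is_fuzzy_equivalence(mu):
    """Reflexive, symmetric and max-min transitive:
    mu(x, z) >= max_y min(mu(x, y), mu(y, z)) for all x, z."""
    n = len(mu)
    reflexive = all(mu[i][i] == 1.0 for i in range(n))
    symmetric = all(mu[i][j] == mu[j][i] for i in range(n) for j in range(n))
    transitive = all(
        mu[i][k] >= max(min(mu[i][j], mu[j][k]) for j in range(n))
        for i in range(n) for k in range(n)
    )
    return reflexive and symmetric and transitive

def alpha_cut_partition(objects, mu, alpha):
    """Classes of the crisp relation {(x, y) : mu(x, y) >= alpha}.

    For a fuzzy equivalence this crisp relation is an equivalence, so one
    representative per block is enough for the membership test.
    """
    blocks = []
    for i, x in enumerate(objects):
        for block in blocks:
            j = objects.index(block[0])
            if mu[i][j] >= alpha:
                block.append(x)
                break
        else:
            blocks.append([x])
    return blocks

if __name__ == "__main__":
    print(is_fuzzy_equivalence(MU))  # True
    for alpha in (1.0, 0.8, 0.6, 0.4):
        print(alpha, alpha_cut_partition(OBJECTS, MU, alpha))
```

Lowering α from 1.0 to 0.4 produces a chain of partitions, from all singletons to a single block.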

  6.

    In the strict sense, a similarity relation S is reflexive and symmetric, but not necessarily transitive.
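The difference between a similarity relation and an equivalence can be checked mechanically on a finite set. The Python sketch below tests the three properties. The relation "differ by at most 1" on integers is a hypothetical example of a reflexive, symmetric, non-transitive relation. Same parity is included for contrast as an equivalence.

```python
from typing import Callable, Iterable, Tuple

def relation_properties(domain: Iterable, related: Callable) -> Tuple[bool, bool, bool]:
    """Return (reflexive, symmetric, transitive) for a relation on a finite domain."""
    items = list(domain)
    reflexive = all(related(x, x) for x in items)
    symmetric = all(related(y, x) for x in items for y in items if related(x, y))
    transitive = all(
        related(x, z)
        for x in items for y in items for z in items
        if related(x, y) and related(y, z)
    )
    return reflexive, symmetric, transitive

def near(x: int, y: int) -> bool:
    """Hypothetical similarity: integers differing by at most 1."""
    return abs(x - y) <= 1

def same_parity(x: int, y: int) -> bool:
    """An equivalence relation, for contrast."""
    return x % 2 == y % 2

if __name__ == "__main__":
    print(relation_properties(range(5), near))         # (True, True, False)
    print(relation_properties(range(5), same_parity))  # (True, True, True)
```

Transitivity fails for `near` because 0 ~ 1 and 1 ~ 2, but 0 and 2 are not related.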

  7.

    In fact, this assumption of empirical clustering may be questioned. In particular, Russell (see [436]) has shown that Keynes’ principle of limited variety (A Treatise on Probability, Chap. XXII, 258) is not valid. Classification constraints therefore cannot be explained by empirical observations and must instead be founded on mathematical regularities.

  8.

    Russell’s paradox led, beyond the problematic answer Russell himself proposed with the theory of types, to the emergence of theories of universes containing both classes and sets. The classical solution of Zermelo and Fraenkel (ZF axiomatics) was followed by non-classical theories such as those of Finsler, Kelley-Morse, Quine, von Neumann, etc. The interesting attempts of the French mathematician Claude Frasnay to give new definitions of classes and sets must also be mentioned here (see [180]).

  9.

    Plato (Protagoras 331de) already said that “it is not fair to describe things as like which have some point alike, however small, or as unlike that have some point unlike”. Nelson Goodman goes further at the beginning of a well-known paper: “Similarity, I submit, is insidious. And if the association here with invidious comparison is itself invidious, so much the better. Similarity, ever ready to solve philosophical problems and overcome obstacles, is a pretender, an impostor, a quack. It has, indeed, its place and its uses, but is more often found where it does not belong, professing powers it does not possess.” (See [200], 437.) As Goodman argues, resemblance alone is not enough for representation. It may be superfluous in the case of descriptions of replicas of inscriptions or events, it does not explain metaphors, and it does not account for our predictive or, more generally, our inductive practice. When defined between particulars, it does not suffice to determine qualities and can hardly be measured in terms of the possession of common characteristics. These “seven strictures” constitute a relentless criticism. Goodman restated his position in a more recent book (see [201]). It received many comments and surely made a deep impression on philosophers and psychologists at the end of the twentieth century. However, at the beginning of the 2000s, Hahn and Ramscar tried to couch categorization in terms of more sophisticated and precise notions of similarity (see [222], and the review by Bradley C. Love [321]). Moreover, although we can easily share some of Goodman’s criticisms, we support the idea that similarity must be founded on “good” mathematical structures. As we have seen, this idea is already present in Russell’s rejection of the doctrine of natural kinds ([436], 461). It is also present in Quine’s criticism of perceptual similarity, which he rejects in favor of a more intellectual way of conceptualizing category membership (see his paper on “natural kinds” in [406] and also the comments of [463]).

  10.

    It has in fact been well known since the mid-1980s that such a problem is NP-hard in the case of hierarchical clustering (see [284]) and NP-complete in the case of overlapping clustering (see [285]).
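The hardness results cited in this note concern the search for an optimal hierarchy. A simpler count, standard in combinatorics, already shows why exhaustive search is out of reach. There are (2n − 3)!! fully resolved (binary) hierarchies on n labelled objects. The Python sketch below computes this number and checks it by brute-force enumeration for small n. It is only an illustration and proves nothing about NP-hardness. Encoding trees as nested two-element frozensets is an assumption of the sketch.

```python
def binary_hierarchy_count(n: int) -> int:
    """(2n - 3)!!: number of binary hierarchies on n labelled objects (n >= 1)."""
    count = 1
    for k in range(3, n + 1):
        # A binary tree on k - 1 leaves has 2(k - 1) - 1 = 2k - 3 nodes, and
        # the k-th object can be attached above any one of them.
        count *= 2 * k - 3
    return count

def insertions(tree, leaf):
    """Yield every tree obtained by attaching `leaf` above some node of `tree`."""
    yield frozenset((tree, leaf))           # attach above the current node
    if isinstance(tree, frozenset):         # internal node: recurse into children
        left, right = tuple(tree)
        for new_left in insertions(left, leaf):
            yield frozenset((new_left, right))
        for new_right in insertions(right, leaf):
            yield frozenset((left, new_right))

def all_binary_hierarchies(labels):
    """Brute-force enumeration of binary hierarchies (small inputs only)."""
    trees = {labels[0]}
    for leaf in labels[1:]:
        trees = {t for tree in trees for t in insertions(tree, leaf)}
    return trees

if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, binary_hierarchy_count(n))
    print(len(all_binary_hierarchies(list("abcde"))))  # 105
```

The count already exceeds 34 million for 10 objects, so enumerating every hierarchy and scoring each one is not a viable strategy even for modest data sets.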

  11.

    A “key” is the operator that gives access to a set of documents according to a particular aspect of the request. The aspects of the request may be totally ordered, so that, for instance, the more general or more basic ones are processed before the others. One can then also define a total order on the set of keys.

References

  1. Andrews, G.E.: The Theory of Partitions. Cambridge University Press, Cambridge (1998). New ed. 2006

  2. Apostel, L.: Le problème formel des classifications empiriques. In: La Classification dans les Sciences, pp. 157–230. Duculot, Bruxelles (1963)

  3. Baeza-Yates, R.A.: Fringe analysis revisited. ACM Comput. Surv. 27(1), 111–119 (1995)

  4. Barbut, M.: Mathématiques et sciences humaines. P.U.F., Paris (1969). 2 vols

  5. Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 1. Librairie Hachette, Paris (1970)

  6. Barbut, M., Monjardet, B.: Ordre et classification, algèbre et combinatoire, vol. 2. Librairie Hachette, Paris (1970)

  7. Barthélemy, J.-P., Guénoche, A.: Les arbres et les représentations des proximités. Masson, Paris (1988)

  8. Benzécri, J.-P., et al.: L’Analyse des données, tome 1, taxinomie. Dunod, Paris (1973)

  9. Benzécri, J.-P., et al.: L’Analyse des données, tome 2, correspondances. Dunod, Paris (1973)

  10. Birkhoff, G.: On the structure of abstract algebras. Proc. Camb. Philos. Soc. 31, 433–454 (1935)

  11. Birkhoff, G.: Théorie et application des treillis. Ann. Inst. Henri Poincaré 11(5), 227–240 (1949)

  12. Birkhoff, G.: Lattice Theory, 3rd edn. AMS, Providence (1967)

  13. Boorman, S.A., Olivier, D.C.: Metrics on spaces of finite trees. J. Math. Psychol. 10, 26–59 (1973)

  14. Borůvka, O.: Décomposition dans les ensembles et théorie des groupoïdes. Séminaire Dubreil-Pisot, 14e année, 1960/61. Fascicule 2, exposé 22bis (not paged)

  15. Bourbaki, N.: Théorie des Ensembles. Hermann, Paris (1966)

  16. Caspard, N., Leclerc, B., Monjardet, B.: Ensembles ordonnés finis: concepts, résultats et usages. Springer, Berlin (2007)

  17. Dagognet, F.: Le catalogue de la vie. P.U.F., Paris (1970)

  18. Drobisch, M.W.: Neue Darstellung der Logik, nach ihren einfachsten Verhältnissen mit Rücksicht auf Mathematik und Naturwissenschaften. Voss, Leipzig (1968). New edn. Olms, Hildesheim (1968)

  19. Dubreil, P., Jacotin, M.-L.: Théorie algébrique des relations d’équivalence. J. Math. Pures Appl. 18, 63–95 (1939)

  20. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley-Interscience, New York (2000)

  21. Eisenbarth, B., Ziviani, N., Gonnet, G.H., Mehlhorn, K., Wood, D.: The theory of fringe analysis and its application to 2–3 trees and B-trees. Inf. Control 55(1), 125–174 (1982)

  22. Fairthorne, R.A.: The patterns of retrieval. Am. Doc. 7, 65–70 (1956)

  23. Fairthorne, R.A.: The mathematics of classification. In: Towards Information Retrieval, pp. 1–10. Butterworths, London (1961)

  24. Fisher, R.A.: The use of multiple measurements in taxonomic problems. Ann. Eugenics 7, 179–188 (1936)

  25. Frasnay, C.: Notes to the C.R. Acad. Sci. Paris, 1962–1963–1964: a) t. 255, 2878–2879; b) t. 256, 2507–2510; c) t. 257, 1825–1828; d) t. 257, 2944–2947; e) t. 258, 1373–1376; f) t. 259, 3910–3913

  26. Gondran, M.: La structure algébrique des classifications hiérarchiques. Ann. INSEE 22–23, 181–190 (1976)

  27. Gondran, M.: Valeurs propres et vecteurs propres en classification hiérarchique. RAIRO Inform. Théor. 10(3), 39–46 (1976)

  28. Gondran, M.: Graphes, dioïdes et semi-anneaux, nouveaux modèles et algorithmes. Tec et Doc, Paris (2002)

  29. Goodman, N.: Seven strictures on similarity. In: Problems and Projects. The Bobbs-Merrill Company, Indianapolis (1972)

  30. Goodman, N., Douglas, M., Hull, D.L.: How Classification Works: Nelson Goodman Among the Social Sciences. Edinburgh University Press, Edinburgh (1992)

  31. Gordon, A.D.: Hierarchical classification. In: Arabie, Ph., Hubert, L.J., de Soete, G. (eds.) Clustering and Classification, pp. 65–121. World Scientific, River Edge (1996)

  32. Greene, D., Knuth, D.: Mathematics for the Analysis of Algorithms, 2nd edn. Birkhäuser, Boston (1981)

  33. Gregg, J.: The Language of Taxonomy—An Application of Symbolic Logic to the Study of Classificatory Systems. Columbia University Press, New York (1954)

  34. Hahn, U., Ramscar, M. (eds.): Similarity and Categorization. Oxford University Press, Oxford (2001)

  35. Hartigan, J.A.: Clustering Algorithms. Wiley, New York (1975)

  36. Hempel, C.G., Oppenheim, P.: Der Typusbegriff im Lichte der neuen Logik. Sijthoff, Leiden (1936)

  37. Hempel, C.G.: Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. Free Press, New York (1965)

  38. Hillman, D.: Mathematical classification techniques for nonstatic document collections, with particular reference to the problem of relevance. In: Classification Research, Elsinore Conference Proceedings, Munksgaard, Copenhagen, pp. 177–209 (1965)

  39. Jambu, M.: Classification automatique pour l’analyse des données, 2 tomes. Bordas-Dunod, Paris (1978)

  40. Jaschek, C., Jaschek, M.: The Classification of Stars. Cambridge University Press, Cambridge (1987). Reprinted with corrections, 1990

  41. Kaufmann, A.: Introduction à la théorie des sous-ensembles flous, tome 3. Application à la classification et à la reconnaissance des formes aux automates et aux choix des critères. Masson, Paris (1975)

  42. Kaufmann, A., Pichat, E.: Méthodes mathématiques non numériques et leurs algorithmes, tome 1, Algorithmes de recherche des éléments maximaux. Masson, Paris (1977)

  43. Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, vol. 3. Addison-Wesley, Reading (1973)

  44. Krasner, M.: Espaces ultramétriques et ultramatroïdes. Séminaire, Faculté des Sciences de Paris, 1953–1954

  45. Křivánek, M., Morávek, J.: On NP-hardness in hierarchical clustering. In: Havránek, T., Šidák, Z., Novák, M. (eds.) COMPSTAT 1984, Proceedings. Physica-Verlag, Heidelberg (1984)

  46. Křivánek, M.: A note on the computational complexity of hierarchical overlapping clustering. Appl. Math. 30(6), 453–460 (1985)

  47. Lambert, J.: Classer vaut pour retrouver, coder vaut pour inventer. In: Anatomie d’un épistémologue, F. Dagognet. Vrin, Paris (1984)

  48. Lance, G.N., Williams, W.T.: A generalised sorting strategy for computer classification. Nature 212, 218 (1966)

  49. Lance, G.N., Williams, W.T.: A general theory of classification sorting. Comput. J. 9, 373–380 (1967)

  50. Larsen, J.A., Walden, W.E.: Comparing insertion schemes used to update 3-2 trees. Inf. Syst. 4, 127–136 (1979)

  51. Leclerc, B.: Semi-modularité des treillis d’ultramétriques. C.R. Acad. Sci. Paris, A 288, 575–577 (1979)

  52. Leclerc, B.: Description combinatoire des ultramétriques. Math. Sci. Hum. 73, 5–37 (1981)

  53. Leclerc, B.: Arbres minimums communs et compatibilité de données de types variés. Math. Sci. Hum. 98, 41–67 (1987)

  54. Lemin, A.-J.: The category of ultrametric spaces is isomorphic to the category of complete, atomic, tree-like and real graduated lattices Lat*. Algebra Univers. 50(1), 35–49 (2003)

  55. Lerman, I.C.: La classification automatique. Paris (1970)

  56. Lerman, I.C.: Classification automatique et analyse ordinale des données. Dunod, Paris (1981)

  57. Love, B.C.: Similarity and categorization, a review. AI Magazine, Summer, 102–105 (2002)

  58. Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory I. Stud. Log., tom XVI, 53–70 (1965)

  59. Luszczewska-Romahnowa, S., Batóg, T.: A generalized classification theory II. Stud. Log., tom XVII, 7–30 (1965)

  60. Mahmoud, H.: Evolution of Random Search Trees. Wiley, New York (1992)

  61. Mooers, C.N.: From a point of view of mathematical techniques. In: Fairthorne, R.A. (ed.) Towards Information Retrieval. Butterworths, London (1961)

  62. Ore, O.: Theory of equivalence relations. Duke Math. J. 9, 573–627 (1942)

  63. Ore, O.: Some studies on closer relations. Duke Math. J. 10, 761–785 (1943)

  64. Quine, W.V.O.: Ontological Relativity and Other Essays. Columbia University Press, New York (1969)

  65. Rasiowa, H., Sikorski, R.: The Mathematics of Metamathematics, 3rd edn. Warsaw (1970). First published Warsaw, 1963

  66. Rasiowa, H.: An Algebraic Approach to Non-Classical Logics. North Holland, Amsterdam (1974)

  67. Riordan, J.: Introduction to Combinatorial Analysis. Wiley, New York (1958)

  68. Riordan, J.: Combinatorial Identities. Wiley, New York (1968)

  69. Roux, M.: Algorithmes de Classification. Masson, Paris (1985)

  70. Running, S.W., Loveland, Th.R., Pierce, L.L., Nemani, R.R., Hunt Jr., E.R.: A remote sensing based vegetation classification logic for global land cover analysis. Remote Sens. Environ. 51, 39–48 (1995)

  71. Ruphy, S.: Are stellar kinds natural kinds? A challenging newcomer in the monism/pluralism and realism/antirealism debates. Philos. Sci. 77, 1109–1120 (2010)

  72. Ruspini, E.H.: A new approach to clustering. Inf. Control 15, 33–37 (1969)

  73. Ruspini, E.H.: Numerical methods for fuzzy clustering. Inf. Sci. 2, 319–350 (1970)

  74. Russell, B.: Human Knowledge, Its Scope and Limits. Routledge, London (1992)

  75. Salton, G.: Manipulation of trees in information retrieval. Commun. ACM 5, 103–114 (1962)

  76. Salton, G.: Automatic Information Organization and Retrieval. McGraw-Hill, New York (1968)

  77. Škrášek, J.: Základy vyšší matematiky. Naše vojsko, Praha (1966)

  78. Smith, L.B.: The concept of same. Adv. Child Development Behav. 24, 216–253 (1993)

  79. Soergel, D.: Mathematical analysis of documentation systems, an attempt to a theory of classification and search request formulation. Inf. Storage Retrieval 3(3), 129–173 (1967)

  80. Tarski, A.: Ordinal Algebras. North-Holland, Amsterdam (1956). With appendices by C.-C. Chang and B. Jónsson

  81. Van Rijsbergen, C.J.: The Geometry of Information Retrieval. Cambridge University Press, Cambridge (2004)

  82. Vitter, J.S., Flajolet, P.: Average-case analysis of algorithms and data structures. In: van Leeuwen, J. (ed.) Handbook of Theoretical Computer Science, vol. A, pp. 431–524. Elsevier/MIT Press, Amsterdam/Cambridge (1990). Chap. 9

  83. Wang, X., Syrmos, V.L.: Optimal cluster selection based on Fisher class separability measure. In: Proceedings of the 2005 American Control Conference, vol. 3, pp. 1929–1934 (2005)

  84. Woodger, J.H.: Problems arising from the application of mathematical logic to biology. In: Applications Scientifiques de la Logique mathématique, pp. 133–139. Gauthier-Villars, Paris (1954)

  85. Yao, A.C.-C.: On random 3-2 trees. Technical Report UIUCDCS-R-74-679, Department of Computer Science, University of Illinois, Urbana, Oct. 1974

  86. Yao, A.C.-C.: On random 2-3 trees. Acta Inform. 9(2), 159–170 (1977/78)

  87. Zadeh, L.A.: Similarity relations and fuzzy ordering. E.R.L. Report No. M277, Electronics Research Laboratory, University of California, Berkeley, July 1970

Copyright information

© 2013 Springer Basel

About this chapter

Cite this chapter

Parrochia, D., Neuville, P. (2013). Empirical Clustering and Classic Hierarchies. In: Towards a General Theory of Classifications. Studies in Universal Logic. Birkhäuser, Basel. https://doi.org/10.1007/978-3-0348-0609-1_3
