Combinatorial Optimization in Clustering

  • Chapter
Handbook of Combinatorial Optimization

Abstract

Clustering is a mathematical technique designed to reveal classification structures in data collected on real-world phenomena. A cluster is a part of the data (usually a subset of the objects considered, a subset of the variables, or both) consisting of entities that are much “alike”, in terms of the data, compared with the rest of the data. The term itself was coined in psychology in the 1930s, when a heuristic technique was suggested for clustering psychological variables based on pairwise correlation coefficients. However, two more disciplines should also be credited for the surge of clustering research in the 1960s: numerical taxonomy in biology and pattern recognition in machine learning. Relevant sources include Hartigan (1975), Jain and Dubes (1988), and Mirkin (1996). At the same time, industrial and computational applications gave rise to graph partitioning problems, which are touched upon below in 6.2.4.

References

  1. R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup, On the approximability of numerical taxonomy, (DIMACS Technical Report 95–46, 1995).

  2. A. Agrawal and P. Klein, Cutting down on fill using nested dissection: Provably good elimination orderings, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computation (London, Springer-Verlag, 1993).

  3. P. Arabie, S.A. Boorman, and P.R. Levitt, Constructing block models: how and why Journal of Mathematical Psychology Vol. 17 (1978) pp. 21–63.

  4. P. Arabie and L. Hubert, Combinatorial data analysis Annu. Rev. Psychol. Vol. 43 (1992) pp. 169–203.

  5. P. Arabie, L. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific Publishers, 1996).

  6. C. Arcelli and G. Sanniti di Baja, Skeletons of planar patterns, in T.Y. Kong and A. Rosenfeld (eds.) Topological Algorithms for Digital Image Processing (Amsterdam, Elsevier, 1996) pp. 99–143.

  7. H.-J. Bandelt and A.W.M. Dress, Weak hierarchies associated with similarity measures — an additive clustering technique Bulletin of Mathematical Biology Vol. 51 (1989) pp. 133–166.

  8. H.-J. Bandelt and A.W.M. Dress, A canonical decomposition theory for metrics on a finite set Advances in Mathematics Vol. 92 (1992) pp. 47–105.

  9. J.-P. Benzécri, L’Analyse des Données (Paris, Dunod, 1973).

  10. P. Brucker, On the complexity of clustering problems, in R. Henn et al. (eds.) Optimization and Operations Research (Berlin, Springer, 1978) pp. 45–54.

  11. P. Buneman, The recovery of trees from measures of dissimilarity, in F. Hodson, D. Kendall, and P. Tautu (eds.) Mathematics in Archeological and Historical Sciences (Edinburgh, Edinburgh University Press, 1971) pp. 387–395.

  12. P.B. Callahan and S.R. Kosaraju, A decomposition of multidimensional point sets with applications to k-nearest neighbors and n-body potential fields Journal of ACM Vol. 42 (1995) pp. 67–90.

  13. A. Chaturvedi and J.D. Carroll, An alternating optimization approach to fitting INDCLUS and generalized INDCLUS models Journal of Classification Vol. 11 (1994) pp. 155–170.

  14. P. Crescenzi and V. Kann A compendium of NP optimization problems (URL site:http://www.nada.kth.se/viggo/problemlist/compendium2, 1995)

  15. W.H.E. Day, Computational complexity of inferring phylogenies from dissimilarity matrices Bulletin of Mathematical Biology Vol. 49 (1987) pp. 461–467.

  16. W.H.E. Day, Complexity theory: An introduction for practitioners of classification, in P. Arabie, L.J. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific, 1996) pp. 199–233.

  17. M. Delattre and P. Hansen, Bicriterion cluster analysis IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) Vol. 4 (1980) pp. 277–291.

  18. J. Demmel Applications of Parallel Computers (Lectures posted at web site:http://HTTP.CS.Berkeley.EDU/demmel/cs267/1996).

  19. E. Diday, Orders and overlapping clusters by pyramids, in J. de Leeuw, W. Heiser, J. Meulman, and F. Critchley (eds.) Multidimensional Data Analysis (Leiden, DSWO Press, 1986) pp. 201–234.

  20. A.A. Dorofeyuk, Methods for automatic classification: A Review Automation and Remote Control Vol. 32 No. 12 (1971) pp. 1928–1958.

  21. A.W.M. Dress and W. Terhalle, Well-layered maps - a class of greedily optimizable set functions Appl. Math. Lett. Vol. 8 No. 5 (1995) pp. 77–80.

  22. H. Edelsbrunner Algorithms in Combinatorial Geometry (New York, Springer Verlag, 1987).

  23. M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory Czech. Math. Journal Vol. 25 (1975) pp. 619–637.

  24. D.W. Fisher, Knowledge acquisition via incremental conceptual clustering Machine Learning Vol. 2 (1987) pp. 139–172.

  25. K. Florek, J. Lukaszewicz, H. Perkal, H. Steinhaus, and S. Zubrzycki, Sur la liaison et la division des points d’un ensemble fini Colloquium Mathematicum Vol. 2 (1951) pp. 282–285.

  26. G. Gallo, M.D. Grigoriadis, and R.E. Tarjan, A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing Vol. 18 (1989) pp. 30–55.

  27. M.R. Garey and D.S. Johnson Computers and Intractability: a guide to the theory of NP-completeness (San Francisco, W.H.Freeman and Company, 1979).

  28. M. Gondran and M. Minoux Graphs and Algorithms (New York, J. Wiley & Sons, 1984).

  29. J.C. Gower and G.J.S. Ross, Minimum spanning tree and single linkage cluster analysis Applied Statistics Vol. 18 pp. 54–64.

  30. D. Gusfield, Efficient algorithms for inferring evolutionary trees Networks Vol. 21 (1991) pp. 19–28.

  31. A. Guénoche, P. Hansen, and B. Jaumard, Efficient algorithms for divisive hierarchical clustering with the diameter criterion Journal of Classification Vol. 8 (1991) pp. 5–30.

  32. L. Hagen and A.B. Kahng, New spectral methods for ratio cut partitioning and clustering IEEE Transactions on Computer-Aided Design Vol. 11 No. 9 (1992) pp. 1074–1085.

  33. P. Hansen, B. Jaumard, and N. Mladenovic, How to choose K entities among N, in I.J. Cox, P. Hansen, and B. Julesz (eds.) Partitioning Data Sets. DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Providence, American Mathematical Society, 1995) pp. 105–116.

  34. J.A. Hartigan, Direct clustering of a data matrix Journal of American Statistical Association Vol. 67 (1972) pp. 123–129.

  35. J.A. Hartigan Clustering Algorithms (New York, J.Wiley & Sons, 1975).

  36. W.-L. Hsu and G.L. Nemhauser, Easy and hard bottleneck location problems Discrete Applied Mathematics Vol. 1 (1979) pp. 209–215.

  37. L.J. Hubert Assignment Methods in Combinatorial Data Analysis (New York, M. Dekker, 1987).

  38. L. Hubert and P. Arabie, The analysis of proximity matrices through sums of matrices having (anti)-Robinson forms British Journal of Mathematical and Statistical Psychology Vol. 47 (1994) pp. 1–40.

  39. A.K. Jain and R.C. Dubes Algorithms for Clustering Data (Englewood Cliffs, NJ, Prentice Hall, 1988).

  40. K. Janich Linear Algebra (New York, Springer-Verlag, 1994).

  41. D.S. Johnson and M.A. Trick (eds.) Cliques, Coloring, and Satisfiability. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, v. 26. (Providence, RI, AMS, 1996) 657 p.

  42. S.C. Johnson, Hierarchical clustering schemes Psychometrika Vol. 32 (1967) pp. 241–245.

  43. Y. Kempner, B. Mirkin, and I. Muchnik, Monotone linkage clustering and quasi-concave set functions. Applied Mathematics Letters Vol.10 No.4 (1997) pp. 19–24.

  44. G. Keren and S. Baggen, Recognition models of alphanumeric characters Perception and Psychophysics (1981) pp. 234–246.

  45. B. Kernighan and S. Lin, An effective heuristic procedure for partitioning of electrical circuits The Bell System Technical Journal Vol. 49 No. 2 (1970) pp. 291–307.

  46. B. Krishnamurthy, An improved min-cut algorithm for partitioning VLSI networks IEEE Transactions on Computers Vol. C-33 No. 5 (1984) pp. 438–446.

  47. V. Kupershtoh, B. Mirkin, and V. Trofimov, Sum of within partition similarities as a clustering criterion Automation and Remote Control Vol. 37 No. 2 (1976) pp. 548–553.

  48. V. Kupershtoh and V. Trofimov, An algorithm for analysis of the structure in a proximity matrix Automation and Remote Control Vol. 36 No. 11 (1975) pp. 1906–1916.

  49. G.N. Lance and W.T. Williams, A general theory of classificatory sorting strategies: 1. Hierarchical Systems Comp. Journal Vol. 9 (1967) pp. 373–380.

  50. L. Lebart, A. Morineau, and M. Piron Statistique Exploratoire Multidimensionnelle (Paris, Dunod, 1995).

  51. B. Leclerc, Minimum spanning trees for tree metrics: abridgments and adjustments Journal of Classification Vol. 12 (1995) pp. 207–242.

  52. V. Levit, An algorithm for finding a maximum perimeter submatrix containing only unity, in a zero/one matrix, in V.S. Pereverzev-Orlov (ed.) Systems for Transmission and Processing of Data (Moscow, Institute of Information Transmission Science Press, 1988) pp. 42–45 (in Russian).

  53. L. Libkin, I. Muchnik, and L. Shvarzer, Quasi-linear monotone systems Automation and Remote Control Vol. 50 pp. 1249–1259.

  54. R.J. Lipton and R.E. Tarjan, A separator theorem for planar graphs SIAM Journal of Appl. Math. Vol. 36 (1979) pp. 177–189.

  55. S. McGuinness, The greedy clique decomposition of a graph Journal of Graph Theory Vol. 18 (1994) pp. 427–430.

  56. G.L. Miller, S.-H. Teng, W. Thurston, and S.A. Vavasis, Automatic mesh partitioning, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computations: Graph Theory Issues and Algorithms (London, Springer-Verlag, 1993).

  57. G.W. Milligan, A Monte Carlo study of thirty internal criterion measures for cluster analysis Psychometrika Vol. 46 (1981) pp. 187–199.

  58. B. Mirkin, Additive clustering and qualitative factor analysis methods for similarity matrices Journal of Classification Vol.4 (1987) pp. 7–31; Erratum Vol. 6 (1989) pp. 271–272.

  59. B. Mirkin, A sequential fitting procedure for linear data analysis models Journal of Classification Vol. 7 (1990) pp. 167–195.

  60. B. Mirkin, Approximation of association data by structures and clusters, in P.M. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (Providence, American Mathematical Society, 1994) pp. 293–316.

  61. B. Mirkin Mathematical Classification and Clustering (Dordrecht-Boston-London, Kluwer Academic Publishers, 1996).

  62. B. Mirkin, F. McMorris, F. Roberts, A. Rzhetsky (eds.) Mathematical Hierarchies and Biology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (Providence, RI, AMS, 1997) 389 p.

  63. I. Muchnik and V. Kamensky, MONOSEL: a SAS macro for model selection in linear regression analysis, in Proceedings of the Eighteenth Annual SAS* Users Group International Conference (Cary, NC, SAS Institute Inc., 1993) pp. 1103–1108.

  64. I.B. Muchnik and L.V. Schwarzer, Nuclei of monotone systems on set semilattices Automation and Remote Control Vol. 52 (1989) 1993) pp. 1095–1102.

  65. I.B. Muchnik and L.V. Schwarzer, Maximization of generalized characteristics of functions of monotone systems Automation and Remote Control Vol. 53 (1990) pp. 1562–1572.

  66. J. Mullat, Extremal subsystems of monotone systems: I, II; Automation and Remote Control Vol.37 (1976) pp. 758–766, pp. 1286–1294.

  67. C.H. Papadimitriou and K. Steiglitz Combinatorial Optimization: Algorithms and Complexity (Englewood Cliffs, NJ, Prentice-Hall, 1982).

  68. P.M. Pardalos, F. Rendl, and H. Wolkowicz, The quadratic assignment problem: a survey and recent developments, in P. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, v. 16. (Providence, American Mathematical Society, 1994).

  69. Panos M. Pardalos and Henry Wolkowicz (Eds.) Topics in Semidefinite and Interior-Point Methods. Fields Institute Communications Series (Providence, American Mathematical Society, 1997).

  70. A. Pothen, H.D. Simon, K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs SIAM Journal on Matrix Analysis and Applications Vol. 11 (1990) pp. 430–452.

  71. S. Sattath and A. Tversky, Additive similarity trees Psychometrika Vol. 42 (1977) pp. 319–345.

  72. J. Setubal and J. Meidanis Introduction to Computational Molecular Biology (Boston, PWS Publishing Company, 1997).

  73. R.N. Shepard and P. Arabie, Additive clustering: representation of similarities as combinations of overlapping properties Psychological Review Vol. 86 (1979) pp. 87–123.

  74. J.A. Studier and K.J. Keppler, A note on the neighbor-joining algorithm of Saitou and Nei Molecular Biology and Evolution Vol. 5 (1988) pp. 729–731.

  75. L. Vandenberghe and S. Boyd, Semidefinite programming SIAM Review Vol. 38 (1996) pp. 49–95.

  76. B. Van Cutsem (Ed.) Classification and Dissimilarity Analysis Lecture Notes in Statistics, 93 (New York, Springer-Verlag, 1994).

  77. J.H. Ward, Jr, Hierarchical grouping to optimize an objective function Journal of American Statist. Assoc. Vol. 58 (1963) pp. 236–244.

  78. D.J.A. Welsh Matroid Theory (London, Academic Press, 1976).

  79. A.C. Yao, On constructing minimum spanning trees in k-dimensional space and related problems SIAM J. Comput. Vol. 11 (1982) pp. 721–736.

  80. C.T. Zahn, Approximating symmetric relations by equivalence relations J. Soc. Indust. Appl. Math. Vol. 12, No. 4.

  81. K.A. Zaretsky, Reconstruction of a tree from the distances between its pendant vertices Uspekhi Math. Nauk (Russian Mathematical Surveys) Vol. 20 pp. 90–92 (in Russian).

Copyright information

© 1998 Kluwer Academic Publishers

About this chapter

Cite this chapter

Mirkin, B., Muchnik, I. (1998). Combinatorial Optimization in Clustering. In: Du, DZ., Pardalos, P.M. (eds) Handbook of Combinatorial Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0303-9_15

  • DOI: https://doi.org/10.1007/978-1-4613-0303-9_15

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-7987-4

  • Online ISBN: 978-1-4613-0303-9

  • eBook Packages: Springer Book Archive
