Combinatorial Optimization in Clustering

Mirkin, Boris; Muchnik, Ilya

doi:10.1007/978-1-4613-0303-9_15



Abstract

Clustering is a mathematical technique for revealing classification structures in data collected on real-world phenomena. A cluster is a piece of the data (usually a subset of the objects considered, a subset of the variables, or both) whose entities are much more “alike”, in terms of the data, than they are to the rest of the data. The term itself was coined in psychology in the 1930s, when a heuristic technique was proposed for clustering psychological variables on the basis of pairwise correlation coefficients. Two further disciplines, however, should be credited with the surge of clustering research in the 1960s: numerical taxonomy in biology and pattern recognition in machine learning. Relevant sources include Hartigan (1975), Jain and Dubes (1988), and Mirkin (1996). At the same time, industrial and computational applications gave rise to graph partitioning problems, which are touched on below in 6.2.4.
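The notion of a cluster as a subset whose entities are “alike” relative to the rest of the data can be formalized in many ways. As a minimal sketch, assuming a symmetric pairwise similarity matrix (for instance, correlation coefficients between variables, as in the early psychological work mentioned above) and a hypothetical score defined as the average within-subset similarity minus the average similarity between the subset and the remaining entities, the idea can be written in Python as follows. The function names and the matrix values are illustrative assumptions, not a method taken from the chapter.

```python
def mean_similarity(sim, rows, cols):
    """Average of sim[i][j] over all ordered pairs (i, j), i in rows, j in cols, i != j."""
    pairs = [(i, j) for i in rows for j in cols if i != j]
    return sum(sim[i][j] for i, j in pairs) / len(pairs)


def cohesion_contrast(sim, cluster):
    """Hypothetical cohesion score: average within-subset similarity minus
    average similarity between subset members and all other entities.
    A positive value means the subset is more 'alike' internally than it
    is to the rest of the data."""
    n = len(sim)
    inside = sorted(cluster)
    outside = [k for k in range(n) if k not in cluster]
    if len(inside) < 2 or not outside:
        raise ValueError("need at least two members and at least one outsider")
    within = mean_similarity(sim, inside, inside)
    between = mean_similarity(sim, inside, outside)
    return within - between


# Toy correlation-like matrix for four variables (made-up illustrative values).
sim = [
    [1.0, 0.8, 0.7, 0.1],
    [0.8, 1.0, 0.9, 0.2],
    [0.7, 0.9, 1.0, 0.0],
    [0.1, 0.2, 0.0, 1.0],
]
print(cohesion_contrast(sim, {0, 1, 2}))  # positive: a tight subset
print(cohesion_contrast(sim, {0, 3}))     # negative: not a cluster
```

Subtracting the between-group average, rather than using the within-group average alone, keeps a subset from scoring well merely because every entity in the data is similar to every other.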


References

R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup, On the approximability of numerical taxonomy (DIMACS Technical Report 95–46, 1995).
A. Agrawal and P. Klein, Cutting down on fill using nested dissection: Provably good elimination orderings, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computation (London, Springer-Verlag, 1993).
P. Arabie, S.A. Boorman, and P.R. Levitt, Constructing block models: how and why, Journal of Mathematical Psychology Vol. 17 (1978) pp. 21–63.
P. Arabie and L. Hubert, Combinatorial data analysis, Annual Review of Psychology Vol. 43 (1992) pp. 169–203.
P. Arabie, L. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific, 1996).
C. Arcelli and G. Sanniti di Baja, Skeletons of planar patterns, in T.Y. Kong and A. Rosenfeld (eds.) Topological Algorithms for Digital Image Processing (Amsterdam, Elsevier, 1996) pp. 99–143.
H.-J. Bandelt and A.W.M. Dress, Weak hierarchies associated with similarity measures: an additive clustering technique, Bulletin of Mathematical Biology Vol. 51 (1989) pp. 133–166.
H.-J. Bandelt and A.W.M. Dress, A canonical decomposition theory for metrics on a finite set, Advances in Mathematics Vol. 92 (1992) pp. 47–105.
J.-P. Benzécri, L’Analyse des Données (Paris, Dunod, 1973).
P. Brucker, On the complexity of clustering problems, in R. Henn et al. (eds.) Optimization and Operations Research (Berlin, Springer, 1978) pp. 45–54.
P. Buneman, The recovery of trees from measures of dissimilarity, in F. Hodson, D. Kendall, and P. Tautu (eds.) Mathematics in Archeological and Historical Sciences (Edinburgh, Edinburgh University Press, 1971) pp. 387–395.
P.B. Callahan and S.R. Kosaraju, A decomposition of multidimensional point sets with applications to k-nearest neighbors and n-body potential fields, Journal of the ACM Vol. 42 (1995) pp. 67–90.
A. Chaturvedi and J.D. Carroll, An alternating optimization approach to fitting INDCLUS and generalized INDCLUS models, Journal of Classification Vol. 11 (1994) pp. 155–170.
P. Crescenzi and V. Kann, A compendium of NP optimization problems (URL: http://www.nada.kth.se/viggo/problemlist/compendium2, 1995).
W.H.E. Day, Computational complexity of inferring phylogenies from dissimilarity matrices, Bulletin of Mathematical Biology Vol. 49 (1987) pp. 461–467.
W.H.E. Day, Complexity theory: an introduction for practitioners of classification, in P. Arabie, L.J. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific, 1996) pp. 199–233.
M. Delattre and P. Hansen, Bicriterion cluster analysis, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) Vol. 4 (1980) pp. 277–291.
J. Demmel, Applications of Parallel Computers (lectures posted at web site: http://HTTP.CS.Berkeley.EDU/demmel/cs267/1996).
E. Diday, Orders and overlapping clusters by pyramids, in J. de Leeuw, W. Heiser, J. Meulman, and F. Critchley (eds.) Multidimensional Data Analysis (Leiden, DSWO Press, 1986) pp. 201–234.
A.A. Dorofeyuk, Methods for automatic classification: a review, Automation and Remote Control Vol. 32 No. 12 (1971) pp. 1928–1958.
A.W.M. Dress and W. Terhalle, Well-layered maps: a class of greedily optimizable set functions, Applied Mathematics Letters Vol. 8 No. 5 (1995) pp. 77–80.
H. Edelsbrunner, Algorithms in Combinatorial Geometry (New York, Springer-Verlag, 1987).
M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory, Czechoslovak Mathematical Journal Vol. 25 (1975) pp. 619–637.
D.W. Fisher, Knowledge acquisition via incremental conceptual clustering, Machine Learning Vol. 2 (1987) pp. 139–172.
K. Florek, J. Lukaszewicz, H. Perkal, H. Steinhaus, and S. Zubrzycki, Sur la liaison et la division des points d’un ensemble fini, Colloquium Mathematicum Vol. 2 (1951) pp. 282–285.
G. Gallo, M.D. Grigoriadis, and R.E. Tarjan, A fast parametric maximum flow algorithm and applications, SIAM Journal on Computing Vol. 18 (1989) pp. 30–55.
M.R. Garey and D.S. Johnson, Computers and Intractability: a Guide to the Theory of NP-Completeness (San Francisco, W.H. Freeman and Company, 1979).
M. Gondran and M. Minoux, Graphs and Algorithms (New York, J. Wiley & Sons, 1984).
J.C. Gower and G.J.S. Ross, Minimum spanning tree and single linkage cluster analysis, Applied Statistics Vol. 18 pp. 54–64.
D. Gusfield, Efficient algorithms for inferring evolutionary trees, Networks Vol. 21 (1991) pp. 19–28.
A. Guénoche, P. Hansen, and B. Jaumard, Efficient algorithms for divisive hierarchical clustering with the diameter criterion, Journal of Classification Vol. 8 (1991) pp. 5–30.
L. Hagen and A.B. Kahng, New spectral methods for ratio cut partitioning and clustering, IEEE Transactions on Computer-Aided Design Vol. 11 No. 9 (1992) pp. 1074–1085.
P. Hansen, B. Jaumard, and N. Mladenovic, How to choose K entities among N, in I.J. Cox, P. Hansen, and B. Julesz (eds.) Partitioning Data Sets, DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Providence, American Mathematical Society, 1995) pp. 105–116.
J.A. Hartigan, Direct clustering of a data matrix, Journal of the American Statistical Association Vol. 67 (1972) pp. 123–129.
J.A. Hartigan, Clustering Algorithms (New York, J. Wiley & Sons, 1975).
W.-L. Hsu and G.L. Nemhauser, Easy and hard bottleneck location problems, Discrete Applied Mathematics Vol. 1 (1979) pp. 209–215.
L.J. Hubert, Assignment Methods in Combinatorial Data Analysis (New York, M. Dekker, 1987).
L. Hubert and P. Arabie, The analysis of proximity matrices through sums of matrices having (anti)-Robinson forms, British Journal of Mathematical and Statistical Psychology Vol. 47 (1994) pp. 1–40.
A.K. Jain and R.C. Dubes, Algorithms for Clustering Data (Englewood Cliffs, NJ, Prentice Hall, 1988).
K. Jänich, Linear Algebra (New York, Springer-Verlag, 1994).
D.S. Johnson and M.A. Trick (eds.) Cliques, Coloring, and Satisfiability, DIMACS Series in Discrete Mathematics and Theoretical Computer Science Vol. 26 (Providence, RI, AMS, 1996) 657 p.
S.C. Johnson, Hierarchical clustering schemes, Psychometrika Vol. 32 (1967) pp. 241–245.
Y. Kempner, B. Mirkin, and I. Muchnik, Monotone linkage clustering and quasi-concave set functions, Applied Mathematics Letters Vol. 10 No. 4 (1997) pp. 19–24.
G. Keren and S. Baggen, Recognition models of alphanumeric characters, Perception and Psychophysics (1981) pp. 234–246.
B. Kernighan and S. Lin, An efficient heuristic procedure for partitioning graphs, The Bell System Technical Journal Vol. 49 No. 2 (1970) pp. 291–307.
B. Krishnamurthy, An improved min-cut algorithm for partitioning VLSI networks, IEEE Transactions on Computers Vol. C-33 No. 5 (1984) pp. 438–446.
V. Kupershtoh, B. Mirkin, and V. Trofimov, Sum of within partition similarities as a clustering criterion, Automation and Remote Control Vol. 37 No. 2 (1976) pp. 548–553.
V. Kupershtoh and V. Trofimov, An algorithm for analysis of the structure in a proximity matrix, Automation and Remote Control Vol. 36 No. 11 (1975) pp. 1906–1916.
G.N. Lance and W.T. Williams, A general theory of classificatory sorting strategies: 1. Hierarchical systems, Computer Journal Vol. 9 (1967) pp. 373–380.
L. Lebart, A. Morineau, and M. Piron, Statistique Exploratoire Multidimensionnelle (Paris, Dunod, 1995).
B. Leclerc, Minimum spanning trees for tree metrics: abridgments and adjustments, Journal of Classification Vol. 12 (1995) pp. 207–242.
V. Levit, An algorithm for finding a maximum perimeter submatrix containing only unity, in a zero/one matrix, in V.S. Pereverzev-Orlov (ed.) Systems for Transmission and Processing of Data (Moscow, Institute of Information Transmission Science Press, 1988) pp. 42–45 (in Russian).
L. Libkin, I. Muchnik, and L. Shvarzer, Quasi-linear monotone systems, Automation and Remote Control Vol. 50 pp. 1249–1259.
R.J. Lipton and R.E. Tarjan, A separator theorem for planar graphs, SIAM Journal on Applied Mathematics Vol. 36 (1979) pp. 177–189.
S. McGuinness, The greedy clique decomposition of a graph, Journal of Graph Theory Vol. 18 (1994) pp. 427–430.
G.L. Miller, S.-H. Teng, W. Thurston, and S.A. Vavasis, Automatic mesh partitioning, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computations: Graph Theory Issues and Algorithms (London, Springer-Verlag, 1993).
G.W. Milligan, A Monte Carlo study of thirty internal criterion measures for cluster analysis, Psychometrika Vol. 46 (1981) pp. 187–199.
B. Mirkin, Additive clustering and qualitative factor analysis methods for similarity matrices, Journal of Classification Vol. 4 (1987) pp. 7–31; Erratum Vol. 6 (1989) pp. 271–272.
B. Mirkin, A sequential fitting procedure for linear data analysis models, Journal of Classification Vol. 7 (1990) pp. 167–195.
B. Mirkin, Approximation of association data by structures and clusters, in P.M. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems, DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Providence, American Mathematical Society, 1994) pp. 293–316.
B. Mirkin, Mathematical Classification and Clustering (Dordrecht-Boston-London, Kluwer Academic Publishers, 1996).
B. Mirkin, F. McMorris, F. Roberts, and A. Rzhetsky (eds.) Mathematical Hierarchies and Biology, DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Providence, RI, AMS, 1997) 389 p.
I. Muchnik and V. Kamensky, MONOSEL: a SAS macro for model selection in linear regression analysis, in Proceedings of the Eighteenth Annual SAS Users Group International Conference (Cary, NC, SAS Institute Inc., 1993) pp. 1103–1108.
I.B. Muchnik and L.V. Schwarzer, Nuclei of monotone systems on set semilattices, Automation and Remote Control Vol. 52 (1989) pp. 1095–1102.
I.B. Muchnik and L.V. Schwarzer, Maximization of generalized characteristics of functions of monotone systems, Automation and Remote Control Vol. 53 (1990) pp. 1562–1572.
J. Mullat, Extremal subsystems of monotone systems: I, II, Automation and Remote Control Vol. 37 (1976) pp. 758–766, pp. 1286–1294.
C.H. Papadimitriou and K. Steiglitz, Combinatorial Optimization: Algorithms and Complexity (Englewood Cliffs, NJ, Prentice-Hall, 1982).
P.M. Pardalos, F. Rendl, and H. Wolkowicz, The quadratic assignment problem: a survey and recent developments, in P. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems, DIMACS Series in Discrete Mathematics and Theoretical Computer Science Vol. 16 (Providence, American Mathematical Society, 1994).
P.M. Pardalos and H. Wolkowicz (eds.) Topics in Semidefinite and Interior-Point Methods, Fields Institute Communications Series (Providence, American Mathematical Society, 1997).
A. Pothen, H.D. Simon, and K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs, SIAM Journal on Matrix Analysis and Applications Vol. 11 (1990) pp. 430–452.
S. Sattath and A. Tversky, Additive similarity trees, Psychometrika Vol. 42 (1977) pp. 319–345.
J. Setubal and J. Meidanis, Introduction to Computational Molecular Biology (Boston, PWS Publishing Company, 1997).
R.N. Shepard and P. Arabie, Additive clustering: representation of similarities as combinations of overlapping properties, Psychological Review Vol. 86 (1979) pp. 87–123.
J.A. Studier and K.J. Keppler, A note on neighbor-joining algorithm of Saitou and Nei, Molecular Biology and Evolution Vol. 5 (1988) pp. 729–731.
L. Vandenberghe and S. Boyd, Semidefinite programming, SIAM Review Vol. 38 (1996) pp. 49–95.
B. Van Cutsem (ed.) Classification and Dissimilarity Analysis, Lecture Notes in Statistics 93 (New York, Springer-Verlag, 1994).
J.H. Ward, Jr., Hierarchical grouping to optimize an objective function, Journal of the American Statistical Association Vol. 58 (1963) pp. 236–244.
D.J.A. Welsh, Matroid Theory (London, Academic Press, 1976).
A.C. Yao, On constructing minimum spanning trees in k-dimensional space and related problems, SIAM Journal on Computing Vol. 11 (1982) pp. 721–736.
C.T. Zahn, Approximating symmetric relations by equivalence relations, Journal of the Society for Industrial and Applied Mathematics Vol. 12 No. 4.
K.A. Zaretsky, Reconstruction of a tree from the distances between its pendant vertices, Uspekhi Math. Nauk (Russian Mathematical Surveys) Vol. 20 pp. 90–92 (in Russian).


Author information

Boris Mirkin: Center for Discrete Mathematics & Theoretical Computer Science (DIMACS), Rutgers University, P.O. Box 1179, Piscataway, NJ 08855, USA; Central Economics-Mathematics Institute (CEMI), Moscow, Russia
Ilya Muchnik: RUTCOR and DIMACS, Rutgers University, P.O. Box 1179, Piscataway, NJ 08855, USA


Editor information

Ding-Zhu Du: University of Minnesota, Minneapolis, USA
Panos M. Pardalos: University of Florida, Gainesville, USA


About this chapter

Cite this chapter

Mirkin, B., Muchnik, I. (1998). Combinatoral Optimization in Clustering. In: Du, DZ., Pardalos, P.M. (eds) Handbook of Combinatorial Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0303-9_15


DOI: https://doi.org/10.1007/978-1-4613-0303-9_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7987-4
Online ISBN: 978-1-4613-0303-9
eBook Packages: Springer Book Archive
