Abstract
Clustering is a mathematical technique designed to reveal classification structures in data collected on real-world phenomena. A cluster is a part of the data (usually a subset of the objects considered, a subset of the variables, or both) consisting of entities that are much “alike”, in terms of the data, compared with the rest of the data. The term itself was coined in psychology back in the 1930s, when a heuristic technique was suggested for clustering psychological variables based on pairwise correlation coefficients. However, two more disciplines should also be credited for the surge of clustering work in the 1960s: numerical taxonomy in biology and pattern recognition in machine learning. Relevant sources include Hartigan (1975), Jain and Dubes (1988), and Mirkin (1996). Simultaneously, industrial and computational applications gave rise to graph partitioning problems, which are touched upon below in Section 6.2.4.
References
R. Agarwala, V. Bafna, M. Farach, B. Narayanan, M. Paterson, and M. Thorup, On the approximability of numerical taxonomy, (DIMACS Technical Report 95–46, 1995).
A. Agrawal and P. Klein, Cutting down on fill using nested dissection: Provably good elimination orderings, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computation (London, Springer-Verlag, 1993).
P. Arabie, S.A. Boorman, and P.R. Levitt, Constructing block models: how and why Journal of Mathematical Psychology Vol. 17 (1978) pp. 21–63.
P. Arabie and L. Hubert, Combinatorial data analysis Annu. Rev. Psychol. Vol. 43 (1992) pp. 169–203.
P. Arabie, L.J. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific, 1996).
C. Arcelli and G Sanniti di Baja, Skeletons of planar patterns, in T.Y. Kong and A. Rosenfeld (eds.) Topological Algorithms for Digital Image Processing (Amsterdam, Elsevier, 1996) pp. 99–143.
H.-J. Bandelt and A.W.M. Dress, Weak hierarchies associated with similarity measures — an additive clustering technique Bulletin of Mathematical Biology Vol. 51 (1989) pp. 133–166.
H.-J. Bandelt and A.W.M. Dress, A canonical decomposition theory for metrics on a finite set Advances in Mathematics Vol. 92 (1992) pp. 47–105.
J.-P. Benzécri L’Analyse des Données (Paris, Dunod, 1973).
P. Brucker, On the complexity of clustering problems, in R. Henn et al. (eds.) Optimization and Operations Research (Berlin, Springer, 1978) pp. 45–54.
P. Buneman, The recovery of trees from measures of dissimilarity, in F. Hodson, D. Kendall, and P. Tautu (eds.) Mathematics in Archeological and Historical Sciences (Edinburgh, Edinburgh University Press, 1971) pp. 387–395.
P.B. Callahan and S.R. Kosaraju, A decomposition of multidimensional point sets with applications to k-nearest neighbors and n-body potential fields Journal of ACM Vol. 42 (1995) pp. 67–90.
A. Chaturvedi and J.D. Carroll, An alternating optimization approach to fitting INDCLUS and generalized INDCLUS models Journal of Classification Vol. 11 (1994) pp. 155–170.
P. Crescenzi and V. Kann A compendium of NP optimization problems (URL site:http://www.nada.kth.se/viggo/problemlist/compendium2, 1995)
W.H.E. Day, Computational complexity of inferring phylogenies from dissimilarity matrices Bulletin of Mathematical Biology Vol. 49 (1987) pp. 461–467.
W.H.E. Day, Complexity theory: An introduction for practitioners of classification, in P. Arabie, L.J. Hubert, and G. De Soete (eds.) Clustering and Classification (River Edge, NJ, World Scientific, 1996) pp. 199–233.
M. Delattre and P. Hansen, Bicriterion cluster analysis IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI) Vol. 4 (1980) pp. 277–291.
J. Demmel Applications of Parallel Computers (Lectures posted at web site:http://HTTP.CS.Berkeley.EDU/demmel/cs267/1996).
E. Diday, Orders and overlapping clusters by pyramids, in J. de Leeuw, W. Heiser, J. Meulman, and F. Critchley (eds.) Multidimensional Data Analysis (Leiden, DSWO Press, 1986) pp. 201–234.
A.A. Dorofeyuk, Methods for automatic classification: A Review Automation and Remote Control Vol. 32 No. 12 (1971) pp. 1928–1958.
A.W.M. Dress and W. Terhalle, Well-layered maps - a class of greedily optimizable set functions Appl. Math. Lett. Vol. 8 No. 5 (1995) pp. 77–80.
H. Edelsbrunner Algorithms in Combinatorial Geometry (New York, Springer Verlag, 1987).
M. Fiedler, A property of eigenvectors of nonnegative symmetric matrices and its application to graph theory Czech. Math. Journal Vol. 25 (1975) pp. 619–637.
D.H. Fisher, Knowledge acquisition via incremental conceptual clustering Machine Learning Vol. 2 (1987) pp. 139–172.
K. Florek, J. Lukaszewicz, H. Perkal, H. Steinhaus, and S. Zubrzycki, Sur la liaison et la division des points d’un ensemble fini Colloquium Mathematicum Vol. 2 (1951) pp. 282–285.
G. Gallo, M.D. Grigoriadis, and R.E. Tarjan, A fast parametric maximum flow algorithm and applications. SIAM Journal on Computing Vol. 18 (1989) pp. 30–55.
M.R. Garey and D.S. Johnson Computers and Intractability: A Guide to the Theory of NP-Completeness (San Francisco, W.H. Freeman and Company, 1979).
M. Gondran and M. Minoux Graphs and Algorithms (New York, J. Wiley & Sons, 1984).
J.C. Gower and G.J.S. Ross, Minimum spanning tree and single linkage cluster analysis Applied Statistics Vol. 18 pp. 54–64.
D. Gusfield, Efficient algorithms for inferring evolutionary trees Networks Vol. 21 (1991) pp. 19–28.
A. Guénoche, P. Hansen, and B. Jaumard, Efficient algorithms for divisive hierarchical clustering with the diameter criterion Journal of Classification Vol. 8 (1991) pp. 5–30.
L. Hagen and A.B. Kahng, New spectral methods for ratio cut partitioning and clustering IEEE Transactions on Computer-Aided Design Vol. 11 No. 9 (1992) pp. 1074–1085.
P. Hansen, B. Jaumard, and N. Mladenovic, How to choose K entities among N, in I.J. Cox, P. Hansen, and B. Julesz (eds.) Partitioning Data Sets. DIMACS Series in Discrete Mathematics and Theoretical Computer Science (Providence, American Mathematical Society, 1995) pp. 105–116.
J.A. Hartigan, Direct clustering of a data matrix Journal of American Statistical Association Vol. 67 (1972) pp. 123–129.
J.A. Hartigan Clustering Algorithms (New York, J. Wiley & Sons, 1975).
W.-L. Hsu and G.L. Nemhauser, Easy and hard bottleneck location problems Discrete Applied Mathematics Vol. 1 (1979) pp. 209–215.
L.J. Hubert Assignment Methods in Combinatorial Data Analysis (New York, M. Dekker, 1987).
L. Hubert and P. Arabie, The analysis of proximity matrices through sums of matrices having (anti)-Robinson forms British Journal of Mathematical and Statistical Psychology Vol. 47 (1994) pp. 1–40.
A.K. Jain and R.C. Dubes Algorithms for Clustering Data (Englewood Cliffs, NJ, Prentice Hall, 1988).
K. Janich Linear Algebra (New York, Springer-Verlag, 1994).
D.S. Johnson and M.A. Trick (eds.) Cliques, Coloring, and Satisfiability. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 26 (Providence, RI, AMS, 1996) 657 p.
S.C. Johnson, Hierarchical clustering schemes Psychometrika Vol. 32 (1967) pp. 241–245.
Y. Kempner, B. Mirkin, and I. Muchnik, Monotone linkage clustering and quasi-concave set functions Applied Mathematics Letters Vol. 10 No. 4 (1997) pp. 19–24.
G. Keren and S. Baggen, Recognition models of alphanumeric characters Perception and Psychophysics (1981) pp. 234–246.
B. Kernighan and S. Lin, An effective heuristic procedure for partitioning of electrical circuits The Bell System Technical Journal Vol. 49 No. 2 (1970) pp. 291–307.
B. Krishnamurthy, An improved min-cut algorithm for partitioning VLSI networks IEEE Transactions on Computers Vol. C-33 No. 5 (1984) pp. 438–446.
V. Kupershtoh, B. Mirkin, and V. Trofimov, Sum of within partition similarities as a clustering criterion Automation and Remote Control Vol. 37 No. 2 (1976) pp. 548–553.
V. Kupershtoh and V. Trofimov, An algorithm for analysis of the structure in a proximity matrix Automation and Remote Control Vol. 36 No. 11 (1975) pp. 1906–1916.
G.N. Lance and W.T. Williams, A general theory of classificatory sorting strategies: 1. Hierarchical Systems Comp. Journal Vol. 9 (1967) pp. 373–380.
L. Lebart, A. Morineau, and M. Piron Statistique Exploratoire Multidimensionnelle (Paris, Dunod, 1995).
B. Leclerc, Minimum spanning trees for tree metrics: abridgments and adjustments Journal of Classification Vol. 12 (1995) pp. 207–242.
V. Levit, An algorithm for finding a maximum perimeter submatrix containing only unity, in a zero/one matrix, in V.S. Pereverzev-Orlov (ed.) Systems for Transmission and Processing of Data (Moscow, Institute of Information Transmission Science Press, 1988) pp. 42–45 (in Russian).
L. Libkin, I. Muchnik, and L. Shvarzer, Quasi-linear monotone systems Automation and Remote Control Vol. 50 pp. 1249–1259.
R.J. Lipton and R.E. Tarjan, A separator theorem for planar graphs SIAM Journal of Appl. Math. Vol. 36 (1979) pp. 177–189.
S. McGuinness, The greedy clique decomposition of a graph Journal of Graph Theory Vol. 18 (1994) pp. 427–430.
G.L. Miller, S.-H. Teng, W. Thurston, and S.A. Vavasis, Automatic mesh partitioning, in A. George, J.R. Gilbert, and J.W.H. Liu (eds.) Sparse Matrix Computations: Graph Theory Issues and Algorithms (London, Springer-Verlag, 1993).
G.W. Milligan, A Monte Carlo study of thirty internal criterion measures for cluster analysis Psychometrika Vol. 46 (1981) pp. 187–199.
B. Mirkin, Additive clustering and qualitative factor analysis methods for similarity matrices Journal of Classification Vol. 4 (1987) pp. 7–31; Erratum Vol. 6 (1989) pp. 271–272.
B. Mirkin, A sequential fitting procedure for linear data analysis models Journal of Classification Vol. 7 (1990) pp. 167–195.
B. Mirkin, Approximation of association data by structures and clusters, in P.M. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (Providence, American Mathematical Society, 1994) pp. 293–316.
B. Mirkin Mathematical Classification and Clustering (Dordrecht-Boston-London, Kluwer Academic Publishers, 1996).
B. Mirkin, F. McMorris, F. Roberts, A. Rzhetsky (eds.) Mathematical Hierarchies and Biology. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, (Providence, RI, AMS, 1997) 389 p.
I. Muchnik and V. Kamensky, MONOSEL: a SAS macro for model selection in linear regression analysis, in Proceedings of the Eighteenth Annual SAS Users Group International Conference (Cary, NC, SAS Institute Inc., 1993) pp. 1103–1108.
I.B. Muchnik and L.V. Schwarzer, Nuclei of monotone systems on set semilattices Automation and Remote Control Vol. 52 (1989) 1993) pp. 1095–1102.
I.B. Muchnik and L.V. Schwarzer, Maximization of generalized characteristics of functions of monotone systems Automation and Remote Control Vol. 53 (1990) pp. 1562–1572.
J. Mullat, Extremal subsystems of monotone systems: I, II Automation and Remote Control Vol. 37 (1976) pp. 758–766, pp. 1286–1294.
C.H. Papadimitriou and K. Steiglitz Combinatorial Optimization: Algorithms and Complexity (Englewood Cliffs, NJ, Prentice-Hall, 1982).
P.M. Pardalos, F. Rendl, and H. Wolkowicz, The quadratic assignment problem: a survey and recent developments, in P.M. Pardalos and H. Wolkowicz (eds.) Quadratic Assignment and Related Problems. DIMACS Series in Discrete Mathematics and Theoretical Computer Science, Vol. 16 (Providence, American Mathematical Society, 1994).
P.M. Pardalos and H. Wolkowicz (eds.) Topics in Semidefinite and Interior-Point Methods. Fields Institute Communications Series (Providence, American Mathematical Society, 1997).
A. Pothen, H.D. Simon, K.-P. Liou, Partitioning sparse matrices with eigenvectors of graphs SIAM Journal on Matrix Analysis and Applications Vol. 11 (1990) pp. 430–452.
S. Sattath and A. Tversky, Additive similarity trees Psychometrika Vol. 42 (1977) pp. 319–345.
J. Setubal and J. Meidanis Introduction to Computational Molecular Biology (Boston, PWS Publishing Company, 1997).
R.N. Shepard and P. Arabie, Additive clustering: representation of similarities as combinations of overlapping properties Psychological Review Vol. 86 (1979) pp. 87–123.
J.A. Studier and K.J. Keppler, A note on the neighbor-joining algorithm of Saitou and Nei Molecular Biology and Evolution Vol. 5 (1988) pp. 729–731.
L. Vandenberghe and S. Boyd, Semidefinite programming SIAM Review Vol. 38 (1996) pp. 49–95.
B. Van Cutsem (ed.) Classification and Dissimilarity Analysis. Lecture Notes in Statistics, Vol. 93 (New York, Springer-Verlag, 1994).
J.H. Ward, Jr, Hierarchical grouping to optimize an objective function Journal of American Statist. Assoc. Vol. 58 (1963) pp. 236–244.
D.J.A. Welsh Matroid Theory (London, Academic Press, 1976).
A.C. Yao, On constructing minimum spanning trees in k-dimensional space and related problems SIAM J. Comput. Vol. 11 (1982) pp. 721–736.
C.T. Zahn, Approximating symmetric relations by equivalence relations J. Soc. Indust. Appl. Math. Vol. 12, No. 4.
K.A. Zaretsky, Reconstruction of a tree from the distances between its pendant vertices Uspekhi Math. Nauk (Russian Mathematical Surveys) Vol. 20 pp. 90–92 (in Russian).
Copyright information
© 1998 Kluwer Academic Publishers
About this chapter
Cite this chapter
Mirkin, B., Muchnik, I. (1998). Combinatorial Optimization in Clustering. In: Du, DZ., Pardalos, P.M. (eds) Handbook of Combinatorial Optimization. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-0303-9_15
DOI: https://doi.org/10.1007/978-1-4613-0303-9_15
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-7987-4
Online ISBN: 978-1-4613-0303-9
eBook Packages: Springer Book Archive