Abstract
Clustering analysis(1–4) is a newly developed, computer-oriented data analysis technique. It is a product of many research fields: statistics, computer science, operations research, and pattern recognition. Because researchers come from such diverse backgrounds, clustering analysis goes by many different names. In biology, it is called “taxonomy.”(5,6) In pattern recognition(7–15) it is called “unsupervised learning.” Perhaps most confusing of all, the term “classification” is sometimes also used to denote clustering analysis. Since classification may also denote discriminant analysis, which is totally different from clustering analysis, it is important to distinguish the two terms.
References
R. C. Tryon and D. E. Bailey, Cluster Analysis, McGraw-Hill, New York (1970).
M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York (1973).
J. A. Hartigan, Clustering Algorithms, Wiley, New York (1975).
J. Van Ryzin (ed.), Classification and Clustering, Academic Press, New York (1977).
R. R. Sokal and P. H. A. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, San Francisco (1963).
N. Jardine and R. Sibson, Mathematical Taxonomy, Wiley, New York (1971).
R. Duda and P. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973).
K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York (1974).
W. S. Meisel, Computer-Oriented Approaches to Pattern Recognition, Academic Press, New York (1972).
J. T. Tou and R. C. Gonzalez, Pattern Recognition, Addison-Wesley, Reading, Massachusetts (1974).
K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York (1972).
C. H. Chen, Pattern Recognition and Artificial Intelligence, Academic Press, New York (1976).
Y. T. Chien, Interactive Pattern Recognition, Marcel Dekker, New York (1978).
E. A. Patrick, Fundamentals of Pattern Recognition, Prentice-Hall, Englewood Cliffs, New Jersey (1972).
K. S. Fu, Sequential Methods in Pattern Recognition, Academic Press, New York (1969).
M. O. Dayhoff, Computer analysis of protein evolution, Scientific American, 232(7), 69–85 (July, 1969).
A. C. Shaw, A formal picture description scheme as a basis for picture processing systems, Inform. and Control, 14, 9–53 (1969).
K. S. Fu and S. Y. Lu, A clustering procedure for syntactic patterns, IEEE Trans. Systems Man Cybernet., SMC-7 (10), 737–742 (1977).
K. S. Fu and S. Y. Lu, A sentence-to-sentence clustering procedure for pattern analysis, IEEE Trans. Systems Man Cybernet., SMC-8 (5), 381–389 (May, 1978).
S. Y. Lu and K. S. Fu, Stochastic error-correcting syntax analysis for recognition of noisy patterns, IEEE Trans. Comput., C-26 (12), 1268–1276 (1977).
S. C. Chang and R. C. T. Lee, Clustering of syntactic patterns without parsing, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 626–643 (1978).
C. L. Chang and R. C. T. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York (1973).
C. C. Gotlieb and S. Kumar, Semantic clustering of index terms, J. Assoc. Comput. Mach., 15 (4), 493–513 (October, 1968).
D. Wishart, Mode analysis, a generalization of nearest neighbor which reduces chaining effects, in Numerical Taxonomy (A. J. Coles, ed.), Academic Press, New York, pp. 282–308 (1969).
R. C. Prim, Shortest connection networks and some generalizations, Bell System Tech. J., 36, 1389–1401 (November, 1957).
C. T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput., C-20 (1), 68–86 (January, 1971).
R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179–188 (1936).
G. Salton, Automatic Information Storage and Retrieval, McGraw-Hill, New York (1968).
A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Massachusetts (1975).
R. E. Bonner, On some clustering techniques, IBM J. Res. Develop., 8 (1) 22–32 (1964).
J. G. Augustson and J. Minker, An analysis of some graph theoretical cluster techniques, J. Assoc. Comput. Mach., 17 (4), 571–588 (1970).
R. E. Osteen and J. T. Tou, A clique-detection algorithm based on neighborhoods in graphs, Internat. J. Comput. Inform. Sci., 2 (4), 257–268 (1973).
J. R. Slagle, C. L. Chang, and S. Heller, A clustering and data-reorganization algorithm, IEEE Trans. Systems Man Cybernet., SMC-5 (1), 121–128 (January, 1975).
J. R. Slagle, C. L. Chang, and R. C. T. Lee, Experiments with some clustering analysis algorithms, Pattern Recognition, 6, 181–187 (1974).
J. A. Hartigan, Direct clustering of data matrices, J. Amer. Statist. Assoc., 67, 123–129 (March, 1972).
W. T. McCormick, P. J. Schweitzer, and T. W. White, Problem decomposition and data reorganization by a clustering technique, Oper. Res., 20 (5), 993–1009 (September-October, 1972).
J. K. Lenstra, Clustering a data array and the traveling salesman problem, Oper. Res., 22 (2), 413–414 (March-April, 1974).
S. B. Deutsch and J. J. Martin, An ordering algorithm for analysis of data arrays, Oper. Res., 19 (6), 1350–1362 (October, 1971).
M. G. Kendall, A Course in Multivariate Analysis, Hafner, New York (1968).
D. F. Morrison, Multivariate Statistical Methods, McGraw-Hill, New York (1967).
C. R. Rao, Linear Statistical Inference and Its Applications, Wiley, New York (1973).
T. Y. Young and T. W. Calvert, Classification, Estimation and Pattern Recognition, American Elsevier, New York (1974).
W. W. Cooley and P. R. Lohnes, Multivariate Data Analysis, Wiley, New York (1974).
A. J. Chen and H. T. Wang, The display and analysis of Landsat multi-spectral data over Taiwan, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 1083–1101 (1978).
C. L. Chang and R. C. T. Lee, A heuristic relaxation method for nonlinear mapping in clustering analysis, IEEE Trans. Systems Man Cybernet., SMC-3 (3), 197–200 (March, 1973).
J. R. Slagle and R. C. T. Lee, Application of game tree searching to sequential pattern recognition, Comm. ACM, 14 (2), 103–110 (February, 1971).
R. C. T. Lee, J. R. Slagle, and H. Blum, A triangulation method for the sequential mapping of points from N-space to 2-space, IEEE Trans. Comput., C-26 (3), 310–313 (March, 1977).
J. R. Slagle, Artificial Intelligence: A Heuristic Programming Approach, McGraw-Hill, New York (1971).
N. J. Nilsson, Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York (1971).
R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function I, Psychometrika, 27, 125–140 (1962).
R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function II, Psychometrika, 27, 219–246 (1962).
R. N. Shepard and J. D. Carroll, Parametric representation of nonlinear mapping data structures, in Proceedings of International Symposium on Multivariate Analysis (P. R. Krishnaiah, ed.), Academic Press, New York (1966).
J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1–27 (March, 1964).
J. B. Kruskal, Nonmetric multidimensional scaling: A numerical method, Psychometrika, 29, 115–129 (June, 1964).
J. W. Sammon, Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18 (5), 401–409 (May, 1969).
R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Trans. Inform. Theory, IT-15 (9), 517–525 (September, 1969).
K. Fukunaga and D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Trans. Comput., C-20 (2), 176–183 (February, 1971).
G. V. Trunk, Statistical estimation of the intrinsic dimensionality of data collections, Inform. and Control, 12, 508–525 (1968).
C. K. Chen and H. C. Andrews, Nonlinear intrinsic dimensionality computations, IEEE Trans. Comput., C-23 (2), 178–184 (February, 1974).
D. H. Schwartzman and J. J. Vidal, An algorithm for determining the topological dimensionality of point clusters, IEEE Trans. Comput., C-24 (12), 1175–1182 (December, 1975).
P. E. Green and V. R. Rao, Applied Multidimensional Scaling, Holt, Rinehart and Winston, New York (1972).
P. E. Green and D. S. Tull, Research for Marketing Decisions, Prentice-Hall, Englewood Cliffs, New Jersey (1975).
R. E. Frank and P. E. Green, Numerical taxonomy in marketing analysis: A review article, Journal of Marketing Research, 5, 83–98 (February, 1968).
R. C. T. Lee and C. L. Chang, Applications of minimal spanning trees to information storage, in Proceedings of International Symposium on Computers and Chinese Input/Output Systems, pp. 1245–1256 (August, 1973).
W. M. Fitch, Toward defining the course of evolution in minimum change for a specific tree topology, Systematic Zoology, 20, 406–416 (1971).
A. N. C. Kang, R. C. T. Lee, C. L. Chang, and S. K. Chang, Storage reduction through minimal spanning trees and spanning forests, IEEE Trans. Comput., C-26 (5), 425–434 (May, 1977).
A. N. C. Kang and A. Ault, Some properties of a centroid of a free tree, Inform. Process. Lett., 3, 18–20 (Sept., 1975).
R. Tarjan, Depth first search and linear graph algorithms, SIAM J. Comput., 1 (2), 146–160 (1972).
R. C. T. Lee and S. H. Tseng, Multikey sorting, International Journal of Policy Analysis and Information Systems, 3 (2), 1–20 (1979).
E. Fix and J. L. Hodges, Jr., “Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties,” Report No. 7, USAF School of Aviation Medicine, Randolph Field, Texas (February, 1951).
S. A. Dudani, The distance-weighted k-nearest-neighbor rule, IEEE Trans. Systems Man Cybernet., SMC-6, 325–327 (April, 1976).
C. W. Shen and R. C. T. Lee, “A Nearest Neighbor Search Technique With Short Zero-In Time,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
W. A. Burkhard and R. M. Keller, Some approaches to best-match searching, Comm. ACM, 16 (4), 230–236 (April, 1973).
R. L. Rivest, “Analysis of Associative Retrieval Algorithms,” Ph.D. Dissertation, Department of Computer Science, Stanford University, Stanford, California (1974).
J. H. Friedman, F. Baskett, and L. J. Shustek, An algorithm for finding nearest neighbors, IEEE Trans. Comput., C-24 (10), 1000–1006 (October, 1975).
J. H. Friedman, J. L. Bentley, and R. A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Software, 3 (3), 209–216 (September, 1977).
K. Fukunaga and P. M. Narendra, A branch and bound algorithm for computing k-nearest neighbors, IEEE Trans. Comput., C-24 (7), 750–753 (July, 1975).
C. T. Hsieh and R. C. T. Lee, Applications of symbolic error correcting code for nearest neighbor searching, in Proceedings of the National Computer Symposium of the Republic of China, Taipei, Taiwan, pp. 6: 7–6.14 (1976).
R. C. T. Lee, Y. H. Chin, and S. C. Chang, Application of principal component analysis to multikey searching, IEEE Trans. Software Engrg., SE-2 (3), (September, 1976).
H. C. Du and R. C. T. Lee, Symbolic gray code as a multikey hashing function, IEEE Trans. Pattern Analysis and Machine Intelligence, to appear.
J. L. Bentley, Multidimensional binary search trees used for associative searching, Comm. ACM, 18 (9), 509–517 (September, 1975).
D. E. Knuth, Sorting and Searching, Vol. 3 of The Art of Computer Programming, Addison-Wesley, Reading, Massachusetts (1973).
J. B. Rothnie and T. Lozano, Attribute based file organization in a paged memory environment, Comm. ACM, 17 (2), 63–69 (February, 1974).
J. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, pp. 280–297 (1967).
C. L. Chang, Finding prototypes for nearest neighbor classifiers, IEEE Trans. Comput., C-23 (11), (November, 1974).
R. C. T. Lee and T. T. Deng, An improved method for finding prototypes for nearest neighbor classifiers, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 601–609 (1977).
J. L. Bentley and J. H. Friedman, Fast algorithms for constructing minimal spanning trees in coordinate space, IEEE Trans. Comput., C-27 (2), 97–105 (February, 1978).
C. C. Chen and R. C. T. Lee, “Some Algorithms Employing Nearest Neighbor Searching,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
J. L. Bentley and M. I. Shamos, Divide and conquer for linear expected time, Inform. Process. Lett., 7 (2), 87–91 (February, 1978).
J. L. Bentley and M. I. Shamos, Divide and conquer in multidimensional space, in Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 220–230 (1976).
M. O. Rabin, Probabilistic algorithms, in Algorithms and Complexity (J. F. Traub, ed.) Academic Press, New York (1976).
G. Yuval, Finding nearest neighbors, Inform. Process. Lett., 5 (3), 63–65 (August, 1976).
C. W. Skinner, A heuristic approach to inductive inference fact retrieval systems, Comm. ACM, 17, 707–712 (December, 1974).
R. C. T. Lee, J. R. Slagle, and C. T. Mong, Towards automatic auditing of records, IEEE Trans. Software Engrg., SE-4 (5), 441–448 (September, 1978).
J. Martin, Computer Data Base Organization, Prentice-Hall, Englewood Cliffs, New Jersey (1975).
J. Martin, Principles of Data Base Management, Prentice-Hall, Englewood Cliffs, New Jersey (1976).
H. Katzan, Computer Data Management and Data Base Technology, Van Nostrand Reinhold, New York (1975).
G. Wiederhold, Data Base Design, McGraw-Hill, New York (1977).
S. P. Ghosh, Data Base Organization for Data Management, Academic Press, New York (1977).
H. Lorin, Sorting and Sort Systems, Addison-Wesley, Reading, Massachusetts (1975).
S. P. Ghosh, File organization, The consecutive retrieval property, Comm. ACM, 15, 802–808 (1972).
S. P. Ghosh, The consecutive storage of relevant records with redundancy, Comm. ACM, 18, 464–471 (1975).
J. A. Hoffer and D. G. Severance, The use of cluster analysis in physical data base design, in Proceedings of International Conference on Very Large Data Bases, Framingham, Massachusetts, pp. 69–86 (September, 1975).
R. L. Rivest, Partial-match retrieval algorithms, SIAM J. Comput., 5 (1), 19–50 (March, 1976).
J. H. Liou and S. B. Yao, Multidimensional clustering for data base organization, Inform. Systems, 2, 187–198 (1977).
W. C. Lin, R. C. T. Lee, and H. C. Du, Common properties of some multi-attribute file systems, IEEE Trans. Software Engrg., SE-5 (2), 160–174 (March, 1979).
C. C. Chang and R. C. T. Lee, “Optimal Cartesian Product Files For Partial Match Queries and Partial Match Patterns,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
T. Yamane, Elementary Sampling Theory, Prentice-Hall, Englewood Cliffs, New Jersey (1967).
R. J. Freund and H. O. Hartley, A procedure for automatic data editing, J. Amer. Statist. Assoc., 62, 341–352 (June, 1967).
I. P. Fellegi and D. Holt, A systematic approach to automatic editing and imputation, J. Amer. Statist. Assoc., 71, 17–35 (March, 1976).
J. I. Naus, T. G. Johnson, and R. Montalvo, A probabilistic method for identifying some errors and data editing, J. Amer. Statist. Assoc., 67, 343–350 (December, 1972).
D. J. Hatfield and J. Gerald, Program restructuring for virtual memory, IBM Systems J., 10 (3), 168–192 (1971).
D. Ferrari, Improving locality by critical working sets, Comm. ACM, 17 (11), 614–620 (November, 1974).
J. L. Baer and G. R. Sager, Dynamic improvement of locality in virtual memory systems, IEEE Trans. Software Engrg., SE-2 (1), 54–61 (March, 1976).
C. C. Hsu and R. C. T. Lee, Applications of assignment technique to program restructuring, Journal of the Chinese Institute of Engineers, 2 (2), 151–160 (July, 1979).
D. Ferrari, Computer Systems Performance Evaluation, Prentice-Hall, Englewood Cliffs, New Jersey (1978).
S. Madnick and J. J. Donovan, Operating Systems, McGraw-Hill, New York (1974).
H. A. Taha, Operations Research, An Introduction, second edition, Macmillan, New York (1971).
D. T. Phillips, Operations Research, Principles and Practice, Wiley, New York (1976).
W. G. Cochran, Sampling Techniques, third edition, Wiley, New York (1977).
Copyright information
© 1981 Plenum Press, New York
About this chapter
Cite this chapter
Lee, R.C.T. (1981). Clustering Analysis and Its Applications. In: Tou, J.T. (eds) Advances in Information Systems Science. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-9883-7_4
DOI: https://doi.org/10.1007/978-1-4613-9883-7_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-9885-1
Online ISBN: 978-1-4613-9883-7
eBook Packages: Springer Book Archive