Clustering Analysis and Its Applications

Lee, R. C. T.

doi:10.1007/978-1-4613-9883-7_4


Abstract

Clustering analysis^(1–4) is a newly developed computer-oriented data analysis technique. It is a product of many research fields: statistics, computer science, operations research, and pattern recognition. Because researchers come from such diverse backgrounds, clustering analysis goes by many different names. In biology it is called “taxonomy”;^(5,6) in pattern recognition^(7–15) it is called “unsupervised learning.” Most confusing of all, the term “classification” is sometimes also used for clustering analysis. Because “classification” may equally denote discriminant analysis, which is entirely different from clustering analysis, it is important to distinguish the two terms.


References

R. C. Tryon and D. E. Bailey, Cluster Analysis, McGraw-Hill, New York (1970).
M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York (1973).
J. A. Hartigan, Clustering Algorithms, Wiley, New York (1975).
J. Van Ryzin, Classification and Clustering, Academic Press, New York (1977).
R. R. Sokal and P. H. A. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, San Francisco (1963).
N. Jardine and R. Sibson, Mathematical Taxonomy, Wiley, New York (1971).
R. Duda and P. Hart, Pattern Classification and Scene Analysis, Wiley, New York (1973).
K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York (1974).
W. S. Meisel, Computer-Oriented Approaches to Pattern Recognition, Academic Press, New York (1972).
J. T. Tou and R. C. Gonzalez, Pattern Recognition, Addison-Wesley, Reading, Massachusetts (1974).
K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York (1972).
C. H. Chen, Pattern Recognition and Artificial Intelligence, Academic Press, New York (1976).
Y. T. Chien, Interactive Pattern Recognition, Marcel Dekker, New York (1978).
E. A. Patrick, Fundamentals of Pattern Recognition, Prentice-Hall, Englewood Cliffs, New Jersey (1972).
K. S. Fu, Sequential Methods in Pattern Recognition, Academic Press, New York (1969).
M. O. Dayhoff, Computer analysis of protein evolution, Scientific American, 232(7), 69–85 (July, 1969).
A. C. Shaw, A formal picture description scheme as a basis for picture processing systems, Inform. and Control, 14, 9–53 (1969).
K. S. Fu and S. Y. Lu, A clustering procedure for syntactic patterns, IEEE Trans. Systems Man Cybernet., SMC-7 (10), 737–742 (1977).
K. S. Fu and S. Y. Lu, A sentence-to-sentence clustering procedure for pattern analysis, IEEE Trans. Systems Man Cybernet., SMC-8 (5), 381–389 (May, 1978).
S. Y. Lu and K. S. Fu, Stochastic error-correcting syntactic analysis for recognition of noisy patterns, IEEE Trans. Comput., C-26 (12), 1268–1276 (1977).
S. C. Chang and R. C. T. Lee, Clustering of syntactic patterns without parsing, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 626–643 (1978).
C. L. Chang and R. C. T. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York (1973).
C. C. Gotlieb and S. Kumar, Semantic clustering of index terms, J. Assoc. Comput. Mach., 15 (4), 493–513 (October, 1968).
D. Wishart, Mode analysis, a generalization of nearest neighbor which reduces chaining effects, in Numerical Taxonomy (A. J. Cole, ed.), Academic Press, New York, pp. 282–308 (1969).
R. C. Prim, Shortest connection networks and some generalizations, Bell System Tech. J., 36, 1389–1401 (November, 1957).
C. T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput., C-20 (1), 68–86 (January, 1971).
R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179–188 (1936).
G. Salton, Automatic Information Storage and Retrieval, McGraw-Hill, New York (1968).
A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Massachusetts (1975).
R. E. Bonner, On some clustering techniques, IBM J. Res. Develop., 8 (1) 22–32 (1964).
J. G. Augustson and J. Minker, An analysis of some graph theoretical cluster techniques, J. Assoc. Comput. Mach., 17 (4), 571–588 (1970).
R. E. Osteen and J. T. Tou, A clique-detection algorithm based on neighborhoods in graphs, Internat. J. Comput. Inform. Sci., 2 (4), 257–268 (1973).
J. R. Slagle, C. L. Chang, and S. Heller, A clustering and data-reorganization algorithm, IEEE Trans. Systems Man Cybernet., SMC-5 (1), 121–128 (January, 1975).
J. R. Slagle, C. L. Chang, and R. C. T. Lee, Experiments with some clustering analysis algorithms, Pattern Recognition, 6, 181–187 (1974).
J. A. Hartigan, Direct clustering of data matrices, J. Amer. Statist. Assoc., 67, 123–129 (March, 1972).
W. T. McCormick, P. J. Schweitzer, and T. W. White, Problem decomposition and data reorganization by a clustering technique, Oper. Res., 20 (5), 993–1009 (September-October, 1972).
J. K. Lenstra, Clustering a data array and the traveling salesman problem, Oper. Res., 22 (2), 413–414 (March-April, 1974).
S. B. Deutsch and J. J. Martin, An ordering algorithm for analysis of data arrays, Oper. Res., 19 (6), 1350–1362 (October, 1971).
M. G. Kendall, A Course in Multivariate Analysis, Hafner, New York (1968).
D. F. Morrison, Multivariate Statistical Methods, McGraw-Hill, New York (1967).
C. R. Rao, Linear Statistical Inference and Its Applications, Wiley, New York (1973).
T. Y. Young and T. W. Calvert, Classification, Estimation and Pattern Recognition, American Elsevier, New York (1974).
W. M. Cooley and P. R. Lohnes, Multivariate Data Analysis, Wiley, New York (1974).
A. J. Chen and H. T. Wang, The display and analysis of Landsat multi-spectral data over Taiwan, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 1083–1101 (1978).
C. L. Chang and R. C. T. Lee, A heuristic relaxation method for nonlinear mapping in clustering analysis, IEEE Trans. Systems Man Cybernet., SMC-3 (3), 197–200 (March, 1973).
J. R. Slagle and R. C. T. Lee, Application of game tree searching to sequential pattern recognition, Comm. ACM, 14 (2), 103–110 (February, 1971).
R. C. T. Lee, J. R. Slagle, and H. Blum, A triangulation method for the sequential mapping of points from N-space to 2-space, IEEE Trans. Comput., C-26 (3), 310–313 (March, 1977).
J. R. Slagle, Artificial Intelligence: A Heuristic Programming Approach, McGraw-Hill, New York (1971).
N. J. Nilsson, Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York (1971).
R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function I, Psychometrika, 27, 125–140 (1962).
R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function II, Psychometrika, 27, 219–246 (1962).
R. N. Shepard and J. D. Carroll, Parametric representation of nonlinear mapping data structures, in Proceedings of International Symposium on Multivariate Analysis, (P. R. Krishnaiah, ed.) Academic Press, New York (1966).
J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1–27 (March, 1964).
J. B. Kruskal, Nonmetric multidimensional scaling: A numerical method, Psychometrika, 29, 115–129 (June, 1964).
J. W. Sammon, Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18 (5), 401–409 (May, 1969).
R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Trans. Inform. Theory, IT-15 (9), 517–525 (September, 1969).
K. Fukunaga and D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Trans. Comput., C-20 (2), 176–183 (February, 1971).
G. V. Trunk, Statistical estimation of the intrinsic dimensionality of data collections, Inform. and Control, 12, 508–525 (1968).
C. K. Chen and H. C. Andrews, Nonlinear intrinsic dimensionality computations, IEEE Trans. Comput., C-23 (2), 178–184 (February, 1974).
D. H. Schwartzman and J. J. Vidal, An algorithm for determining the topological dimensionality of point clusters, IEEE Trans. Comput., C-24 (12), 1175–1182 (December, 1975).
P. E. Green and V. R. Rao, Applied Multidimensional Scaling, Holt, Rinehart and Winston, New York (1972).
P. E. Green and D. S. Tull, Research for Marketing Decisions, Prentice-Hall, Englewood Cliffs, New Jersey (1975).
R. E. Frank and P. E. Green, Numerical taxonomy in marketing analysis: A review article, Journal of Marketing Research, 5, 83–98 (February, 1968).
R. C. T. Lee and C. L. Chang, Applications of minimal spanning trees to information storage, in Proceedings of International Symposium on Computers and Chinese Input/Output Systems, pp. 1245–1256 (August, 1973).
W. M. Fitch, Toward defining the course of evolution in minimum change for a specific tree topology, Systematic Zoology, 20, 406–416 (1971).
A. N. C. Kang, R. C. T. Lee, C. L. Chang, and S. K. Chang, Storage reduction through minimal spanning trees and spanning forests, IEEE Trans. Comput., C-26 (5), 425–434 (May, 1977).
A. N. C. Kang and A. Ault, Some properties of a centroid of a free tree, Inform. Process. Lett., 3, 18–20 (Sept., 1975).
R. Tarjan, Depth first search and linear graph algorithms, SIAM J. Comput., 1 (2), 146–160 (1972).
R. C. T. Lee and S. H. Tseng, Multikey sorting, International Journal of Policy Analysis and Information Systems, 3 (2), 1–20 (1979).
E. Fix and J. L. Hodges, Jr., “Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties,” Report No. 7, USAF School of Aviation Medicine, Randolph Field, Texas (February, 1951).
S. A. Dudani, The distance weighted K-nearest-neighbor rule, IEEE Trans. Systems Man Cybernet., SMC-6, 325–327 (April, 1976).
C. W. Shen and R. C. T. Lee, “A Nearest Neighbor Search Technique With Short Zero-In Time,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
W. A. Burkhard and R. M. Keller, Some approaches to best-match searching, Comm. ACM, 16 (4), 230–236 (April, 1973).
R. L. Rivest, “Analysis of Associative Retrieval Algorithms,” Ph.D. Dissertation, Department of Computer Science, Stanford University, Stanford, California (1974).
J. H. Friedman, F. Baskett, and L. J. Shustek, An algorithm for finding nearest neighbors, IEEE Trans. Comput. C-24 (10), 1000–1006 (October, 1975).
J. H. Friedman, J. L. Bentley, and R. A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Software, 3 (3), 209–216 (September, 1977).
K. Fukunaga and P. M. Narendra, A branch and bound algorithm for computing k-nearest neighbors, IEEE Trans. Comput., C-24 (7), 750–753 (July, 1975).
C. T. Hsieh and R. C. T. Lee, Applications of symbolic error correcting code for nearest neighbor searching, in Proceedings of the National Computer Symposium of the Republic of China, Taipei, Taiwan, pp. 6.7–6.14 (1976).
R. C. T. Lee, Y. H. Chin, and S. C. Chang, Application of principal component analysis to multikey searching, IEEE Trans. Software Engrg., SE-2 (3), (September, 1976).
H. C. Du and R. C. T. Lee, Symbolic gray code as a multikey hashing function, IEEE Trans. Pattern Analysis and Machine Intelligence, to appear.
J. L. Bentley, Multidimensional binary search trees used for associative searching, Comm. ACM, 18 (9), 509–517 (September, 1975).
D. E. Knuth, Sorting and Searching, Vol. 3, The Art of Computer Programming, Addison-Wesley, Reading, Massachusetts (1973).
J. B. Rothnie and T. Lozano, Attribute based file organization in a paged memory environment, Comm. ACM, 17 (2), 63–69 (February, 1974).
J. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the 5th Berkeley Symposium on Probability and Statistics, pp. 280–297 (1967).
C. L. Chang, Finding prototypes for nearest neighbor classifiers, IEEE Trans. Comput., C-23 (11), (November, 1974).
R. C. T. Lee and T. T. Deng, An improved method for finding prototypes for nearest neighbor classifiers, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 601–609 (1977).
J. L. Bentley and J. H. Friedman, Fast algorithms for constructing minimal spanning trees in coordinate space, IEEE Trans. Comput., C-27 (2), 97–105 (February, 1978).
C. C. Chen and R. C. T. Lee, “Some Algorithms Employing Nearest Neighbor Searching,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
J. L. Bentley and M. I. Shamos, Divide and conquer for linear expected time, Inform. Process. Lett., 7 (2), 87–91 (February, 1978).
J. L. Bentley and M. I. Shamos, Divide and conquer in multidimensional space, in Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 220–230 (1976).
M. O. Rabin, Probabilistic algorithms, in Algorithms and Complexity (J. F. Traub, ed.) Academic Press, New York (1976).
G. Yuval, Finding nearest neighbors, Inform. Process. Lett., 5 (3), 63–65 (August, 1976).
C. W. Skinner, A heuristic approach to inductive inference fact retrieval systems, Comm. ACM, 17, 707–712 (December, 1974).
R. C. T. Lee, J. R. Slagle, and C. T. Mong, Towards automatic auditing of records, IEEE Trans. Software Eng., SE-4 (5), 441–448 (September, 1978).
J. Martin, Computer Data Base Organization, Prentice-Hall, Englewood Cliffs, New Jersey (1975).
J. Martin, Principles of Data Base Management, Prentice-Hall, Englewood Cliffs, New Jersey (1976).
H. Katzan, Computer Data Management and Data Base Technology, Van Nostrand Reinhold, New York (1975).
G. Wiederhold, Data Base Design, McGraw-Hill, New York (1977).
S. P. Ghosh, Data Base Organization for Data Management, Academic Press, New York (1977).
H. Lorin, Sorting and Sort Systems, Addison-Wesley, Reading, Massachusetts (1975).
S. P. Ghosh, File organization, The consecutive retrieval property, Comm. ACM, 15, 802–808 (1972).
S. P. Ghosh, The consecutive storage of relevant records with redundancy, Comm. ACM, 18, 464–471 (1975).
J. A. Hoffer and D. G. Severance, The use of cluster analysis in physical data base design, in Proceedings of International Conference on Very Large Data Bases, Framingham, Massachusetts, pp. 69–86 (September, 1975).
R. L. Rivest, Partial-match retrieval algorithms, SIAM J. Comput., 5 (1), 19–50 (March, 1976).
J. H. Liou and S. B. Yao, Multidimensional clustering for data base organization, Inform. Systems, 2, 187–198 (1977).
W. C. Lin, R. C. T. Lee, and H. C. Du, Common properties of some multi-attribute file systems, IEEE Trans. Software Engrg., 5 (2), 160–174 (March, 1979).
C. C. Chang and R. C. T. Lee, “Optimal Cartesian Product Files For Partial Match Queries and Partial Match Patterns,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.
T. Yamane, Elementary Sampling Theory, Prentice-Hall, Englewood Cliffs, New Jersey (1967).
R. J. Freund and H. O. Hartley, A procedure for automatic data editing, J. Amer. Statist. Assoc., 62, 341–352 (June, 1967).
I. P. Fellegi and D. Holt, A systematic approach to automatic editing and imputation, J. Amer. Statist. Assoc., 71, 17–35 (March, 1976).
J. I. Naus, T. G. Johnson, and R. Montalvo, A probabilistic method for identifying some errors and data editing, J. Amer. Statist. Assoc., 67, 343–350 (December, 1972).
D. J. Hatfield and J. Gerald, Program restructuring for virtual memory, IBM Systems J., 10 (3), 168–192 (1971).
D. Ferrari, Improving locality by critical working sets, Comm. ACM, 17 (11), 614–620 (November, 1974).
J. L. Baer and G. R. Sager, Dynamic improvement of locality in virtual memory systems, IEEE Trans. Software Engrg., SE-2 (1), 54–61 (March, 1976).
C. C. Hsu and R. C. T. Lee, Applications of assignment technique to program restructuring, Journal of the Chinese Institute of Engineers, 2 (2), 151–160 (July, 1979).
D. Ferrari, Computer Systems Performance Evaluation, Prentice-Hall, Englewood Cliffs, New Jersey (1978).
S. Madnick and J. J. Donovan, Operating Systems, McGraw-Hill, New York (1974).
H. A. Taha, Operations Research, An Introduction, second edition, Macmillan, New York (1971).
D. T. Phillips, Operations Research, Principles and Practice, Wiley, New York (1976).
W. G. Cochran, Sampling Techniques, third edition, Wiley, New York (1977).

Author information

Authors and Affiliations

Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan 300, Republic of China
R. C. T. Lee


Editor information

Editors and Affiliations

Center for Information Research, University of Florida, Gainesville, Florida, USA
Julius T. Tou


About this chapter

Cite this chapter

Lee, R.C.T. (1981). Clustering Analysis and Its Applications. In: Tou, J.T. (eds) Advances in Information Systems Science. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-9883-7_4


DOI: https://doi.org/10.1007/978-1-4613-9883-7_4
Publisher Name: Springer, Boston, MA
Print ISBN: 978-1-4613-9885-1
Online ISBN: 978-1-4613-9883-7
eBook Packages: Springer Book Archive
