Clustering Analysis and Its Applications

Advances in Information Systems Science

Abstract

Clustering analysis(1–4) is a newly developed computer-oriented data analysis technique. It is a product of many research fields: statistics, computer science, operations research, and pattern recognition. Because researchers come from such diverse backgrounds, clustering analysis goes by many different names. In biology, it is called “taxonomy”(5,6); in pattern recognition(7–15) it is called “unsupervised learning.” Most confusing of all, the term “classification” is sometimes also used to denote clustering analysis. Since classification may also denote discriminant analysis, which is entirely different from clustering analysis, it is important to distinguish the two terms.
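
As a rough illustration of the distinction drawn above, the following Python sketch contrasts the two tasks on toy one-dimensional data. It is not taken from the chapter: the function names two_means and nearest_mean_rule, the data, and the choice of a two-center k-means and a nearest-class-mean rule are all assumptions made for illustration. Clustering receives unlabeled points and must discover the groups; discriminant analysis receives labeled points and learns a rule for assigning new ones.

```python
from statistics import mean


def two_means(points, iterations=10):
    """Cluster unlabeled 1-D points into two groups (a tiny k-means, k = 2)."""
    centers = [min(points), max(points)]  # simple deterministic start (assumption)
    groups = ([], [])
    for _ in range(iterations):
        groups = ([], [])
        for p in points:
            # Assign each point to the nearer of the two current centers.
            nearer = 0 if abs(p - centers[0]) <= abs(p - centers[1]) else 1
            groups[nearer].append(p)
        # Recompute each center; keep the old one if its group is empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


def nearest_mean_rule(labeled):
    """Learn a discriminant rule from (value, label) pairs: nearest class mean."""
    by_label = {}
    for value, label in labeled:
        by_label.setdefault(label, []).append(value)
    class_means = {label: mean(values) for label, values in by_label.items()}
    # The returned function assigns x to the label whose mean is closest.
    return lambda x: min(class_means, key=lambda label: abs(x - class_means[label]))


# Clustering: no labels are given; the groups are discovered from the data.
unlabeled = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
low, high = two_means(unlabeled)

# Discriminant analysis: labels are given; a rule is learned for new points.
labeled = [(1.0, "A"), (1.2, "A"), (0.8, "A"), (5.0, "B"), (5.3, "B"), (4.9, "B")]
classify = nearest_mean_rule(labeled)
```

In this sketch two_means recovers the two groups without ever seeing a label, whereas classify can only be built because the labels "A" and "B" were supplied in advance.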

References

  1. R. C. Tryon and D. E. Bailey, Cluster Analysis, McGraw-Hill, New York (1970).

  2. M. R. Anderberg, Cluster Analysis for Applications, Academic Press, New York (1973).

  3. J. A. Hartigan, Clustering Algorithms, Wiley, New York (1975).

  4. J. Van Ryzin, Classification and Clustering, Academic Press, New York (1977).

  5. R. R. Sokal and P. H. A. Sneath, Principles of Numerical Taxonomy, W. H. Freeman, San Francisco (1963).

  6. N. Jardine and R. Sibson, Mathematical Taxonomy, Wiley, New York (1971).

  7. R. Duda and P. Hart, Pattern Recognition and Scene Analysis, Wiley, New York (1973).

  8. K. S. Fu, Syntactic Methods in Pattern Recognition, Academic Press, New York (1974).

  9. W. S. Meisel, Computer-Oriented Approaches to Pattern Recognition, Academic Press, New York (1972).

  10. J. T. Tou and R. C. Gonzalez, Pattern Recognition, Addison-Wesley, Reading, Massachusetts (1974).

  11. K. Fukunaga, Introduction to Statistical Pattern Recognition, Academic Press, New York (1972).

  12. C. H. Chen, Pattern Recognition and Artificial Intelligence, Academic Press, New York (1976).

  13. Y. T. Chien, Interactive Pattern Recognition, Marcel Dekker, New York (1978).

  14. E. A. Patrick, Fundamentals of Pattern Recognition, Prentice-Hall, Englewood Cliffs, New Jersey (1972).

  15. K. S. Fu, Sequential Methods in Pattern Recognition, Academic Press, New York (1969).

  16. M. O. Dayhoff, Computer analysis of protein evolution, Scientific American, 232(7), 69–85 (July, 1969).

  17. A. C. Shaw, A formal picture description scheme as a basis for picture processing systems, Inform. and Control, 14, 9–53 (1969).

  18. K. S. Fu and S. Y. Lu, A clustering procedure for syntactic patterns, IEEE Trans. Systems Man Cybernet., SMC-7 (10), 737–742 (1977).

  19. K. S. Fu and S. Y. Lu, A sentence-to-sentence clustering procedure for pattern analysis, IEEE Trans. Systems Man Cybernet., SMC-8 (5), 381–389 (May, 1978).

  20. S. Y. Lu and K. S. Fu, Stochastic error-correcting syntactic analysis for recognition of noisy patterns, IEEE Trans. Comput., C-26 (12), 1268–1276 (1977).

  21. S. C. Chang and R. C. T. Lee, Clustering of syntactic patterns without parsing, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 626–643 (1978).

  22. C. L. Chang and R. C. T. Lee, Symbolic Logic and Mechanical Theorem Proving, Academic Press, New York (1973).

  23. C. C. Gotlieb and S. Kumar, Semantic clustering of index terms, J. Assoc. Comput. Mach., 15 (4), 493–513 (October, 1968).

  24. D. Wishart, Mode analysis, a generalization of nearest neighbor which reduces chaining effects, in Numerical Taxonomy (A. J. Coles, ed.), Academic Press, New York, pp. 282–308 (1969).

  25. R. C. Prim, Shortest connection networks and some generalizations, Bell System Tech. J., 36, 1389–1401 (November, 1957).

  26. C. T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters, IEEE Trans. Comput., C-20 (1), 68–86 (January, 1971).

  27. R. A. Fisher, The use of multiple measurements in taxonomic problems, Annals of Eugenics, 7, 179–188 (1936).

  28. G. Salton, Automatic Information Storage and Retrieval, McGraw-Hill, New York (1968).

  29. A. V. Aho, J. E. Hopcroft, and J. D. Ullman, The Design and Analysis of Computer Algorithms, Addison-Wesley, Reading, Massachusetts (1975).

  30. R. E. Bonner, On some clustering techniques, IBM J. Res. Develop., 8 (1) 22–32 (1964).

  31. J. G. Augustson and J. Minker, An analysis of some graph theoretical cluster techniques, J. Assoc. Comput. Mach., 17 (4), 571–588 (1970).

  32. R. E. Osteen and J. T. Tou, A clique-detection algorithm based on neighborhoods in graphs, Internat. J. Comput. Inform. Sci., 2 (4), 257–268 (1973).

  33. J. R. Slagle, C. L. Chang, and S. Heller, A clustering and data-reorganization algorithm, IEEE Trans. Systems Man Cybernet., SMC-5 (1), 121–128 (January, 1975).

  34. J. R. Slagle, C. L. Chang, and R. C. T. Lee, Experiments with some clustering analysis algorithms, Pattern Recognition, 6, 181–187 (1974).

  35. J. A. Hartigan, Direct clustering of data matrices, J. Amer. Statist. Assoc., 67, 123–129 (March, 1970).

  36. W. T. McCormick, P. J. Schweitzer, and T. W. White, Problem decomposition and data reorganization by a clustering technique, Oper. Res., 20 (5), 993–1009 (September-October, 1972).

  37. J. K. Lenstra, Clustering a data array and the traveling salesman problem, Oper. Res., 22 (2), 413–414 (March-April, 1974).

  38. S. B. Deutsch and J. J. Martin, An ordering algorithm for analysis of data arrays, Oper. Res., 19 (6), 1350–1362 (October, 1971).

  39. M. G. Kendall, A Course in Multivariate Analysis, Hafner, New York (1968).

  40. D. F. Morrison, Multivariate Statistical Methods, McGraw-Hill, New York (1967).

  41. C. R. Rao, Linear Statistical Inference and Its Applications, Wiley, New York (1973).

  42. T. Y. Young and T. W. Calvert, Classification, Estimation and Pattern Recognition, American Elsevier, New York (1974).

  43. W. M. Cooley and P. R. Lohnes, Multivariate Data Analysis, Wiley, New York (1974).

  44. A. J. Chen and H. T. Wang, The display and analysis of Landsat multi-spectral data over Taiwan, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 1083–1101 (1978).

  45. C. L. Chang and R. C. T. Lee, A heuristic relaxation method for nonlinear mapping in clustering analysis, IEEE Trans. Systems Man Cybernet., SMC-3 (3), 197–200 (March, 1973).

  46. J. R. Slagle and R. C. T. Lee, Application of game tree searching to sequential pattern recognition, Comm. ACM, 14 (2), 103–110 (February, 1971).

  47. R. C. T. Lee, J. R. Slagle, and H. Blum, A triangulation method for the sequential mapping of points from N-space to 2-space, IEEE Trans. Comput., C-26 (3), 310–313 (March, 1977).

  48. J. R. Slagle, Artificial Intelligence: A Heuristic Programming Approach, McGraw-Hill, New York (1971).

  49. N. J. Nilsson, Problem Solving Methods in Artificial Intelligence, McGraw-Hill, New York (1971).

  50. R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function I, Psychometrika, 27, 125–140 (1962).

  51. R. N. Shepard, The analysis of proximities: multidimensional scaling with an unknown distance function II, Psychometrika, 27, 219–246 (1962).

  52. R. N. Shepard and J. D. Carroll, Parametric representation of nonlinear mapping data structures, in Proceedings of International Symposium on Multivariate Analysis, (P. R. Krishnaiah, ed.) Academic Press, New York (1966).

  53. J. B. Kruskal, Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis, Psychometrika, 29, 1–27 (March, 1964).

  54. J. B. Kruskal, Nonmetric multidimensional scaling: A numerical method, Psychometrika, 29, 115–129 (June, 1964).

  55. J. W. Sammon, Jr., A nonlinear mapping for data structure analysis, IEEE Trans. Comput., C-18 (5), 401–409 (May, 1969).

  56. R. S. Bennett, The intrinsic dimensionality of signal collections, IEEE Trans. Inform. Theory, IT-15 (9), 517–525 (September, 1969).

  57. K. Fukunaga and D. R. Olsen, An algorithm for finding intrinsic dimensionality of data, IEEE Trans. Comput., C-20 (2), 176–183 (February, 1971).

  58. G. V. Trunk, Statistical estimation of the intrinsic dimensionality of data collections, Inform. and Control, 12, 508–525 (1968).

  59. C. K. Chen and H. C. Andrews, Nonlinear intrinsic dimensionality computations, IEEE Trans. Comput., C-23 (2), 178–184 (February, 1974).

  60. D. H. Schwartzman and J. J. Vidal, An algorithm for determining the topological dimensionality of point clusters, IEEE Trans. Comput., C-24 (12), 1175–1182 (December, 1975).

  61. P. E. Green and V. R. Rao, Applied Multidimensional Scaling, Holt, Rinehart and Winston, New York (1972).

  62. P. E. Green and D. S. Tull, Research for Marketing Decisions, Prentice-Hall, Englewood Cliffs, New Jersey (1975).

  63. R. E. Frank and P. E. Green, Numerical taxonomy in marketing analysis: A review article, Journal of Marketing Research, 5, 83–98 (February, 1968).

  64. R. C. T. Lee and C. L. Chang, Applications of minimal spanning trees to information storage, in Proceedings of International Symposium on Computers and Chinese Input/Output Systems, pp. 1245–1256 (August, 1973).

  65. W. M. Fitch, Toward defining the course of evolution in minimum change for a specific tree topology, Systematic Zoology, 20, 406–416 (1971).

  66. A. N. C. Kang, R. C. T. Lee, C. L. Chang, and S. K. Chang, Storage reduction through minimal spanning trees and spanning forests, IEEE Trans. Comput., C-26 (5), 425–434 (May, 1977).

  67. A. N. C. Kang and A. Ault, Some properties of a centroid of a free tree, Inform. Process. Lett., 3, 18–20 (Sept., 1975).

  68. R. Tarjan, Depth first search and linear graph algorithms, SIAM J. Comput., 1 (2), 146–160 (1972).

  69. R. C. T. Lee and S. H. Tseng, Multikey sorting, International Journal of Policy Analysis and Information Systems, 3 (2), 1–20 (1979).

  70. E. Fix and J. L. Hodges, Jr., “Discriminatory Analysis: Nonparametric Discrimination: Consistency Properties,” Report No. 7, USAF School of Aviation Medicine, Randolph Field, Texas (February, 1951).

  71. S. A. Dudani, The distance weighted K-nearest-neighbor rule, IEEE Trans. Systems Man Cybernet., SMC-6, 325–327 (April, 1976).

  72. C. W. Shen and R. C. T. Lee, “A Nearest Neighbor Search Technique With Short Zero-In Time,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.

  73. W. A. Burkhard and R. M. Keller, Some approaches to best-match searching, Comm. ACM, 16 (4), 230–236 (April, 1973).

  74. R. L. Rivest, “Analysis of Associative Retrieval Algorithms,” Ph.D. Dissertation, Department of Computer Science, Stanford University, Stanford, California (1974).

  75. J. H. Friedman, F. Baskett, and L. J. Shustek, An algorithm for finding nearest neighbors, IEEE Trans. Comput. C-24 (10), 1000–1006 (October, 1975).

  76. J. H. Friedman, J. L. Bentley, and R. A. Finkel, An algorithm for finding best matches in logarithmic expected time, ACM Trans. Math. Software, 3 (3), 209–216 (September, 1977).

  77. K. Fukunaga and P. M. Narendra, A branch and bound algorithm for computing k-nearest neighbors, IEEE Trans. Comput., C-24 (7), 750–753 (July, 1975).

  78. C. T. Hsieh and R. C. T. Lee, Applications of symbolic error correcting code for nearest neighbor searching, in Proceedings of the National Computer Symposium of the Republic of China, Taipei, Taiwan, pp. 6: 7–6.14 (1976).

  79. R. C. T. Lee, Y. H. Chin, and S. C. Chang, Application of principal component analysis to multikey searching, IEEE Trans. Software Engrg., SE-2 (3), (September, 1976).

  80. H. C. Du and R. C. T. Lee, Symbolic gray code as a multikey hashing function, IEEE Trans. Pattern Analysis and Machine Intelligence, to appear.

  81. J. L. Bentley, Multidimensional binary search trees used for associative searching, Comm. ACM, 18 (9), 509–517 (September, 1975).

  82. D. E. Knuth, Sorting and Searching, Vol. 3, The Art of Computer Programming, Addison-Wesley, Reading, Massachusetts (1973).

  83. J. B. Rothnie and T. Lozano, Attribute based file organization in a paged memory environment, Comm. ACM, 17 (2), 63–69 (February, 1974).

  84. J. MacQueen, Some methods for classification and analysis of multivariate observations, in Proceedings of the 5th Berkeley Symposium on Probability and Statistics, pp. 280–297 (1967).

  85. C. L. Chang, Finding prototypes for nearest neighbor classifiers, IEEE Trans. Comput., C-23 (11), (November, 1974).

  86. R. C. T. Lee and T. T. Deng, An improved method for finding prototypes for nearest neighbor classifiers, in Proceedings of International Computer Symposium, Taipei, Taiwan, pp. 601–609 (1977).

  87. J. L. Bentley and J. H. Friedman, Fast algorithms for constructing minimal spanning trees in coordinate space, IEEE Trans. Comput., C-27 (2), 97–105 (February, 1978).

  88. C. C. Chen and R. C. T. Lee, “Some Algorithms Employing Nearest Neighbor Searching,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.

  89. J. L. Bentley and M. I. Shamos, Divide and conquer for linear expected time, Inform. Process. Lett. 7 (2), pp. 87–91 (February, 1978).

  90. J. L. Bentley and M. I. Shamos, Divide and conquer in multidimensional space, in Proceedings of the Eighth Annual ACM Symposium on Theory of Computing, pp. 220–230 (1976).

  91. M. O. Rabin, Probabilistic algorithms, in Algorithms and Complexity (J. F. Traub, ed.) Academic Press, New York (1976).

  92. G. Yuval, Finding nearest neighbors, Inform. Process. Lett., 5 (3), 63–65 (August, 1976).

  93. C. W. Skinner, A heuristic approach to inductive inference fact retrieval systems, Comm. ACM, 17, 707–712 (December, 1974).

  94. R. C. T. Lee, J. R. Slagle, and C. T. Mong, Towards automatic auditing of records, IEEE Trans. Software Eng., SE-4 (5), 441–448 (September, 1978).

  95. J. Martin, Computer Data Base Organization, Prentice-Hall, Englewood Cliffs, New Jersey (1975).

  96. J. Martin, Principles of Data Base Management, Prentice-Hall, Englewood Cliffs, New Jersey (1976).

  97. H. Katzan, Computer Data Management and Data Base Technology, Van Nostrand Reinhold, New York (1975).

  98. G. Wiederhold, Data Base Design, McGraw-Hill, New York (1977).

  99. S. P. Ghosh, Data Base Organization for Data Management, Academic Press, New York (1977).

  100. H. Lorin, Sorting and Sort Systems, Addison-Wesley, Reading, Massachusetts (1975).

  101. S. P. Ghosh, File organization, The consecutive retrieval property, Comm. ACM, 15, 802–808 (1972).

  102. S. P. Ghosh, The consecutive storage of relevant records with redundancy, Comm. ACM, 18, 464–471 (1975).

  103. J. A. Hoffer and D. G. Severance, The use of cluster analysis in physical data base design, in Proceedings of International Conference on Very Large Data Bases, Framingham, Massachusetts, pp. 69–86 (September, 1975).

  104. R. L. Rivest, Partial-match retrieval algorithms, SIAM J. Comput., 5 (1), 19–50 (March, 1976).

  105. J. H. Liou and S. B. Yao, Multidimensional clustering for data base organization, Inform. Systems, 2, 187–198 (1977).

  106. W. C. Lin, R. C. T. Lee, and H. C. Du, Common properties of some multi-attribute file systems, IEEE Trans. Software Engrg., 5 (2), 160–174 (March, 1979).

  107. C. C. Chang and R. C. T. Lee, “Optimal Cartesian Product Files For Partial Match Queries and Partial Match Patterns,” Institute of Computer and Decision Sciences, National Tsing Hua University, Hsinchu, Taiwan, Republic of China.

  108. T. Yamane, Elementary Sampling Theory, Prentice-Hall, Englewood Cliffs, New Jersey (1967).

  109. R. J. Freund and H. O. Hartley, A procedure for automatic data editing, J. Amer. Statist. Assoc., 62, 341–352 (June, 1967).

  110. I. P. Fellegi and D. Holt, A systematic approach to automatic editing and imputation, J. Amer. Statist. Assoc., 71, 17–35 (March, 1976).

  111. J. I. Naus, T. G. Johnson, and R. Montalvo, A probabilistic method for identifying some errors and data editing, J. Amer. Statist. Assoc., 67, 343–350 (December, 1972).

  112. D. J. Hatfield and J. Gerald, Program restructuring for virtual memory, IBM Systems J., 10 (3), 168–192 (1971).

  113. D. Ferrari, Improving locality by critical working sets, Comm. ACM, 17 (11), 614–620 (November, 1974).

  114. J. L. Baer and G. R. Sager, Dynamic improvement of locality in virtual memory systems, IEEE Trans. Software Engrg., SE-2 (1), 54–61 (March, 1976).

  115. C. C. Hsu and R. C. T. Lee, Applications of assignment technique to program restructuring, Journal of the Chinese Institute of Engineers, 2 (2), 151–160 (July, 1979).

  116. D. Ferrari, Computer Systems Performance Evaluation, Prentice-Hall, Englewood Cliffs, New Jersey (1978).

  117. S. Madnick and J. J. Donovan, Operating Systems, McGraw-Hill, New York (1974).

  118. H. A. Taha, Operations Research, An Introduction, second edition, Macmillan, New York (1971).

  119. D. T. Phillips, Operations Research, Principles and Practice, Wiley, New York (1976).

  120. W. G. Cochran, Sampling Techniques, third edition, Wiley, New York (1977).

Copyright information

© 1981 Plenum Press, New York

About this chapter

Cite this chapter

Lee, R.C.T. (1981). Clustering Analysis and Its Applications. In: Tou, J.T. (eds) Advances in Information Systems Science. Springer, Boston, MA. https://doi.org/10.1007/978-1-4613-9883-7_4

  • DOI: https://doi.org/10.1007/978-1-4613-9883-7_4

  • Publisher Name: Springer, Boston, MA

  • Print ISBN: 978-1-4613-9885-1

  • Online ISBN: 978-1-4613-9883-7

  • eBook Packages: Springer Book Archive
