Incremental Learning and Optimization of Hierarchical Clusterings with ART-Based Modular Networks

  • G. Bartfai
  • R. White
Part of the Studies in Fuzziness and Soft Computing book series (STUDFUZZ, volume 43)


This chapter introduces HART-S, a modular neural network that incrementally learns stable hierarchical clusterings of arbitrary sequences of input patterns by self-organisation. The network is a cascade of Adaptive Resonance Theory (ART) modules, in which each module learns to cluster the differences between the input pattern and the category prototype selected at the previous module. Input patterns are first classified into a few broad categories, and successive ART modules find increasingly specific categories until a threshold is reached; the level of this threshold is controlled by a global parameter called “resolution”. The network thus essentially implements a divisive (or splitting) hierarchical clustering algorithm, hence the name HART-S (for “Hierarchical ART with Splitting”). HART-S is also compared and contrasted with HART-J (for “Hierarchical ART with Joining”), another variant proposed earlier by the first author. The network dynamics are specified, and some useful properties of both networks are stated and proven. Experiments were carried out on benchmark datasets to demonstrate the representational and learning capabilities of both networks and to compare the resulting clusterings with those of two classical methods and a conceptual clustering algorithm. Two optimisation methods for the HART-S network are also introduced.
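The cascade idea described above can be illustrated with a minimal sketch. It is not the chapter's HART-S specification: it assumes binary inputs, a simplified fast-learning ART1-style module (`SimpleART1`, a hypothetical name), and takes the "difference" passed to the next module to be the input bits not covered by the selected prototype. The vigilance values are arbitrary, the "resolution" parameter is not modelled (the cascade here simply stops when nothing is left to explain), and any links between modules that the actual network may use to enforce a strict hierarchy are left out.

```python
import numpy as np


class SimpleART1:
    """Simplified fast-learning ART1-style clusterer for binary vectors.

    Illustrative assumption only, not the module defined in the chapter.
    """

    def __init__(self, vigilance, beta=1.0):
        self.vigilance = vigilance  # match threshold in [0, 1]
        self.beta = beta            # small positive choice parameter
        self.prototypes = []        # one boolean array per category

    def learn(self, x):
        x = np.asarray(x, dtype=bool)
        norm_x = x.sum()
        if norm_x == 0:
            raise ValueError("empty pattern cannot be clustered")
        # Search categories in decreasing order of the choice function
        # |x AND w| / (beta + |w|).
        order = sorted(
            range(len(self.prototypes)),
            key=lambda j: -(x & self.prototypes[j]).sum()
            / (self.beta + self.prototypes[j].sum()),
        )
        for j in order:
            overlap = (x & self.prototypes[j]).sum()
            if overlap / norm_x >= self.vigilance:  # vigilance test passed
                # Fast learning: prototype shrinks to the intersection.
                self.prototypes[j] = x & self.prototypes[j]
                return j
        # No existing category matched well enough: commit a new one.
        self.prototypes.append(x.copy())
        return len(self.prototypes) - 1


def hart_s_learn(modules, x):
    """Present x to a cascade of modules; return one category index per layer."""
    residual = np.asarray(x, dtype=bool)
    path = []
    for module in modules:
        if not residual.any():  # nothing left to explain: stop descending
            break
        j = module.learn(residual)
        path.append(j)
        # Pass on the part of the pattern the chosen prototype does not cover.
        residual = residual & ~module.prototypes[j]
    return path


if __name__ == "__main__":
    cascade = [SimpleART1(vigilance=0.3), SimpleART1(vigilance=0.5)]
    for pattern in ([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0], [0, 0, 0, 0, 1, 1]):
        print(pattern, "->", hart_s_learn(cascade, pattern))
```

Each returned path reads from the broadest category (first module) to the most specific one (last module reached), which is the splitting behaviour the abstract describes. Because this sketch uses a single shared module per layer with no inter-layer constraints, two patterns with different first-layer categories could in principle share a second-layer category.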


Input Pattern; Incremental Learning; Vigilance Level; Adaptive Resonance Theory; Modular Network
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2000

Authors and Affiliations

  • G. Bartfai (1)
  • R. White (1)
  1. School of Mathematical and Computing Sciences, Victoria University of Wellington, New Zealand
