An Efficient Two-Level SOMART Document Clustering Through Dimensionality Reduction

  • Mahmoud F. Hussin
  • Mohamed S. Kamel
  • Magdy H. Nagi
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3316)

Abstract

Document clustering is a popular technique for unveiling the inherent structure in the underlying data. Two successful unsupervised neural network models, the Self-Organizing Map (SOM) and Adaptive Resonance Theory (ART), have shown promising results in this task. The high dimensionality of the data has always been a challenging problem in document clustering, and it is commonly addressed with dimension reduction methods. In this paper, we propose a new two-level, neural-network-based document clustering architecture that can be used for high-dimensional data. Our solution uses SOM in the first level as a dimension reduction method that produces multiple output clusters, then uses ART in the second level to produce the final clusters from the reduced vector space. Experimental results of clustering documents from the Reuters corpus with the proposed architecture show an improvement in clustering performance, as evaluated by entropy and F-measure.
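
The abstract names the two evaluation measures but does not define them. The following is a minimal sketch of how such measures are often computed in document clustering work, assuming the common definitions: entropy as the size-weighted average of each cluster's class entropy, and F-measure as the class-size-weighted best per-class F score. The function names `clustering_entropy` and `clustering_f_measure`, the natural logarithm, and the toy labels are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def clustering_entropy(labels, clusters):
    """Size-weighted average of per-cluster class entropy (lower is better).

    Assumption: natural logarithm, no normalisation by log(number of classes).
    """
    n = len(labels)
    total = 0.0
    for cluster_id in set(clusters):
        members = [lab for lab, cl in zip(labels, clusters) if cl == cluster_id]
        size = len(members)
        class_counts = Counter(members)
        # Entropy of the class distribution inside this cluster.
        e = -sum((k / size) * math.log(k / size) for k in class_counts.values())
        total += (size / n) * e
    return total


def clustering_f_measure(labels, clusters):
    """Class-size-weighted maximum F score over clusters (higher is better)."""
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_i in class_sizes.items():
        best = 0.0
        for clu, n_j in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue
            precision = n_ij / n_j
            recall = n_ij / n_i
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_i / n) * best
    return total


# Hypothetical toy data: six documents, two true classes, two clusters.
true_labels = ["a", "a", "a", "b", "b", "b"]
cluster_ids = [0, 0, 1, 1, 1, 1]
entropy_value = clustering_entropy(true_labels, cluster_ids)
f_value = clustering_f_measure(true_labels, cluster_ids)
print(round(entropy_value, 4), round(f_value, 4))
```

Under these definitions a perfect clustering gives entropy 0 and F-measure 1, so an improvement shows up as lower entropy and higher F-measure.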

Keywords

Vector Space Model; Document Cluster; Final Cluster; Adaptive Resonance Theory; Dimension Reduction Method
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Mahmoud F. Hussin (1)
  • Mohamed S. Kamel (2)
  • Magdy H. Nagi (1)
  1. Dept. of Computer Science & Automatic Control, University of Alexandria, Alexandria, Egypt
  2. Dept. of Electrical and Computer Engineering, University of Waterloo, Waterloo, Canada
