Self-Organising Maps for Hierarchical Tree View Document Clustering Using Contextual Information
In this paper we propose an effective method to cluster documents into a dynamically built taxonomy of topics extracted directly from the documents. We take into account short contextual information within the text corpus, which is weighted by importance and used as input to a set of independently spun growing Self-Organising Maps (SOMs). Using these indexing units, this work shows an increase in precision and labelling quality in the hierarchy of topics. The use of the tree structure over sets of conventional two-dimensional maps creates topic hierarchies that are easy to browse and understand, in which the documents are stored according to their content similarity.
Keywords: Feature Selection, Random Projection, Document Cluster, Text Corpus, Weighting Bias