Abstract
The article presents an approach to the automated organization of textual data. The experiments were performed on a selected subset of Wikipedia. A term-based Vector Space Model representation was used to build groups of similar articles, extracted from Kohonen Self-Organizing Maps (SOM) with DBSCAN clustering. To keep data processing efficient, we applied linear dimensionality reduction to the raw data using Principal Component Analysis. We introduce a hierarchical organization of the categorized articles by changing the granularity of the SOM network. The categorization method has been used to implement a system that clusters the results of keyword-based search in the Polish Wikipedia.
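The core of the pipeline described above, term vectors mapped onto a self-organizing map, can be illustrated with a minimal sketch. It is not the author's implementation: the toy documents, the 2x2 grid, the learning-rate and neighbourhood schedules, and all function names are assumptions made for illustration, and the PCA and DBSCAN stages are omitted to keep the example dependency-free.

```python
import math
import random
from collections import Counter

def tf_vectors(docs):
    """Build L2-normalised term-frequency vectors over a shared vocabulary."""
    vocab = sorted({t for d in docs for t in d.lower().split()})
    vectors = []
    for d in docs:
        counts = Counter(d.lower().split())
        vec = [float(counts[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(weights, vec):
    """Return the grid coordinate of the unit whose weights are closest to vec."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], vec))

def train_som(vectors, rows=2, cols=2, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood (assumed schedule)."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 0.01     # neighbourhood radius shrinks
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vec)
            for pos, w in weights.items():
                grid_d2 = (pos[0] - bmu[0]) ** 2 + (pos[1] - bmu[1]) ** 2
                h = math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += lr * h * (vec[i] - w[i])  # pull unit towards the input
    return weights

# Toy stand-ins for article texts (hypothetical data, not from the paper).
docs = [
    "neural network training algorithm",
    "neural network learning algorithm",
    "river lake mountain landscape",
    "mountain river valley landscape",
]
vocab, vectors = tf_vectors(docs)
som = train_som(vectors)
assignment = [best_matching_unit(som, v) for v in vectors]
for doc, unit in zip(docs, assignment):
    print(unit, doc)
```

Each document ends up attached to one map unit. In the approach described in the abstract, groups of articles are then extracted from the map with DBSCAN, and changing the grid size changes the granularity of the grouping.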
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Szymański, J. (2011). Self–Organizing Map Representation for Clustering Wikipedia Search Results. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20042-7_15
DOI: https://doi.org/10.1007/978-3-642-20042-7_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20041-0
Online ISBN: 978-3-642-20042-7
eBook Packages: Computer Science, Computer Science (R0)