Self–Organizing Map Representation for Clustering Wikipedia Search Results

  • Conference paper
Intelligent Information and Database Systems (ACIIDS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6592)

Abstract

The article presents an approach to the automated organization of textual data. The experiments were performed on a selected subset of Wikipedia. A term-based Vector Space Model representation was used to build groups of similar articles, which were extracted from Kohonen Self-Organizing Maps (SOM) with DBSCAN clustering. To keep data processing efficient, we applied linear dimensionality reduction to the raw data using Principal Component Analysis. We introduce a hierarchical organization of the categorized articles by changing the granularity of the SOM network. The categorization method has been used to implement a system that clusters the results of keyword-based search in Polish Wikipedia.
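The pipeline named in the abstract can be illustrated with a minimal sketch in plain Python. It is not the paper's implementation: every function name and parameter value below is hypothetical, the PCA step is omitted for brevity, and running DBSCAN on the weight vectors of occupied SOM nodes is an assumption about how the clusters are extracted from the map.

```python
import math
import random
from collections import Counter

def tfidf_vectors(docs):
    """Vector Space Model: one L2-normalised TF-IDF vector per document."""
    tokenised = [doc.lower().split() for doc in docs]
    vocab = sorted({t for toks in tokenised for t in toks})
    df = Counter(t for toks in tokenised for t in set(toks))
    n = len(docs)
    vectors = []
    for toks in tokenised:
        tf = Counter(toks)
        # Smoothed IDF so that no term weight collapses to zero.
        vec = [tf[t] * (math.log((1 + n) / (1 + df[t])) + 1.0) for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors

def best_matching_unit(weights, vec):
    """Grid coordinates of the node whose weight vector is closest to vec."""
    return min(weights, key=lambda node: math.dist(weights[node], vec))

def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a rows x cols Kohonen map with a Gaussian neighbourhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    sigma0 = max(rows, cols) / 2.0
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = 0.5 * (1.0 - frac)                  # decaying learning rate
        sigma = max(sigma0 * (1.0 - frac), 0.5)  # shrinking neighbourhood
        for vec in rng.sample(vectors, len(vectors)):
            bmu = best_matching_unit(weights, vec)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                h = lr * math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for i in range(dim):
                    w[i] += h * (vec[i] - w[i])
    return weights

def dbscan(points, eps, min_pts):
    """Plain DBSCAN; returns one label per point, -1 marks noise."""
    labels = [None] * len(points)

    def neighbours(i):
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_pts:  # core point: keep expanding
                queue.extend(reachable)
    return labels

def cluster_documents(docs, rows=3, cols=3, eps=0.6, min_pts=1):
    """VSM -> SOM -> DBSCAN over the weight vectors of occupied nodes."""
    vectors = tfidf_vectors(docs)
    weights = train_som(vectors, rows, cols)
    bmus = [best_matching_unit(weights, v) for v in vectors]
    occupied = sorted(set(bmus))
    node_labels = dbscan([weights[n] for n in occupied], eps, min_pts)
    label_of = dict(zip(occupied, node_labels))
    return [label_of[b] for b in bmus]
```

Calling `cluster_documents(list_of_article_texts)` returns one cluster label per article. In this sketch the grid size (`rows`, `cols`) plays the role of the map granularity mentioned in the abstract: a coarser grid tends to merge articles into fewer, broader groups, while a finer grid separates them.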

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Szymański, J. (2011). Self–Organizing Map Representation for Clustering Wikipedia Search Results. In: Nguyen, N.T., Kim, CG., Janiak, A. (eds) Intelligent Information and Database Systems. ACIIDS 2011. Lecture Notes in Computer Science, vol 6592. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20042-7_15

  • DOI: https://doi.org/10.1007/978-3-642-20042-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20041-0

  • Online ISBN: 978-3-642-20042-7

  • eBook Packages: Computer Science, Computer Science (R0)
