On Document Classification with Self-Organising Maps

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5495)

Abstract

This research deals with the use of self-organising maps for the classification of text documents. The aim was to classify documents into separate classes according to their topics. We therefore constructed self-organising maps that were effective for this task and tested them with German newspaper documents. We compared the results with those of k-nearest-neighbour searching and k-means clustering. For five and ten classes, the self-organising maps performed best, yielding average classification accuracies as high as 88–89%, whereas the highest accuracies of nearest-neighbour searching and k-means clustering were 74–83% and 72–79%, respectively.
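
The abstract does not spell out how a trained map assigns a class to a document. The sketch below shows one common way to do it. Its assumptions are ours rather than the paper's. Documents are length-normalised term-frequency vectors. The map is trained with the standard online Kohonen rule, using a Gaussian neighbourhood and a learning rate and radius that both decay linearly. Each map node is labelled with the majority class of the training documents it wins, and a new document takes the label of its nearest labelled node. The function names, the 3 × 3 map size, the training schedule and the four-word toy vocabulary are hypothetical illustrations, not the authors' settings.

```python
import math
import random
from collections import Counter


def normalise(v):
    # Scale a term-frequency vector to unit Euclidean length (assumed representation).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)


def dist2(a, b):
    # Squared Euclidean distance between two equal-length vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, x):
    # Index of the map node whose weight vector lies closest to x.
    return min(range(len(weights)), key=lambda i: dist2(weights[i], x))


def train_som(data, rows, cols, epochs=50, lr0=0.5, seed=0):
    """Online Kohonen training on a rows x cols grid (hypothetical defaults)."""
    rng = random.Random(seed)
    grid = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.random() for _ in data[0]] for _ in grid]
    sigma0 = max(rows, cols) / 2
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        order = list(range(len(data)))
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            frac = step / total
            lr = lr0 * (1 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1 - frac), 0.5)  # neighbourhood radius shrinks
            br, bc = grid[best_matching_unit(weights, x)]
            for i, (r, c) in enumerate(grid):
                # Gaussian neighbourhood on the grid around the winning node.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (xi - w) for w, xi in zip(weights[i], x)]
            step += 1
    return weights


def label_nodes(weights, data, labels):
    # Each node takes the majority class of the training documents it wins.
    votes = {}
    for x, y in zip(data, labels):
        votes.setdefault(best_matching_unit(weights, x), Counter())[y] += 1
    return {node: c.most_common(1)[0][0] for node, c in votes.items()}


def classify(weights, node_labels, x):
    # The nearest *labelled* node decides, so nodes that won no document are skipped.
    node = min(node_labels, key=lambda i: dist2(weights[i], x))
    return node_labels[node]


# Toy term counts over a hypothetical vocabulary ["goal", "match", "vote", "party"].
raw = [[3, 2, 0, 0], [2, 3, 0, 1], [4, 1, 0, 0],
       [0, 0, 3, 2], [0, 1, 2, 3], [0, 0, 4, 1]]
labels = ["sport"] * 3 + ["politics"] * 3
data = [normalise(v) for v in raw]

weights = train_som(data, rows=3, cols=3)
node_labels = label_nodes(weights, data, labels)
print(classify(weights, node_labels, normalise([3, 1, 0, 0])))
print(classify(weights, node_labels, normalise([0, 0, 2, 3])))
```

Labelling nodes by majority vote is the step that turns the unsupervised map into a classifier. By contrast, k-nearest-neighbour search compares a new document directly with every training document. k-means keeps unconnected centroids instead of a topology-preserving grid of nodes.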

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Saarikoski, J., Järvelin, K., Laurikkala, J., Juhola, M. (2009). On Document Classification with Self-Organising Maps. In: Kolehmainen, M., Toivanen, P., Beliczynski, B. (eds) Adaptive and Natural Computing Algorithms. ICANNGA 2009. Lecture Notes in Computer Science, vol 5495. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-04921-7_15

  • DOI: https://doi.org/10.1007/978-3-642-04921-7_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-04920-0

  • Online ISBN: 978-3-642-04921-7

  • eBook Packages: Computer Science, Computer Science (R0)