Business, Culture, Politics, and Sports: How to Find Your Way through a Bulk of News? On Content-Based Hierarchical Structuring and Organization of Large Document Archives

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2113)

Abstract

With the increasing amount of information available in electronic document collections, methods for organizing these collections to allow topic-oriented browsing and orientation are gaining importance. The SOMLib digital library system provides such an organization based on the Self-Organizing Map, a popular neural network model, by producing a map of the document space. However, hierarchical relations between documents are hidden in the display. Moreover, as document archives grow, the required maps grow larger, making it harder for the user to find proper orientation within the map. In this case, a hierarchically structured representation of the document space would be highly preferable.

In this paper, we present the Growing Hierarchical Self-Organizing Map, a dynamically growing neural network model, providing a content-based hierarchical decomposition and organization of document spaces. This architecture evolves into a hierarchical structure according to the requisites of the input data during an unsupervised training process. A recent enhancement of the training process further ensures proper orientation of the various topical partitions. This facilitates intuitive navigation between neighboring topical branches. The benefits of this approach are shown by organizing a real-world document collection according to semantic similarities.
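
The idea of a map that decides for itself where a finer-grained sub-map is needed can be illustrated with a minimal sketch. This is not the authors' algorithm: it trains one small, fixed-size Self-Organizing Map and then flags units whose mean quantization error is still large, which is roughly where a hierarchical model might attach a child map. The function names, the threshold `tau`, the learning-rate and radius schedules, and the toy two-dimensional vectors (standing in for document term vectors) are all assumptions made for illustration.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_unit(weights, vec):
    """Grid position (row, col) of the unit whose weight vector is closest to vec."""
    return min(weights, key=lambda pos: sq_dist(weights[pos], vec))


def train_som(data, rows=2, cols=2, epochs=50, lr0=0.5, radius0=1.0, seed=0):
    """Train a small fixed-size SOM and return {(row, col): weight_vector}."""
    rng = random.Random(seed)
    # Initialise every unit with a copy of a randomly chosen input vector.
    weights = {(r, c): list(rng.choice(data))
               for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    for t in range(steps):
        frac = 1.0 - t / steps                 # decays linearly towards 0
        lr = lr0 * frac                        # assumed learning-rate schedule
        radius = max(radius0 * frac, 0.01)     # assumed neighbourhood schedule
        vec = rng.choice(data)
        win = best_unit(weights, vec)
        for pos, w in weights.items():
            grid_d2 = (pos[0] - win[0]) ** 2 + (pos[1] - win[1]) ** 2
            h = math.exp(-grid_d2 / (2.0 * radius ** 2))  # Gaussian neighbourhood
            for i in range(len(w)):
                w[i] += lr * h * (vec[i] - w[i])
    return weights


def units_to_expand(weights, data, tau=0.5):
    """Return {unit: mapped_vectors} for units whose mean quantization error
    exceeds tau times the mean distance of all data to the overall mean."""
    dim = len(data[0])
    mean = [sum(v[i] for v in data) / len(data) for i in range(dim)]
    base_error = sum(math.sqrt(sq_dist(v, mean)) for v in data) / len(data)
    mapped = {}
    for v in data:
        mapped.setdefault(best_unit(weights, v), []).append(v)
    flagged = {}
    for pos, docs in mapped.items():
        mqe = sum(math.sqrt(sq_dist(v, weights[pos])) for v in docs) / len(docs)
        if mqe > tau * base_error:
            flagged[pos] = docs    # candidate for a child map (hypothetical rule)
    return flagged


if __name__ == "__main__":
    toy = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
           [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]]
    som = train_som(toy)
    print(units_to_expand(som, toy, tau=0.5))
```

In this sketch a smaller `tau` flags more units and would therefore give a deeper hierarchy; training a further map on each flagged unit's vectors would be the recursive step, which is omitted here.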

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dittenbach, M., Rauber, A., Merkl, D. (2001). Business, Culture, Politics, and Sports: How to Find Your Way through a Bulk of News? On Content-Based Hierarchical Structuring and Organization of Large Document Archives. In: Mayr, H.C., Lazansky, J., Quirchmayr, G., Vogel, P. (eds) Database and Expert Systems Applications. DEXA 2001. Lecture Notes in Computer Science, vol 2113. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44759-8_21

  • DOI: https://doi.org/10.1007/3-540-44759-8_21

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-42527-4

  • Online ISBN: 978-3-540-44759-7

  • eBook Packages: Springer Book Archive
