Self-Organizing Maps for Interactive Search in Document Databases

Intelligent Exploration of the Web

Part of the book series: Studies in Fuzziness and Soft Computing (STUDFUZZ, volume 111)

Abstract

In this contribution we discuss the application of self-organizing maps to arrange documents based on a similarity measure. To this end, we briefly review the concepts of self-organizing systems, give an overview of methods for the required document pre-processing and encoding, and discuss applications of self-organizing maps in document retrieval. Furthermore, we present a prototypical implementation of a software tool for interactive search in document databases, which combines conventional keyword search methods with the possibility to interactively explore a document collection. Sample searches demonstrate the usability of the presented approach.
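As a rough illustration of the idea summarized above, the following minimal sketch encodes a few tokenized documents as vectors and arranges them on a small self-organizing map. It is not the chapter's implementation: the tf-idf weighting, the fixed rectangular grid, the Gaussian neighborhood, the linear decay schedules, and the function names `tfidf_vectors`, `train_som` and `best_unit` are all assumptions chosen for brevity, none of which the abstract specifies.

```python
import math
import random
from collections import Counter


def tfidf_vectors(docs):
    """Encode tokenized documents as L2-normalized tf-idf vectors (assumed encoding)."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))  # document frequency per term
    vectors = []
    for d in docs:
        tf = Counter(d)
        v = [tf[t] * math.log(n / df[t]) for t in vocab]
        norm = math.sqrt(sum(x * x for x in v)) or 1.0  # guard against all-zero vectors
        vectors.append([x / norm for x in v])
    return vocab, vectors


def best_unit(weights, v):
    """Return the grid coordinate whose weight vector is closest to v (squared Euclidean)."""
    return min(weights, key=lambda u: sum((a - b) ** 2 for a, b in zip(weights[u], v)))


def train_som(vectors, rows, cols, epochs=50, seed=0):
    """Train a rows x cols map with a Gaussian neighborhood and linearly decaying rates."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):  # shuffled pass over the documents
            frac = step / total
            lr = 0.5 * (1.0 - frac)                               # learning rate decay
            radius = max(rows, cols) / 2.0 * (1.0 - frac) + 0.5   # neighborhood decay
            br, bc = best_unit(weights, v)
            for (r, c), w in weights.items():
                d2 = (r - br) ** 2 + (c - bc) ** 2
                h = math.exp(-d2 / (2.0 * radius * radius))
                for i in range(dim):
                    w[i] += lr * h * (v[i] - w[i])  # pull unit toward the document
            step += 1
    return weights


if __name__ == "__main__":
    docs = [["neural", "map"], ["neural", "network"],
            ["web", "search"], ["web", "index"]]
    vocab, vecs = tfidf_vectors(docs)
    som = train_som(vecs, rows=2, cols=2, epochs=20)
    for d, v in zip(docs, vecs):
        print(d, "->", best_unit(som, v))
```

After training, each document is assigned to the grid cell returned by `best_unit`; documents with similar term profiles tend to land in the same or neighboring cells, which is what makes a map of this kind browsable alongside keyword search.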

Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Nürnberger, A., Klose, A., Kruse, R. (2003). Self-Organizing Maps for Interactive Search in Document Databases. In: Szczepaniak, P.S., Segovia, J., Kacprzyk, J., Zadeh, L.A. (eds) Intelligent Exploration of the Web. Studies in Fuzziness and Soft Computing, vol 111. Physica, Heidelberg. https://doi.org/10.1007/978-3-7908-1772-0_8

  • DOI: https://doi.org/10.1007/978-3-7908-1772-0_8

  • Publisher Name: Physica, Heidelberg

  • Print ISBN: 978-3-7908-2519-0

  • Online ISBN: 978-3-7908-1772-0

  • eBook Packages: Springer Book Archive
