Natural Language Analysis for Semantic Document Modeling

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1959)

Abstract

To ease the retrieval of documents published on the Web, the documents should be classified in a way that users find helpful and meaningful. This paper presents an approach to semantic document classification and retrieval based on Natural Language Analysis and Conceptual Modeling. A conceptual domain model is used in combination with linguistic tools to define a controlled vocabulary for a document collection. Users may browse this domain model and interactively classify documents by selecting model fragments that describe the contents of the documents. Natural language tools are used to analyze the text of the documents and propose relevant domain model concepts and relations. The proposed fragments are refined by the users and stored as XML document descriptions. For document retrieval, lexical analysis is used to pre-process search expressions and map these to the domain model for manual query-refinement. A prototype of the system is described, and the approach is illustrated with examples from a document collection published by the Norwegian Center for Medical Informatics (KITH).
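The classification and retrieval steps described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' prototype: the concept names, the relation labels, the XML element names, and the helper functions `tokenize`, `propose_fragment` and `to_xml` are all hypothetical, and plain token matching stands in for the linguistic tools the paper relies on.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical domain model: concept name -> surface terms that signal it.
CONCEPTS = {
    "patient": {"patient", "patients"},
    "hospital": {"hospital", "hospitals"},
    "referral": {"referral", "referrals"},
}

# Hypothetical relations between concepts: (source, label, target).
RELATIONS = [
    ("patient", "admitted_to", "hospital"),
    ("referral", "concerns", "patient"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens (naive stand-in for NL tools)."""
    return re.findall(r"[a-zæøå]+", text.lower())


def propose_fragment(text):
    """Propose the concepts mentioned in the text and the relations that link them."""
    tokens = set(tokenize(text))
    concepts = sorted(name for name, terms in CONCEPTS.items() if terms & tokens)
    # Keep a relation only when both of its endpoints were found in the text.
    relations = [r for r in RELATIONS if r[0] in concepts and r[2] in concepts]
    return concepts, relations


def to_xml(doc_id, concepts, relations):
    """Serialize a (user-refined) model fragment as an XML document description."""
    root = ET.Element("document", id=doc_id)
    for name in concepts:
        ET.SubElement(root, "concept", name=name)
    for source, label, target in relations:
        ET.SubElement(root, "relation", source=source, label=label, target=target)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    concepts, relations = propose_fragment("Patients are admitted to the hospital.")
    print(to_xml("doc-1", concepts, relations))
```

In this sketch the same `propose_fragment` function could also be applied to a search expression, so that a query is mapped to domain model concepts before the user refines it manually.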

Copyright information

© 2001 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Brasethvik, T., Gulla, J.A. (2001). Natural Language Analysis for Semantic Document Modeling. In: Bouzeghoub, M., Kedad, Z., Métais, E. (eds) Natural Language Processing and Information Systems. NLDB 2000. Lecture Notes in Computer Science, vol 1959. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45399-7_11

  • DOI: https://doi.org/10.1007/3-540-45399-7_11

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-41943-3

  • Online ISBN: 978-3-540-45399-4

  • eBook Packages: Springer Book Archive
