Abstract
The realization of the Semantic Web depends on the availability of a critical mass of metadata for web content, linked to formal knowledge about the world. This paper presents our vision of a holistic system that allows annotation, indexing, and retrieval of documents with respect to real-world entities. A system called KIM, which partially implements this concept, is briefly presented and used for evaluation and demonstration.
Our understanding is that a system for semantic annotation should be based upon specific knowledge about the world, rather than being indifferent to any ontological commitments and general knowledge. To ensure efficiency and reusability of the metadata, we introduce a simple upper-level ontology which starts with some basic philosophical distinctions and goes down to the most popular entity types (people, companies, cities, etc.), thus providing many of the inter-domain common-sense concepts and allowing easy domain-specific extensions. Based on the ontology, an extensive knowledge base of entity descriptions is maintained.
A semantically enhanced information extraction system is presented, which automatically annotates text with references to classes in the ontology and instances in the knowledge base. Based on these annotations, we perform IR-like indexing and retrieval, further extended using the ontology and knowledge about the specific entities.
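To make the annotate, index, and retrieve pipeline concrete, the following is a minimal sketch and not KIM's actual implementation. The class hierarchy, the `kb:` identifiers, the alias lists, and all function names are assumptions made for illustration. Annotation here is a plain alias lookup standing in for a real information extraction component, and the index is a simple mapping from entity identifiers to documents.

```python
import re
from collections import defaultdict

# Hypothetical mini-ontology: child class -> parent class.
SUPERCLASS = {
    "Person": "Agent",
    "Company": "Organization",
    "Organization": "Agent",
    "City": "Location",
    "Agent": "Entity",
    "Location": "Entity",
}

# Hypothetical knowledge base: entity id -> (class, known aliases).
KNOWLEDGE_BASE = {
    "kb:Sofia": ("City", ["Sofia"]),
    "kb:Ontotext": ("Company", ["Ontotext", "Ontotext Lab"]),
    "kb:TimBernersLee": ("Person", ["Tim Berners-Lee"]),
}


def is_a(cls, ancestor):
    """True if cls equals ancestor or is a transitive subclass of it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = SUPERCLASS.get(cls)
    return False


def annotate(text):
    """Return sorted (start, end, entity_id) spans found by alias lookup."""
    spans = []
    for entity_id, (_cls, aliases) in KNOWLEDGE_BASE.items():
        for alias in aliases:
            pattern = r"\b" + re.escape(alias) + r"\b"
            for match in re.finditer(pattern, text):
                spans.append((match.start(), match.end(), entity_id))
    return sorted(spans)


def build_index(docs):
    """Map each entity id to the set of documents that mention it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for _start, _end, entity_id in annotate(text):
            index[entity_id].add(doc_id)
    return index


def search_by_class(index, cls):
    """Return documents mentioning any entity whose class is subsumed by cls."""
    hits = set()
    for entity_id, doc_ids in index.items():
        if is_a(KNOWLEDGE_BASE[entity_id][0], cls):
            hits |= doc_ids
    return sorted(hits)


docs = {
    "d1": "Ontotext Lab is based in Sofia.",
    "d2": "Tim Berners-Lee proposed the Semantic Web.",
    "d3": "No known entities appear here.",
}
index = build_index(docs)
```

With this toy data, `search_by_class(index, "Agent")` retrieves the documents that mention a person or a company, although the word "agent" appears in neither text. This is the kind of ontology-driven extension of keyword-style retrieval that the abstract describes.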
Keywords
- Natural Language Processing
- Resource Description Framework
- Entity Type
- Semantic Annotation
- Formal Knowledge
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kiryakov, A., Popov, B., Ognyanoff, D., Manov, D., Kirilov, A., Goranov, M. (2003). Semantic Annotation, Indexing, and Retrieval. In: Fensel, D., Sycara, K., Mylopoulos, J. (eds) The Semantic Web - ISWC 2003. ISWC 2003. Lecture Notes in Computer Science, vol 2870. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39718-2_31
DOI: https://doi.org/10.1007/978-3-540-39718-2_31
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-20362-9
Online ISBN: 978-3-540-39718-2
eBook Packages: Springer Book Archive