CRISOL: An Approach for Automatically Populating Semantic Web from Unstructured Text Collections

Danger, Roxana; Berlanga, Rafael; Ruiz-Shulcloper, José

doi:10.1007/978-3-540-30075-5_24

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 3180)

Included in the following conference series:

International Conference on Database and Expert Systems Applications

Abstract

Currently, the main obstacle to the development of the Semantic Web is the manual tagging of web pages according to a given ontology that conceptualizes their domain. This task is usually hard, even for experts, and it is prone to errors because users can interpret the same documents differently. In this paper we address the problem of automatically generating ontology instances from a collection of unstructured documents (e.g., plain text, HTML pages). These instances populate the Semantic Web described by the ontology. The proposed approach combines Information Extraction techniques, mainly entity recognition, information merging, and Text Mining techniques. The approach has been successfully applied in the development of a Semantic Web for archaeology research.
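The general idea of populating an ontology from plain text can be illustrated with a minimal sketch. This is not the CRISOL method, whose details are not given in the abstract. It is a toy example under stated assumptions: the concept names, the regular expressions, and the function names `recognize_entities` and `populate` are all hypothetical, and simple pattern matching stands in for real entity recognition. Merging is reduced to grouping mentions that share a concept and a case-normalized name.

```python
import re
from collections import defaultdict

# Hypothetical toy ontology: concept name -> regex recognizing its mentions.
# These patterns are illustrative assumptions, not taken from the paper.
ONTOLOGY_PATTERNS = {
    "Site": re.compile(r"\b(?:Cova|Cueva) [A-Z]\w+"),
    "Period": re.compile(r"\b(?:Neolithic|Bronze Age|Iron Age)\b"),
}


def recognize_entities(text):
    """Return (concept, mention) pairs found in one document."""
    found = []
    for concept, pattern in ONTOLOGY_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((concept, match.group(0)))
    return found


def populate(documents):
    """Merge mentions across documents into ontology instances.

    Each instance is keyed by (concept, case-normalized name) and maps
    to the set of document ids in which it was mentioned.
    """
    instances = defaultdict(set)
    for doc_id, text in documents.items():
        for concept, mention in recognize_entities(text):
            instances[(concept, mention.lower())].add(doc_id)
    return instances
```

Calling `populate` on a dictionary that maps document ids to text returns one entry per distinct instance, so the same entity mentioned in several documents is stored once together with all of its source documents. A realistic system would need far more robust recognition and merging than this sketch shows.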

Author information

Authors and Affiliations

Universidad de Oriente, Santiago de Cuba, Cuba
Roxana Danger
Universitat Jaume I, Castellón, España
Rafael Berlanga
Institute of Cybernetics, Mathematics and Physics, La Habana, Cuba
José Ruiz-Shulcloper

Editor information

Editors and Affiliations

University of Zaragoza, Ciudad Universitaria, Plaza San Francisco, 50009, Zaragoza
Fernando Galindo
Seikei University, Japan
Makoto Takizawa
Institute of Informatics in Business and Government, University of Linz, Altenbergerstr. 69, 4040, Linz, Austria
Roland Traunmüller

About this paper

Cite this paper

Danger, R., Berlanga, R., Ruiz-Shulcloper, J. (2004). CRISOL: An Approach for Automatically Populating Semantic Web from Unstructured Text Collections. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_24

DOI: https://doi.org/10.1007/978-3-540-30075-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive
