Abstract
Currently, the main obstacle to the development of the Semantic Web is the manual tagging of web pages according to a given ontology that conceptualizes its domain. This task is usually hard, even for experts, and it is prone to errors because users can interpret the same documents differently. In this paper we address the problem of automatically generating ontology instances from a collection of unstructured documents (e.g. plain texts, HTML pages). These instances will populate the Semantic Web that is described by the ontology. The proposed approach combines Information Extraction techniques, mainly entity recognition, information merging and Text Mining techniques. This approach has been successfully applied in the development of a Semantic Web for archaeology research.
Copyright information
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Danger, R., Berlanga, R., Ruiz-Shulcloper, J. (2004). CRISOL: An Approach for Automatically Populating Semantic Web from Unstructured Text Collections. In: Galindo, F., Takizawa, M., Traunmüller, R. (eds) Database and Expert Systems Applications. DEXA 2004. Lecture Notes in Computer Science, vol 3180. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-30075-5_24
DOI: https://doi.org/10.1007/978-3-540-30075-5_24
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-22936-0
Online ISBN: 978-3-540-30075-5
eBook Packages: Springer Book Archive