CRISOL: An Approach for Automatically Populating Semantic Web from Unstructured Text Collections

  • Roxana Danger
  • Rafael Berlanga
  • José Ruiz-Shulcloper
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)


Currently, the main obstacle to the development of the Semantic Web is the manual tagging of web pages according to a given ontology that conceptualizes its domain. This task is usually hard, even for experts, and it is prone to errors because users can interpret the same documents differently. In this paper we address the problem of automatically generating ontology instances from a collection of unstructured documents (e.g. plain text, HTML pages). These instances populate the Semantic Web described by the ontology. The proposed approach combines Information Extraction techniques, mainly entity recognition, information merging and Text Mining techniques. This approach has been successfully applied in the development of a Semantic Web for archaeology research.


Text Fragment; Text Collection; Partial Instance; Text Mining Technique; Complex Instance
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Roxana Danger (1)
  • Rafael Berlanga (2)
  • José Ruiz-Shulcloper (3)
  1. Universidad de Oriente, Santiago de Cuba, Cuba
  2. Universitat Jaume I, Castellón, Spain
  3. Institute of Cybernetics, Mathematics and Physics, La Habana, Cuba
