An Experience Developing a Semantic Annotation System in a Media Group

  • Angel L. Garrido
  • Oscar Gómez
  • Sergio Ilarri
  • Eduardo Mena
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7337)


Media companies today have difficulty managing the large volume of news they receive from agencies together with the articles they produce in-house. Journalists and documentalists face categorization tasks every day. A further difficulty is the large size of the typical thesaurus, the tool commonly used to tag news in the media.

In this paper, we present a new method for information extraction over a set of texts in which the annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and finally uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
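As a rough illustration of the lemmatize, extract keywords, then map to thesaurus tags sequence described above, the following minimal sketch may help. It is not the authors' system: the lemma table, stopword list, thesaurus mapping, and the `min_hits` threshold are all hypothetical toy values, and the SVM classification step is deliberately left out so that the example stays self-contained.

```python
from collections import Counter
import re

# Hypothetical toy resources (assumptions for illustration only); a real
# system would rely on a full lemmatizer and a real thesaurus/ontology.
LEMMAS = {"elections": "election", "voted": "vote", "votes": "vote",
          "parties": "party", "goals": "goal", "scored": "score"}
STOPWORDS = {"the", "a", "in", "of", "and", "for", "two", "on"}
# Toy "ontology": maps a keyword lemma to a thesaurus descriptor.
THESAURUS = {"election": "POLITICS", "vote": "POLITICS",
             "party": "POLITICS", "goal": "SPORTS", "score": "SPORTS"}


def lemmatize(text):
    """Lowercase, tokenize, and map each token to its lemma if known."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens]


def keywords(lemmas, top_n=5):
    """Return the most frequent non-stopword lemmas."""
    counts = Counter(l for l in lemmas if l not in STOPWORDS)
    return [word for word, _ in counts.most_common(top_n)]


def annotate(text, min_hits=2):
    """Heuristic: keep a tag only if at least `min_hits` keywords map to it."""
    hits = Counter(THESAURUS[k] for k in keywords(lemmatize(text))
                   if k in THESAURUS)
    return sorted(tag for tag, n in hits.items() if n >= min_hits)


print(annotate("The parties voted in the elections"))
print(annotate("The striker scored two goals"))
```

In a fuller pipeline, a trained SVM classifier would presumably sit between keyword extraction and the final tag selection, with the ontology and heuristics refining its output; here a simple hit-count threshold stands in for that stage.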


Keywords: Semantic tagging and classification; Information Extraction; NLP; SVM; Ontologies; Text classification; Media; News





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Angel L. Garrido (1)
  • Oscar Gómez (1)
  • Sergio Ilarri (2)
  • Eduardo Mena (2)

  1. Grupo Heraldo - Grupo La Información, Pamplona, Spain
  2. IIS Department, University of Zaragoza, Spain
