An Experience Developing a Semantic Annotation System in a Media Group
Media companies today struggle to manage the large volume of news they receive from agencies together with the articles they produce in-house. Journalists and documentalists face categorization tasks every day. The task is made harder by the size of the thesaurus, the tool typically used to tag news in the media, whose list of terms is usually very large.
In this paper, we present a new method for information extraction over a set of texts in which each annotation must be composed of thesaurus elements. The method applies lemmatization, extracts keywords, and then uses a combination of Support Vector Machines (SVM), ontologies, and heuristics to deduce appropriate tags for the annotation. We evaluated it on a real, changing set of news items and compared our tagging with the annotation performed by a real documentation department, obtaining very good results.
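The stages named above (lemmatization, keyword extraction, an SVM decision, and an ontology-based heuristic that maps to thesaurus terms) can be illustrated with a minimal sketch. It is not the system described in the paper: the lemma table, stopword list, thesaurus descriptors, ontology entries, and training sentences are all hypothetical toy data, and the classifier is a small linear SVM trained by subgradient descent on the hinge loss, written in plain Python so the example stays self-contained.

```python
import re
from collections import Counter

# All resources below are hypothetical toy data, for illustration only.
LEMMAS = {"goals": "goal", "scored": "score", "banks": "bank",
          "rates": "rate", "raised": "raise"}
STOPWORDS = {"the", "a", "in", "two"}
THESAURUS = {"sports": "SPORTS", "economy": "ECONOMY"}  # class -> descriptor
ONTOLOGY = {"bank": "BANKING", "goal": "FOOTBALL"}      # keyword -> descriptor

TRAIN = [
    ("The team scored two goals in the match", +1),  # +1 = sports
    ("The striker scored in the final match", +1),
    ("Banks raised interest rates", -1),             # -1 = economy
    ("The central bank cut rates", -1),
]

def lemmatize(text):
    """Lowercase, tokenize, drop stopwords, map tokens to toy lemmas."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [LEMMAS.get(t, t) for t in tokens if t not in STOPWORDS]

def keywords(text, top_n=5):
    """Keep the most frequent lemmas as keywords."""
    return [w for w, _ in Counter(lemmatize(text)).most_common(top_n)]

def featurize(kws, vocab):
    """Binary bag-of-keywords vector over a fixed vocabulary."""
    return [1.0 if v in kws else 0.0 for v in vocab]

def dot(w, x):
    return sum(wj * xj for wj, xj in zip(w, x))

def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Subgradient descent on the L2-regularized hinge loss; labels in {-1, +1}."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (dot(w, xi) + b)
            w = [wj - lr * lam * wj for wj in w]      # regularization step
            if margin < 1:                            # hinge loss is active
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b

def build_model(train):
    kw_sets = [keywords(text) for text, _ in train]
    vocab = sorted({k for ks in kw_sets for k in ks})
    X = [featurize(ks, vocab) for ks in kw_sets]
    y = [label for _, label in train]
    w, b = train_linear_svm(X, y)
    return vocab, w, b

def annotate(text, model):
    """Return thesaurus descriptors: the SVM class first, then ontology matches."""
    vocab, w, b = model
    kws = keywords(text)
    score = dot(w, featurize(kws, vocab)) + b
    tags = [THESAURUS["sports" if score > 0 else "economy"]]
    # Heuristic: add a descriptor when a keyword appears in the toy ontology.
    tags += [ONTOLOGY[k] for k in kws if k in ONTOLOGY and ONTOLOGY[k] not in tags]
    return tags

MODEL = build_model(TRAIN)
print(annotate("The bank raised rates", MODEL))
print(annotate("The team scored a goal", MODEL))
```

Restricting the output to entries of `THESAURUS` and `ONTOLOGY` mirrors the constraint that annotations be composed of thesaurus elements; a production system would replace the toy tables with a real lemmatizer, a full thesaurus, and one classifier per descriptor.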
Keywords: Semantic tagging and classification, Information Extraction, NLP, SVM, Ontologies, Text classification, Media, News