Automatic Extraction of Geographic Locations on Articles of Digital Newspapers

  • Conference paper
Trends in Practical Applications of Agents and Multiagent Systems

Abstract

In this article, we present a model that makes digital newspapers easier to read by extracting the locations mentioned in news articles and showing the associated places on a map. A supervised, keyword-based extraction module recognizes and classifies geographical locations as named entities. The extraction results are improved using dictionaries or gazetteers (lists of named entities for the geographic area where the news takes place). Thesauri are also used to check and complete the results and to disambiguate named entities. Finally, the model has been applied to “El Norte de Castilla”, a digital publication from Valladolid, to validate it and to identify the tools and techniques that give the best results.
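The gazetteer and disambiguation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the gazetteer entries, the `parent` field (standing in for a thesaurus broader-term relation), the `home_region` parameter, and the function names are all assumptions made for illustration, and the coordinates are approximate.

```python
import re
import unicodedata

# Hypothetical gazetteer: normalized surface form -> candidate places.
# "parent" stands in for a thesaurus broader-term relation (an assumption).
GAZETTEER = {
    "valladolid": [
        {"name": "Valladolid", "parent": "Castilla y León", "lat": 41.65, "lon": -4.72},
        {"name": "Valladolid", "parent": "Yucatán", "lat": 20.69, "lon": -88.20},
    ],
    "medina del campo": [
        {"name": "Medina del Campo", "parent": "Castilla y León", "lat": 41.31, "lon": -4.91},
    ],
}


def normalize(text):
    """Lowercase and strip accents so lookups tolerate spelling variants."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


def extract_locations(text, home_region="Castilla y León"):
    """Return one gazetteer entry per place name found in the text.

    Ambiguous names are resolved by preferring the candidate whose
    broader term matches the newspaper's home region (a simple heuristic).
    """
    norm = normalize(text)
    found = []
    # Try longer names first so multi-word places are reported before short ones.
    for key in sorted(GAZETTEER, key=len, reverse=True):
        if re.search(r"\b" + re.escape(key) + r"\b", norm):
            candidates = GAZETTEER[key]
            preferred = [c for c in candidates if c["parent"] == home_region]
            found.append((preferred or candidates)[0])
    return found


article = "Un incendio en Medina del Campo obliga a cortar la carretera hacia Valladolid."
places = extract_locations(article)
for place in places:
    print(place["name"], place["parent"], place["lat"], place["lon"])
```

The returned latitude and longitude pairs are what a map view would need to place a marker for each article. A real system would replace the dictionary lookup with a trained named-entity recognizer and a full thesaurus.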

Author information

Corresponding author

Correspondence to Cesar García Gómez.

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Gómez, C.G., Cuadrado, A.F., Mínguez, J.D., de la Torre, E.V. (2012). Automatic Extraction of Geographic Locations on Articles of Digital Newspapers. In: Rodríguez, J., Pérez, J., Golinska, P., Giroux, S., Corchuelo, R. (eds) Trends in Practical Applications of Agents and Multiagent Systems. Advances in Intelligent and Soft Computing, vol 157. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28795-4_12

  • DOI: https://doi.org/10.1007/978-3-642-28795-4_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28794-7

  • Online ISBN: 978-3-642-28795-4

  • eBook Packages: Engineering, Engineering (R0)
