
A Practical Approach to Extracting Names of Geographical Entities and Their Relations from the Web

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8793)

Abstract

Geographical information extraction is a special case of information extraction. In this paper, we present a practical method for extracting both the names of geographical entities and their relations from the Web. The method consists of three major phases. First, we manually designed a list of 493 Chinese lexico-syntactic patterns for matching Web page excerpts that contain names of geographical entities and their relations. Second, we developed a knowledge extractor that extracts those names and relations to generate a geographical graph whose nodes are entities and whose edges represent relations between entities. Third, we developed several methods for handling problems or errors in the generated graph. Experimental results show that the OMKast-Googling system performs satisfactorily in both entity name extraction and relation extraction.
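The three phases can be illustrated with a minimal Python sketch. The paper's 493 Chinese patterns and its error-handling methods are not given here, so everything below is an assumption. The three English patterns, the relation labels `part_of`, `capital_of` and `adjacent_to`, the function names, and the single consistency check (flagging entities that end up transitively "located in" themselves) are hypothetical stand-ins, not the OMKast-Googling implementation.

```python
import re
from collections import defaultdict

# HYPOTHETICAL patterns: English stand-ins for the paper's 493 Chinese
# lexico-syntactic patterns. Each captures two names and yields a relation.
NAME = r"([A-Z][a-z]+(?: [A-Z][a-z]+)*)"  # one or more capitalised words
PATTERNS = [
    (re.compile(NAME + r" is located in " + NAME), "part_of"),
    (re.compile(NAME + r" is the capital of " + NAME), "capital_of"),
    (re.compile(NAME + r" borders " + NAME), "adjacent_to"),
]


def extract_triples(excerpts):
    """Phases 1-2: match every pattern against every excerpt and collect
    (entity, relation, entity) triples."""
    triples = set()
    for text in excerpts:
        for pattern, relation in PATTERNS:
            for x, y in pattern.findall(text):
                triples.add((x, relation, y))
    return triples


def build_graph(triples):
    """Phase 2: nodes are entity names; edges are grouped by relation label."""
    nodes = set()
    edges = defaultdict(set)  # relation -> {(source, target), ...}
    for x, relation, y in triples:
        nodes.update((x, y))
        edges[relation].add((x, y))
    return nodes, edges


def entities_in_part_of_cycles(edges):
    """Phase 3, one ASSUMED check: an entity that is transitively
    'located in' itself points to a contradictory, likely wrong, extraction."""
    parents = defaultdict(set)
    for child, parent in edges.get("part_of", ()):
        parents[child].add(parent)
    flagged = set()
    for start in list(parents):
        stack, seen = list(parents[start]), set()
        while stack:
            node = stack.pop()
            if node == start:
                flagged.add(start)
                break
            if node not in seen:
                seen.add(node)
                stack.extend(parents.get(node, ()))
    return flagged


if __name__ == "__main__":
    excerpts = [
        "Lyon is located in France.",
        "Paris is the capital of France.",
        "France borders Spain.",
        "France is located in Lyon.",  # a spurious excerpt
    ]
    nodes, edges = build_graph(extract_triples(excerpts))
    print(sorted(nodes))  # ['France', 'Lyon', 'Paris', 'Spain']
    print(sorted(entities_in_part_of_cycles(edges)))  # ['France', 'Lyon']
```

Capturing names with a capitalised-word regex is the weakest link in this sketch. In Chinese text, which has no spaces between words, name boundaries would instead depend on word segmentation and named-entity recognition.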




Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Cao, C., Wang, S., Jiang, L. (2014). A Practical Approach to Extracting Names of Geographical Entities and Their Relations from the Web. In: Buchmann, R., Kifor, C.V., Yu, J. (eds) Knowledge Science, Engineering and Management. KSEM 2014. Lecture Notes in Computer Science, vol 8793. Springer, Cham. https://doi.org/10.1007/978-3-319-12096-6_19


  • DOI: https://doi.org/10.1007/978-3-319-12096-6_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-12095-9

  • Online ISBN: 978-3-319-12096-6

  • eBook Packages: Computer Science, Computer Science (R0)
