Abstract
Geographical information extraction is a special case of information extraction. In this paper, we present a practical method for extracting both the names of geographical entities and their relations from the Web. The method consists of three major phases. First, we manually designed a list of 493 Chinese lexico-syntactical patterns for matching Web page excerpts that contain names of geographical entities and their relations. Second, we developed a knowledge extractor that extracts those names and relations and generates a geographical graph whose nodes are entities and whose edges represent relations between the entities. Third, we developed several methods for handling problems or errors in the generated graph. Experimental results show that the OMKast-Googling system performs satisfactorily in both entity name extraction and relation extraction.
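The first two phases described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not list the 493 Chinese patterns, so the two English patterns, the relation labels `capital_of` and `located_in`, and the helper names `extract_triples` and `build_graph` are all hypothetical stand-ins chosen only to show how pattern matches could be turned into a graph of entities and relations.

```python
import re
from collections import defaultdict

# Hypothetical patterns for illustration only; the paper's actual
# patterns are Chinese and are not given in the abstract.
PATTERNS = [
    (re.compile(r"(?P<a>[A-Z][a-z]+) is the capital of (?P<b>[A-Z][a-z]+)"), "capital_of"),
    (re.compile(r"(?P<a>[A-Z][a-z]+) is located in (?P<b>[A-Z][a-z]+)"), "located_in"),
]


def extract_triples(text):
    """Return (entity, relation, entity) triples matched by any pattern."""
    triples = []
    for regex, relation in PATTERNS:
        for match in regex.finditer(text):
            triples.append((match.group("a"), relation, match.group("b")))
    return triples


def build_graph(triples):
    """Build an adjacency map: node -> set of (relation, neighbour) edges."""
    graph = defaultdict(set)
    for head, relation, tail in triples:
        graph[head].add((relation, tail))
        graph[tail]  # touching the key registers the tail as a node too
    return graph


excerpt = "Paris is the capital of France. Lyon is located in France."
triples = extract_triples(excerpt)
graph = build_graph(triples)
```

The third phase (handling problems or errors in the generated graph) is left out of the sketch, since the abstract does not say which checks the authors apply.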
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Cao, C., Wang, S., Jiang, L. (2014). A Practical Approach to Extracting Names of Geographical Entities and Their Relations from the Web. In: Buchmann, R., Kifor, C.V., Yu, J. (eds) Knowledge Science, Engineering and Management. KSEM 2014. Lecture Notes in Computer Science, vol 8793. Springer, Cham. https://doi.org/10.1007/978-3-319-12096-6_19
DOI: https://doi.org/10.1007/978-3-319-12096-6_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12095-9
Online ISBN: 978-3-319-12096-6
eBook Packages: Computer Science, Computer Science (R0)