Advertisement

Clustering-based disambiguation of fine-grained place names from descriptions

  • Hao ChenEmail author
  • Maria Vasardani
  • Stephan Winter
Article
  • 39 Downloads

Abstract

Everyday place descriptions often contain place names of fine-grained features, such as buildings or businesses, that are more difficult to disambiguate than names referring to larger places, for example cities or natural geographic features. Fine-grained places are often significantly more frequent and more similar to each other, and disambiguation heuristics developed for larger places, such as those based on population or containment relationships, are often not applicable in these cases. In this research, we address the disambiguation of fine-grained place names from everyday place descriptions. For this purpose, we evaluate the performance of different existing clustering-based approaches, since clustering approaches require no more knowledge other than the locations of ambiguous place names. We consider not only approaches developed specifically for place name disambiguation, but also clustering algorithms developed for general data mining that could potentially be leveraged. We compare these methods with a novel algorithm, and show that the novel algorithm outperforms the other algorithms in terms of disambiguation precision and distance error over several tested datasets.

Keywords

Place description Place name disambiguation Fine-grained place Clustering 

Notes

References

  1. 1.
    Adelfio MD, Samet H (2013) Structured toponym resolution using combined hierarchical place categories. In: Proceedings of the 7th workshop on geographic information retrieval, pp 49–56Google Scholar
  2. 2.
    Amitay E, Har’El N, Sivan R, Soffer A (2004) Web-a-where: geotagging web content. In: Proceedings of SIGIR ’04 conference on research and development in information retrieval, pp 273–280Google Scholar
  3. 3.
    Angiulli F (2006) Clustering by exceptions. In: Proceedings of the national conference on artificial intelligence, pp 312–317Google Scholar
  4. 4.
    Ankerst M, Breunig MM, Kriegel HP, Sander J (1999) OPTICS: ordering points to identify the clustering structure. In: Proceedings of the ACM SIGMOD conference. Philadelphia, pp 49–60Google Scholar
  5. 5.
    Berkhin P (2006) A survey of clustering data mining techniques. In: Kogan J, Nicholas CTM (eds) Grouping multidimensional data. Springer, Berlin, pp 25–71Google Scholar
  6. 6.
    Buscaldi D (2011) Approaches to disambiguating toponyms. SIGSPATIAL Special 3(2):16–19CrossRefGoogle Scholar
  7. 7.
    Buscaldi D, Magnini B (2010) Grounding toponyms in an Italian local news corpus. In: Proceedings of the 6th workshop on geographic information retrieval, pp 70–75Google Scholar
  8. 8.
    Buscaldi D, Rosso P (2008) A conceptual density-based approach for the disambiguation of toponyms. Int J Geogr Inf Sci 22(3):301–313CrossRefGoogle Scholar
  9. 9.
    Buscaldi D, Rosso P (2008) Map-based vs. knowledge-based toponym disambiguation. In: Proceedings of the 2nd international workshop on geographic information retrieval, pp 19–22Google Scholar
  10. 10.
    Campello RJGB, Moulavi D, Sander J (2013) Density-based clustering based on hierarchical density estimates. In: Pei J, Tseng VS, Cao L, Motoda HXG (eds) Advances in knowledge discovery and data mining. Springer, Berlin, pp 160–172Google Scholar
  11. 11.
    Celeux G, Govaert G (1992) A classification em algorithm for clustering and two stochastic versions. Comput Stat Data Anal 14(3):315–332CrossRefGoogle Scholar
  12. 12.
    Cheng Z, Caverlee J, Lee K (2010) You are where you tweet: a content-based approach to geo-locating twitter users. In: Proceedings of the 19th ACM International conference on information and knowledge management, pp 759–768Google Scholar
  13. 13.
    Derungs C, Palacio D, Purves RS (2012) Resolving fine granularity toponyms: evaluation of a disambiguation approach. In: Proceedings of the 7th international conference on geographic information science, pp 1–5Google Scholar
  14. 14.
    Ertöz L, Steinbach M, Kumar V (2003) Finding clusters of different sizes, shapes, and densities in noisy, high dimensional data. In: Proceedings of the 2003 SIAM international conference on data mining, pp 47–58Google Scholar
  15. 15.
    Ester M, Kriegel HP, Sander J, Xu X, et al. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. In: Proceedings of the second international conference on knowledge discovery and data mining (KDD-96). Portland, pp 226–231Google Scholar
  16. 16.
    Goodchild MF, Hill LL (2008) Introduction to digital gazetteer research. Int J Geograph Inf Sci 22(10):1039–1044.  https://doi.org/10.1080/13658810701850497. http://www.tandfonline.com/doi/abs/10.1080/13658810701850497, arXiv:1011.1669v3 CrossRefGoogle Scholar
  17. 17.
    Guha S, Rastogi R, Shim K (1998) Cure: an efficient clustering algorithm for large databases. In: ACM Sigmod record, vol 27. ACM, pp 73–84Google Scholar
  18. 18.
    Habib MB, Keulen MV, van Keulen M (2012) Improving toponym disambiguation by iteratively enhancing certainty of extraction. In: Proceedings of the international conference on knowledge discovery and information retrieval, KDIR 2012. Barcelona, pp 399–410Google Scholar
  19. 19.
    Hartigan JA, Wong MA (1979) Algorithm AS 136: a k-means clustering algorithm. J R Stat Soc 28(1):100–108Google Scholar
  20. 20.
    Hill LL (2000) Core elements of digital gazetteers: placenames, categories, and footprints. In: Research and advanced technology for digital libraries. Springer, pp 280–290  https://doi.org/10.1007/3-540-45268-0_26. http://link.springer.com/10.1007/3-540-45268-0_26
  21. 21.
    Karypis G, Han EH, Kumar V (1999) Chameleon: hierarchical clustering using dynamic modeling. Computer 32(8):68–75CrossRefGoogle Scholar
  22. 22.
    Kim J, Vasardani M, Winter S (2015) Harvesting large corpora for generating place graphs. In: Workshop on cognitive engineering for spatial information processes, COSIT 2015, pp 20–26Google Scholar
  23. 23.
    Kohonen T (1998) The self-organizing map. Neurocomputing 21(1–3):1–6CrossRefGoogle Scholar
  24. 24.
    Leidner JL (2008) Toponym resolution in text: annotation, evaluation and applications of spatial grounding of place names. Universal-PublishersGoogle Scholar
  25. 25.
    Leidner JL, Sinclair G, Webber B (2003) Grounding spatial named entities for information extraction and question answering. In: Proceedings of the HLT-NAACL 2003 workshop on analysis of geographic references, pp 31–38Google Scholar
  26. 26.
    Lieberman MD, Samet H, Sankaranarayanan J, Sperling J (2007) STEWARD: architecture of a spatio-textual search engine. In: Samet H, Shahabi C, Schneider M (eds) Proceedings of the 15th annual ACM international symposium on advances in geographic information systems, Seattle, pp 186–193Google Scholar
  27. 27.
    Liu F, Vasardani M, Baldwin T (2014) Automatic identification of locative expressions from social media text: a comparative analysis. In: Proceedings of the 4th international workshop on location and the web. ACM, pp 9–16Google Scholar
  28. 28.
    Moncla L, Renteria-Agualimpia W, Nogueras-iso J, Gaio M (2014) Geocoding for texts with fine-grain toponyms : an experiment on a geoparsed hiking descriptions corpus. In: Proceedings of the 22nd ACM SIGSPATIAL international conference on advances in geographic information systems, pp 183–192Google Scholar
  29. 29.
    Palacio D, Derungs C, Purves R (2015) Development and evaluation of a geographic information retrieval system using fine grained toponyms. J Spat Inf Sci 2015(11):1–29Google Scholar
  30. 30.
    Ripley BD (1976) The second-order analysis of stationary point processes. J Appl Probab 13(2):255–266CrossRefGoogle Scholar
  31. 31.
    Roberts K, Bejan CA, Harabagiu SM (2010) Toponym disambiguation using events. In: FLAIRS conference, vol 10, p 1Google Scholar
  32. 32.
    Roller S, Speriosu M, Rallapalli S, Wing B, Baldridge J (2012) Supervised text-based geolocation using language models on an adaptive grid. In: Proceedings of the 2012 joint conference on empirical methods in natural language processing and computational natural language learning, pp 1500–1510Google Scholar
  33. 33.
    Smith DA, Crane G (2001) Disambiguating geographic names in a historical digital library. In: International conference on theory and practice of digital libraries, Springer, pp 127–136Google Scholar
  34. 34.
    Smith DA, Mann GS (2003) Bootstrapping toponym classifier. In: Proceedings of the HLT-NAACL 2003 workshop on analysis of geographic references. Association for Computational Linguistics, pp 45-49Google Scholar
  35. 35.
    Teitler BE, Lieberman MD, Panozzo D, Sankaranarayanan J, Samet H, Sperling J (2008) NewsStand: a new view on news. In: Aref W G, Mokbel M F, Schneider M (eds) Proceedings of the 16th ACM SIGSPATIAL international conference on advances in geographic information systems, pp 144–153Google Scholar
  36. 36.
    Vasardani M, Timpf S, Winter S, Tomko M (2013) From descriptions to depictions: a conceptual framework. In: Tenbrink T, Stell J, Galton A, Wood Z (eds) Spatial information theory: 11th international conference COSIT 2013. Springer, pp 299–319Google Scholar
  37. 37.
    Vasardani M, Winter S, Richter KF (2013b) Locating place names from place descriptions. Int J Geogr Inf Sci 27(12):2509–2532CrossRefGoogle Scholar
  38. 38.
    Wing B, Baldridge J (2014) Hierarchical discriminative classification for text-based geolocation. In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP), pp 336–348Google Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.University of MelbourneMelbourneAustralia

Personalised recommendations