Volume 22, Issue 3, pp 589–613

A framework for annotating OpenStreetMap objects using geo-tagged tweets

  • Xin Chen
  • Hoang Vo
  • Yu Wang
  • Fusheng Wang


Recent years have witnessed an explosion of geospatial data, especially in the form of Volunteered Geographic Information (VGI). As a prominent example, OpenStreetMap (OSM) is a free, editable map of the world built by a large number of contributors. Meanwhile, social media platforms such as Twitter and Instagram supply dynamic social feeds at population level. Because much of this data is geo-tagged, there is high potential for integrating social media with OSM to enrich OSM objects with semantic annotations, which complement the existing annotations oriented toward objective description and so provide a broader range of annotations. In this paper, we propose a comprehensive framework for integrating social media data and VGI data to derive knowledge about geographical objects, specifically the top relevant annotations from tweets for objects in OSM. We first integrate geo-tagged tweets with OSM data using scalable spatial queries running on MapReduce. We then propose a frequency-based method for annotating boundary-based geographic objects (polygons) and a probability-based method for annotating point-based geographic objects (latitude and longitude), with consideration of noise. We evaluate our methods using a large corpus of geo-tagged tweets and representative geographic objects from OSM; ground-truth comparison and case studies show promising results. We are able to produce up to 80% correct names for geographical objects and to discover implicitly relevant information, such as the popular exhibitions of a museum, or the nicknames of a tourist attraction and visitors’ impressions of it.


Volunteered Geographic Information, Social media, OpenStreetMap, Twitter, Semantic annotation



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
