Implications of Data Density and Length of Collection Period for Population Estimations Using Social Media Data

  • Samuel Lee Toepke
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 936)


When programmatically utilizing public APIs provided by social media services, it is possible to obtain a large volume of volunteered geographic information. Geospatially enabled data from Twitter, Instagram, Panoramio, etc. can be used to create high-resolution estimations of human movements over time, and the volume of data is of critical importance. This investigation extends previous work on the effects of artificial data removal and the resulting error, but uses over twice as much data, collected with an enterprise cloud solution over a span of thirteen months instead of five.
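The abstract describes deriving activity estimates over time from geotagged posts and measuring the error introduced by artificially removing data. The sketch below illustrates that general idea under stated assumptions; it is not the paper's implementation. It assumes each post carries a longitude, latitude, and local hour of day; it counts posts that fall inside a study-area polygon with a ray-casting point-in-polygon test, min-max normalizes the 24 hourly counts into a curve, randomly keeps only a fraction of the posts, and reports the root mean square error (RMSE) between the reduced-data curve and the full-data curve. The `Post` type, the function names, the normalization, and the choice of RMSE are all hypothetical choices made for illustration.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Post:
    """Hypothetical minimal record for one geotagged social media post."""
    lon: float
    lat: float
    hour: int  # local hour of day, 0-23


def point_in_polygon(lon, lat, polygon):
    """Ray-casting (even-odd) test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can cross the ray.
        if (y1 > lat) != (y2 > lat):
            # Longitude where the edge meets the horizontal line through the point.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:  # crossing lies on the ray to the east
                inside = not inside
    return inside


def hourly_curve(posts, polygon):
    """Count posts inside the polygon per hour, then min-max normalize to [0, 1]."""
    counts = [0] * 24
    for p in posts:
        if point_in_polygon(p.lon, p.lat, polygon):
            counts[p.hour] += 1
    lo, hi = min(counts), max(counts)
    if hi == lo:  # flat (or empty) curve: nothing to normalize
        return [0.0] * 24
    return [(c - lo) / (hi - lo) for c in counts]


def rmse(a, b):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def removal_error(posts, polygon, keep_fraction, seed=0):
    """Randomly keep roughly keep_fraction of the posts (hypothetical removal
    scheme) and return the RMSE between the reduced and full-data curves."""
    rng = random.Random(seed)
    kept = [p for p in posts if rng.random() < keep_fraction]
    return rmse(hourly_curve(posts, polygon), hourly_curve(kept, polygon))


if __name__ == "__main__":
    # Synthetic, illustrative data only (not from the paper): posts scattered
    # around a unit-square study area, with more activity in daytime hours.
    rng = random.Random(42)
    area = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
    weights = [1 if h < 7 or h > 21 else 5 for h in range(24)]
    posts = [
        Post(rng.uniform(-0.5, 1.5), rng.uniform(-0.5, 1.5),
             rng.choices(range(24), weights)[0])
        for _ in range(2000)
    ]
    for frac in (1.0, 0.5, 0.25, 0.1):
        err = removal_error(posts, area, frac, seed=1)
        print(f"keep {frac:.2f} of posts -> RMSE {err:.3f}")
```

Under these assumptions, the error would generally be expected to grow as `keep_fraction` shrinks, which mirrors the data-density question the abstract raises; a real analysis would substitute collected posts for the synthetic points and an actual study-area boundary for the unit square.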


Keywords

Media, Enterprise systems, Cloud, Volunteered geographic data


  1. 1.
    Abdi, H., Williams, L.J.: Normalizing Data. Encyclopedia of Research Design, pp. 935–938. Sage, Thousand Oaks (2010)Google Scholar
  2. 2.
    Aubrecht, C., Ungar, J., Freire, S.: Exploring the potential of volunteered geographic information for modeling spatio-temporal characteristics of urban population: a case study for Lisbon Metro using foursquare check-in data. In: 7th International Conference Virtual City and Territory, Lisboa, pp. 57–60 (2011)Google Scholar
  3. 3.
    Aubrecht, C., Özceylan Aubrecht, D., Ungar, J., Freire, S., Steinnocher, K.: VGDI-advancing the concept: volunteered geo-dynamic information and its benefits for population dynamics modeling. Trans. GIS 21, 253–276 (2016)CrossRefGoogle Scholar
  4. 4.
    Boyd, D., Crawford, K.: Critical questions for big data: provocations for a cultural, technological, and scholarly phenomenon. Inf. Commun. Soc. 15(5), 662–679 (2012)CrossRefGoogle Scholar
  5. 5.
    Chai, T., Draxler, R.R.: Root mean square error (RMSE) or mean absolute error (MAE)?-arguments against avoiding RMSE in the literature. Geosci. Model. Dev. 7(3), 1247–1250 (2014)CrossRefGoogle Scholar
  6. 6.
    Coleman, D.J., Georgiadou, Y., Labonte, J., et al.: Volunteered geographic information: the nature and motivation of produsers. Int. J. Spat. Data Infrastruct. Res. 4(1), 332–358 (2009)Google Scholar
  7. 7.
    FEMA: Cascadia Rising 2016. Accessed 08 Dec 2016
  8. 8.
    Freire, S., Florczyk, A., Ferri, S.: Modeling day-and nighttime population exposure at high resolution: application to volcanic risk assessment in campi flegrei. In: 12th International Conference on Information Systems for Crisis Response and Management (2015)Google Scholar
  9. 9.
  10. 10.
    Goodchild, M.F.: Citizens as sensors: the world of volunteered geography. GeoJournal 69(4), 211–221 (2007)CrossRefGoogle Scholar
  11. 11.
    Goodchild, M.F., Aubrecht, C., Bhaduri, B.: New questions and a changing focus in advanced VGI research. Trans. GIS 21, 189–190 (2016)CrossRefGoogle Scholar
  12. 12.
    GNIP - The World’s Largest and Most Trusted Provider of Social Data. Accessed 29 July 2017
  13. 13.
    GNU Octave. Accessed 29 July 2017
  14. 14.
    Haines, E.: Point in polygon strategies. In: Graphics gems IV, vol. 994, pp. 24–26 (1994)CrossRefGoogle Scholar
  15. 15.
    Haklay, M., Weber, P.: Openstreetmap: user-generated street maps. IEEE Pervasive Comput. 7(4), 12–18 (2008)CrossRefGoogle Scholar
  16. 16.
    Haklay, M.: How good is volunteered geographical information? A comparative study of OpenStreetMap and ordnance survey datasets. Environ. Plan. B: Plan. Des. 37(4), 682–703 (2010)CrossRefGoogle Scholar
  17. 17.
    Heaton, T.H., Hartzell, S.H.: Earthquake hazards on the Cascadia subduction zone. Science 236(4798), 162–168 (1987)CrossRefGoogle Scholar
  18. 18.
    Hochman, H.M., Rodgers, J.D.: Pareto optimal redistribution. Am. Econ. Rev. 59(4), 542–557 (1969)Google Scholar
  19. 19.
    JTS Topology Suite. Accessed 29 July 2017
  20. 20.
    Leong, L., Toombs, D., Gill, B.: Magic quadrant for cloud infrastructure as a service, worldwide. Analyst(s) 501, G00265139 (2015)Google Scholar
  21. 21.
    Mennis, J., Hultgren, T.: Intelligent dasymetric mapping and its application to areal interpolation. Cartogr. Geogr. Inf. Sci. 33(3), 179–194 (2006)CrossRefGoogle Scholar
  22. 22.
    Miller, H.J.: The data avalanche is here. Shouldn’t we be digging? J. Reg. Sci. 50(1), 181–201 (2010)CrossRefGoogle Scholar
  23. 23.
    Morstatter, F., Pfeffer, J., Liu, H., Carley, K.M.: Is The Sample Good Enough? Comparing Data from Twitter’s Streaming API with Twitter’s Firehose. arXiv preprint arXiv:1306.5204 (2013)
  24. 24.
    Moussalli, R., Srivatsa, M., Asaad, S.: Fast and flexible conversion of geohash codes to and from latitude/longitude coordinates. In: 2015 IEEE 23rd Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM). IEEE (2015)Google Scholar
  25. 25.
    Oracle Technology Network for Java Developers — Oracle Technology Network — Oracle. Accessed 29 July 2017
  26. 26.
    Overview of Amazon Web Services. Accessed 29 July 2017
  27. 27.
    PostGIS - Spatial and Geographic Objects for PostgreSQL. Accessed 29 July 2017
  28. 28.
    Sagl, G., Resch, B., Hawelka, B., Beinat, E.: From social sensor data to collective human behaviour patterns: analysing and visualising spatio-temporal dynamics in urban environments. In: Proceedings of the GI-Forum, pp. 54–63 (2012)Google Scholar
  29. 29.
    Stewart, R., et al.: Can social media play a role in developing building occupancy curves for small area estimation? In: Proceedings of 13th International Conference GeoComp (2015)Google Scholar
  30. 30.
    Toepke, S.L., Starsman, R.S.: Population distribution estimation of an urban area using crowd sourced data for disaster response. In: 12th International Conference on Information Systems for Crisis Response and Management (2015)Google Scholar
  31. 31.
    Toepke, S.L.: Investigation of geospatially enabled, social media generated structure occupancy curves in commercial structures. In: Grueau, C., Laurini, R., Rocha, J.G. (eds.) GISTAM 2016. CCIS, vol. 741, pp. 49–61. Springer, Cham (2017). Scholar
  32. 32.
    Toepke, S.L.: Data density considerations for crowd sourced population estimations from social media. In: Proceedings of the 3rd International Conference on Geographical Information Systems Theory, Applications and Management - GISTAM, vol. 1, pp. 35–42 (2017)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Private Engineering Firm, Washington DC, USA
