Beyond Geotagged Tweets: Exploring the Geolocalisation of Tweets for Transportation Applications

Transportation Analytics in the Era of Big Data

Abstract

Researchers in multiple disciplines have used Twitter to study mobility patterns and “live” aspects of cities. In transportation planning, one major area of interest has been the use of Twitter data to infer movement patterns and the origins and destinations of trip-makers. In transportation operations, researchers have been interested in automated incident or event detection. Because geotagged tweets, which pinpoint the location of the user at the time of tweeting, tend to be too sparse for transportation applications, there is a need to expand the sample by associating non-geotagged tweets with locations. We call this process “geolocalisation”. While geolocalisation is an active area of research associated with the geospatial semantic Web and Geographic Information Retrieval, much of the work has focused on geolocalisation of users, or on geolocalisation of tweeting activity to fairly coarse geographical levels, whereas our work relates to street-level or even building-level geolocalisation. We consider two different approaches to geolocalisation: one that makes use of Points of Interest databases, and a second, information retrieval-based approach that trains on geotagged tweets. Our objective is to comprehensively assess how the spatial and content coverage of non-geotagged tweets geolocalised using the different approaches differs from that of geotagged tweets alone. We find that geolocalised tweets reveal a larger number of incidents and socioeconomic patterns that are not evident from geotagged data alone, including activity throughout the metropolitan area and in deprived “Environmental Justice” (EJ) areas, where the degree of social media activity detected is usually low. Conclusions are drawn on the relative usefulness of the alternative approaches.
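As a rough illustration of the information retrieval-based idea mentioned in the abstract, the sketch below indexes a handful of geotagged tweets and assigns a non-geotagged tweet the coordinates of its most similar indexed tweet. It is not the chapter's actual method: the TF-IDF weighting, the cosine similarity, the `min_score` threshold, the function names, and the toy tweets and coordinates are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_index(geotagged):
    """Build TF-IDF vectors from (text, (lat, lon)) pairs of geotagged tweets."""
    docs = [Counter(tokenize(text)) for text, _ in geotagged]
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(doc.keys())  # document frequency of each term
    # Smoothed inverse document frequency (an assumed weighting scheme).
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    coords = [c for _, c in geotagged]
    return vectors, idf, coords


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def geolocalise(text, index, min_score=0.3):
    """Return the coordinates of the best-matching geotagged tweet, or None.

    min_score is a hypothetical cut-off: tweets with no sufficiently similar
    geotagged neighbour are left without a location.
    """
    vectors, idf, coords = index
    counts = Counter(tokenize(text))
    query = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    best_score, best_coords = 0.0, None
    for vec, c in zip(vectors, coords):
        score = cosine(query, vec)
        if score > best_score:
            best_score, best_coords = score, c
    return best_coords if best_score >= min_score else None


# Invented example data; the texts and coordinates are not from the chapter.
training = [
    ("accident on main street near the bridge", (41.88, -87.63)),
    ("great coffee at the lakefront cafe", (41.90, -87.62)),
    ("delays at union station this morning", (41.8787, -87.6403)),
]
index = build_index(training)
print(geolocalise("huge delays at union station", index))
print(geolocalise("happy birthday mum", index))
```

Returning `None` for tweets below the threshold reflects a common trade-off in this kind of sketch: a stricter cut-off geolocalises fewer tweets but with less risk of placing them at an unrelated location.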

Notes

  1. https://dev.twitter.com/streaming/overview.
  2. https://msdn.microsoft.com/en-us/library/hh441725.aspx.
  3. https://lehd.ces.census.gov.
  4. https://developer.mapquest.com/.
  5. https://msdn.microsoft.com/en-us/library/hh478192.aspx.
  6. https://developer.foursquare.com/.
  7. 3-grams is the best value to reduce matching ambiguity according to our experiments.
  8. www.rise-group.org/risem/clusterpy.

Acknowledgements

The research was supported by European Commission FP7 Grant No 632075 and the Research Council of UK’s Economic and Social Research Council Grant No ES/L011921/1. The authors would also like to express gratitude to Dr. Yashar Moshfeghi and Professor Joemon M. Jose for their assistance in providing the methods and data utilised in this work.

Author information

Corresponding author

Correspondence to Piyushimita (Vonu) Thakuriah.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Paule, J.D.G., Sun, Y., Thakuriah, P. (2019). Beyond Geotagged Tweets: Exploring the Geolocalisation of Tweets for Transportation Applications. In: Ukkusuri, S., Yang, C. (eds) Transportation Analytics in the Era of Big Data. Complex Networks and Dynamic Systems, vol 4. Springer, Cham. https://doi.org/10.1007/978-3-319-75862-6_1
