1 Introduction

Data from social media platforms is an attractive real-time resource for data analysts. It can be used for a wide range of use cases, such as monitoring fire (Paul et al. 2014) and flu outbreaks (Power et al. 2013), providing location-based recommendations (Ye et al. 2010), or conducting demographic analyses (Sloan et al. 2013). Although some platforms, such as Twitter, allow users to geolocate posts, Jurgens et al. (2015) reported that less than 3% of all Twitter posts are geotagged. This severely limits the use of social media data for such location-specific applications.

The location prediction task can be tackled either as a classification problem or as a multi-target regression problem. In the former case the goal is to predict a city label for a specific tweet, whereas in the latter case the goal is to predict latitude and longitude coordinates for a given tweet. Previous studies showed that text in combination with metadata can be used to predict user locations (Han et al. 2014). Liu and Inkpen (2015) presented a system based on stacked denoising auto-encoders (Vincent et al. 2008) for location prediction. State-of-the-art approaches, however, often rely on very specific, non-generalizing features based on website scraping, IP resolution, or external resources such as GeoNames. In contrast, we present an approach for geographical location prediction that achieves state-of-the-art results using neural networks trained solely on Twitter text and metadata. It does not require external knowledge sources and hence generalizes more easily to new domains and languages.

The remainder of this paper is organized as follows: First, we provide an overview of related work on Twitter location prediction. In Sect. 3 we describe the details of our neural network architecture. Results on the test set are presented in Sect. 4. Finally, we conclude the paper with some future directions in Sect. 5.

2 Related Work

For better comparability of our approach, we focus on the shared task presented at the 2nd Workshop on Noisy User-generated Text (WNUT’16) (Han et al. 2016). The organizers introduced a dataset to evaluate individual approaches for tweet- and user-level location prediction. For tweet-level prediction the goal is to predict the location of one specific message, while for user-level prediction the goal is to predict the user’s location based on a variable number of their messages. The organizers evaluate team submissions based on accuracy and error distance in kilometers. The latter metric accounts for predictions that are wrong but geographically close, for example, when the model predicts Vienna instead of Budapest.
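To make the distance metric concrete, the following is a minimal sketch of how such an error distance can be computed; the shared task only specifies distance in kilometers, so the choice of the haversine formula, the Earth radius of 6,371 km, the function name, and the (approximate) city coordinates are our assumptions.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

# Example: predicting Vienna (48.21, 16.37) for a tweet from
# Budapest (47.50, 19.04) is wrong, but only roughly 214 km off.
print(haversine_km(48.21, 16.37, 47.50, 19.04))
```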

We focus on the five teams who participated in the WNUT shared task. Official team results for tweet- and user-level predictions are shown in Table 1. Unfortunately, only three participants provided system descriptions, which we briefly summarize:

Table 1. Official WNUT’16 tweet- and user-level results ranked by tweet median error distance (in kilometers). Individual best results for all three criteria are highlighted in bold face.

Team FujiXerox (Miura et al. 2016) built a neural network using text, user-declared locations, timezone values, and user self-descriptions. For feature preprocessing the authors built several mapping services using external resources, such as GeoNames and time zone boundaries. Finally, they trained a neural network using the fastText n-gram model (Joulin et al. 2016) on post text, user location, user description, and user timezone.

Team csiro (Jayasinghe et al. 2016) used an ensemble learning method built on several information resources. First, the authors used post text, user location text, user time zone information, messenger source (e.g., Android or iPhone), and reverse country lookups for URL mentions to build a list of candidate cities contained in GeoNames. Furthermore, they scraped specific URL mentions and screened the website metadata for geographic coordinates. Second, a relationship network was built from tweets mentioning other users. Third, posts were used to find similar texts in the training data and to calculate class-label probabilities from the most similar tweets. Fourth, text was classified using the geotagging tool pigeo (Rahimi et al. 2016). The output of the individual stages was then combined in an ensemble learner.

Team cogeo (Chi et al. 2016) employed multinomial naïve Bayes and focused on textual features (i.e., location indicative words, GeoNames gazetteers, user mentions, and hashtags).

3 Methods

We used the WNUT’16 shared task data consisting of 12,827,165 tweet IDs, each assigned to a metropolitan city center from the GeoNames database using the strategy described in Han et al. (2012). As Twitter does not allow sharing individual tweets, posts must be retrieved via the Twitter API; we were able to retrieve 9,127,900 (71.2%) of them. The remaining tweets are no longer available, usually because users deleted these messages. In comparison, the winners of the WNUT’16 task (Miura et al. 2016) reported that they were able to retrieve 9,472,450 (73.8%) tweets. The overall training data comprises 3,362 individual class labels (i.e., city names); in our retrieved subset we observed only 3,315 different classes.
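Since the task distributes tweet IDs only, each participant has to "hydrate" them through the Twitter API. A minimal sketch using the tweepy library follows; the library choice, the bearer token, and the batching into chunks of 100 IDs are our assumptions and not part of the task setup (deleted tweets are simply missing from the response).

```python
import tweepy

# Hypothetical credentials; the task only distributes tweet IDs.
client = tweepy.Client(bearer_token="YOUR_BEARER_TOKEN")

def hydrate(tweet_ids):
    """Fetch tweets in batches of 100 IDs (the per-call lookup limit)."""
    tweets = []
    for i in range(0, len(tweet_ids), 100):
        response = client.get_tweets(ids=tweet_ids[i:i + 100],
                                     tweet_fields=["text", "author_id"])
        tweets.extend(response.data or [])  # deleted tweets are skipped
    return tweets
```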

For text preprocessing, we use a simple whitespace tokenizer with lowercasing, without any domain-specific processing such as Unicode normalization (Davis et al. 2001) or lexical text normalization (see for instance Han and Baldwin (2011)). The text of tweets and the metadata fields containing text (user description, user location, user name, timezone) are converted to word embeddings (Mikolov et al. 2013), which are then forwarded to a Long Short-Term Memory (LSTM) unit (Hochreiter and Schmidhuber 1997). In our experiments we randomly initialized the embedding vectors. We use batch normalization (Ioffe and Szegedy 2015) to normalize layer inputs and thereby reduce internal covariate shift. The risk of overfitting through co-adapting units is reduced by applying dropout (Srivastava et al. 2014) between individual neural network layers. An example architecture for textual data is shown in Fig. 1a. Metadata fields with a finite set of elements (UTC offset, URL domains, user language, tweet publication time, and application source) are converted to one-hot encodings, which are forwarded to an internal embedding layer, as proposed by Guo and Berkhahn (2016). Again, batch normalization and dropout are applied to avoid overfitting. This architecture is shown in Fig. 1b; a sketch of both branches follows.
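The following Keras-style sketch illustrates the two branch types; the vocabulary size, embedding dimensions, sequence length, and dropout rate are placeholders for the values in Table 2, not the settings we actually used.

```python
from tensorflow.keras import layers

# Hypothetical sizes; the actual settings are listed in Table 2.
VOCAB_SIZE, EMB_DIM, LSTM_UNITS, MAX_LEN = 50_000, 100, 100, 30

def text_branch(name):
    """Textual branch (Fig. 1a): embedding -> LSTM -> batch norm -> dropout."""
    inp = layers.Input(shape=(MAX_LEN,), name=f"{name}_tokens")
    x = layers.Embedding(VOCAB_SIZE, EMB_DIM)(inp)  # randomly initialized
    x = layers.LSTM(LSTM_UNITS)(x)
    x = layers.BatchNormalization()(x)              # reduce internal covariate shift
    x = layers.Dropout(0.5)(x)                      # reduce co-adapting units
    return inp, x

def metadata_branch(name, n_values, emb_dim=10):
    """Metadata branch (Fig. 1b): one-hot index -> internal embedding
    (Guo and Berkhahn 2016) -> batch norm -> dropout."""
    inp = layers.Input(shape=(1,), name=f"{name}_id")
    x = layers.Embedding(n_values, emb_dim)(inp)
    x = layers.Flatten()(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dropout(0.5)(x)
    return inp, x
```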

Individual models are completed with a dense classification layer using a softmax activation function. We train with stochastic gradient descent over shuffled mini-batches using Adam (Kingma and Ba 2014) and cross-entropy loss as the objective function for classification. The parameters of our model are shown in Table 2.
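Continuing the sketch above, a single-branch classifier would be completed and trained as follows; the batch size and epoch count in the commented call are hypothetical stand-ins for the Table 2 values.

```python
from tensorflow.keras import Model, layers, optimizers

N_CITIES = 3362  # city labels in the full training data

inputs, features = text_branch("tweet_text")
outputs = layers.Dense(N_CITIES, activation="softmax")(features)
model = Model(inputs, outputs)

model.compile(optimizer=optimizers.Adam(),             # Kingma and Ba (2014)
              loss="sparse_categorical_crossentropy",  # cross-entropy objective
              metrics=["accuracy"])
# model.fit(token_ids, city_ids, batch_size=256, epochs=5, shuffle=True)
```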

Fig. 1. Architectures for city prediction.

Table 2. Selected parameter settings

The WNUT’16 task requires the model to predict both class labels and longitude/latitude pairs. To account for this, we predict the mean city longitude/latitude coordinates for the predicted class label. For user-level prediction, we classify all messages individually and predict the city label with the highest probability over all messages.
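A sketch of this decoding step follows; the lookup table `city_coords` (mean training coordinates per city label) and both function names are hypothetical.

```python
import numpy as np

def tweet_prediction(probs, city_coords):
    """Tweet level: argmax city label plus that city's mean (lat, lon)."""
    label = int(np.argmax(probs))
    return label, city_coords[label]

def user_prediction(per_tweet_probs, city_coords):
    """User level: classify each message individually and pick the city
    label with the highest probability over all of the user's messages."""
    stacked = np.stack(per_tweet_probs)  # shape: (n_tweets, n_cities)
    _, label = np.unravel_index(np.argmax(stacked), stacked.shape)
    return int(label), city_coords[int(label)]
```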

3.1 Model Combination

The internal representations of all different resources (i.e., text, user-description, user-location, user-name, user-timezone, links, UTC offset, user lang, tweet-time and source) are concatenated to build a final tweet representation. We then evaluate two training strategies: In the first regime, we train the combined model from scratch; the parameters of all word embeddings, as well as all network layers, are initialized randomly, and the parameters of the full model, including the softmax layer combining the outputs of the individual LSTM and metadata models, are learned jointly. In the second strategy, we first train each model separately and then keep their parameters fixed while training only the final softmax layer.
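A sketch of the combined model and of the second ("full-fixed") strategy is shown below; the branch builders and N_CITIES come from the earlier sketches, and the cardinalities of the categorical fields are placeholders.

```python
from tensorflow.keras import Model, layers

inputs, features = [], []
for field in ["text", "user_description", "user_location",
              "user_name", "user_timezone"]:
    inp, feat = text_branch(field)
    inputs.append(inp); features.append(feat)
for field, n_values in [("links", 1000), ("utc_offset", 40),
                        ("user_lang", 60), ("tweet_time", 24),
                        ("source", 500)]:  # hypothetical cardinalities
    inp, feat = metadata_branch(field, n_values)
    inputs.append(inp); features.append(feat)

merged = layers.concatenate(features)  # final tweet representation
outputs = layers.Dense(N_CITIES, activation="softmax")(merged)
full_model = Model(inputs, outputs)

# Strategy 1 ("full-scratch"): train full_model jointly from random init.
# Strategy 2 ("full-fixed"): load the pretrained branch weights, freeze
# everything except the final softmax layer, then train.
for layer in full_model.layers[:-1]:
    layer.trainable = False
```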

4 Results

The individual performance of our different models is shown in Table 3. As a simple baseline, we predict the city label most frequently observed in the training data (Jakarta in Indonesia). According to our bottom-up analysis, the user-location metadata is the most informative source for tweet- and user-level location prediction. Using the text alone, we can correctly predict the location of 19.5% of all tweets, with a median distance of 2,190 km to the correct location. Aggregating pretrained models also improves performance on all three evaluation metrics in comparison to training a model from scratch.

Table 3. Tweet-level results ranked by median error distance (in kilometers). Individual best results for all three criteria are highlighted in bold face. Full-scratch refers to the merged model trained from scratch, whereas the full-fixed model keeps the pretrained submodel weights fixed and retrains only the final softmax layer. The baseline predicts the location most frequently observed in the training data (Jakarta).

For tweet-level prediction, our best merged model outperforms the best submission (FujiXerox.2) in terms of accuracy, median, and mean distance by 2.1 percentage points, 21.9 km, and 613.1 km, respectively. The ensemble learning method (csiro) outperforms our best models in terms of accuracy by 0.6 percentage points, but our model performs considerably better on median and mean distance, by 27.1 km and 1,358.8 km respectively. Additionally, the approach of csiro requires several dedicated services, such as GeoNames gazetteers, time zone to GeoNames mappings, an IP-to-country resolver, and customized scrapers for social media websites. The authors describe custom link handling for FourSquare, Swarm, Path, Facebook, and Instagram. In our training data we observed that these websites account for 1,941,079 (87.5%) of all 2,217,267 shared links. It is therefore tempting to speculate that customized scrapers for these websites could further boost our location prediction results.

As team cogeo uses only the text of a tweet, the results of cogeo.1 are directly comparable with our text model. Our text model outperforms this approach in terms of accuracy, median, and mean distance to the gold standard by 4.9 percentage points, 1,234 km, and 866 km, respectively.

For user-level prediction, our method performs on a par with the individual best results collected from the three top team submissions (FujiXerox.2, csiro.1, and FujiXerox.1). A notable difference is the mean error distance, where our model outperforms the best competing model by 125.3 km.

5 Conclusion

We presented a neural network architecture for predicting city labels and geo-coordinates of tweets. We focus on the classification task and derive longitude/latitude information from the predicted city label. We evaluated models for the individual Twitter (meta)data fields in a bottom-up fashion and identified highly location-indicative fields. The proposed combination of individual models requires no customized text preprocessing, specific website crawlers, database lookups, or IP-to-country resolution, while achieving state-of-the-art performance on a publicly available dataset. For better comparability, our source code and pretrained models are freely available to the community.

As future work, we plan to incorporate images as another type of metadata for location prediction using the approach presented by Simonyan and Zisserman (2014).