World Wide Web, Volume 21, Issue 2, pp 311–343

SNAF: Observation filtering and location inference for event monitoring on twitter

  • Yihong Zhang
  • Claudia Szabo
  • Quan Z. Sheng
  • Xiu Susie Fang


Twitter has recently emerged as a popular microblogging service with 284 million monthly active users around the world. A portion of the 500 million tweets posted on Twitter every day are personal observations of the immediate environment. If provided with time and location information, these observations can be seen as sensory readings for monitoring and localizing objects and events of interest. Location information on Twitter, however, is scarce, with fewer than 1% of tweets having associated GPS coordinates. Current research on Twitter location inference mostly focuses on city-level or coarser inference, and cannot provide accurate results for fine-grained locations. We propose an event monitoring system for Twitter that emphasizes local events, called SNAF (Sense and Focus). The system filters personal observations posted on Twitter and infers the location of each report. Our extensive experiments with real Twitter data show that the proposed observation filtering approach achieves about a 22% improvement over existing filtering techniques, and that our location inference approach can increase location accuracy by up to 36% within the 3 km error range. By aggregating the observation reports with location information, our prototype event monitoring system can detect real-world events, in many cases earlier than news reports.


  • Twitter
  • Social sensor
  • Message filtering
  • Location inference
  • Event detection



Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  • Yihong Zhang (1)
  • Claudia Szabo (2)
  • Quan Z. Sheng (3)
  • Xiu Susie Fang (2)

  1. School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
  2. School of Computer Science, The University of Adelaide, Adelaide, Australia
  3. Department of Computing, Macquarie University, Sydney, Australia
