Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter

Intelligent Natural Language Processing: Trends and Applications

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Given a stream of Twitter messages about an event, we investigate the predictive power of features generated from words and temporal expressions in the messages to estimate the time to event (TTE). Average TTE values of the predictive features are learned from labeled training data, so that when a feature occurs in an event-related tweet, a TTE estimate can be provided for that tweet. We use temporal logic rules and a historical context integration function to improve the precision of the TTE estimates. In experiments on football matches and music concerts, we show that the estimates of the method are off by 4 and 10 hours on average, respectively, in terms of mean absolute error. We find that the type and size of the event affect the estimation quality. An out-of-domain test on music concerts shows that models and hyperparameters trained and optimized on football matches can be used to estimate the remaining time to concerts. Moreover, mixing concert events into the training data improves the precision of the average football event estimate.
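The core idea of the abstract, learning an average TTE per feature and using it to estimate the TTE of a new tweet, can be illustrated with a minimal sketch. It is not the chapter's implementation: the whitespace tokenization, the use of the median to combine feature values within a tweet, the function names, and the toy tweets are all assumptions made for illustration, and the temporal logic rules and the historical context integration function are left out.

```python
from collections import defaultdict
from statistics import mean, median


def learn_feature_tte(training_tweets):
    """Map each feature to the mean TTE (in hours) of the training tweets containing it."""
    tte_by_feature = defaultdict(list)
    for text, tte_hours in training_tweets:
        # Assumption: features are lowercased whitespace tokens, counted once per tweet.
        for feature in set(text.lower().split()):
            tte_by_feature[feature].append(tte_hours)
    return {feature: mean(values) for feature, values in tte_by_feature.items()}


def estimate_tte(text, feature_tte):
    """Estimate the TTE of one tweet, or return None if it has no known feature."""
    known = [feature_tte[f] for f in set(text.lower().split()) if f in feature_tte]
    # Assumption: the median combines the learned values of the tweet's features.
    return median(known) if known else None


def mean_absolute_error(estimates, actuals):
    """Mean absolute error in hours over the tweets that received an estimate."""
    errors = [abs(e - a) for e, a in zip(estimates, actuals) if e is not None]
    return mean(errors) if errors else None


# Toy, hypothetical training data: (tweet text, hours remaining until the event).
training = [
    ("tomorrow the match", 24.0),
    ("match tonight", 6.0),
    ("tomorrow finally", 30.0),
]
model = learn_feature_tte(training)
estimate = estimate_tte("tonight match", model)
error = mean_absolute_error([estimate], [8.0])
```

In this toy run, `tonight` and `match` have learned values of 6 and 15 hours, so the tweet is estimated at 10.5 hours before the event, and the error against an actual TTE of 8 hours is 2.5 hours.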

Notes

  1. http://www.twitter.com.

  2. http://www.en.wikipedia.org/wiki/Project_X_Haren.

  3. http://www.zapaday.com.

  4. http://www.daybees.com/.

  5. https://www.songkick.com/.

  6. https://www.recordedfuture.com.

  7. http://www.twiqs.nl.

  8. Ajax Amsterdam (aja), Feyenoord Rotterdam (fey), PSV Eindhoven (psv), FC Twente (twe), AZ Alkmaar (az), and FC Utrecht (utr).

  9. An analysis of the tweet distribution shows that the 8-day window captures about 98% of all tweets collected by means of the hashtags that we used.

  10. http://www.eredivisie.nl.

  11. http://www.lastfm.com.

  12. Not all temporal expressions generated by the rules will prove to be correct. Since incorrect items are unlikely to occur in tweets and are therefore harmless, we refrained from manually checking the resulting set.

  13. Dates that denote the complete year, month, and day of the month are presently not covered by our patterns but will be added in the future.

  14. Note that nog does occur on the list as part of various multi-word expressions. Examples are nog twee dagen ‘another two days’ and nog 10 min ‘10 more minutes’.

  15. We used the HeidelTime tagger, version 1.7, with the interval tagger enabled and the genre set to NEWS.

  16. This subset is used to optimize the hyper-parameters as well.

  17. We used the OpenTaal flexievormen, basis-gekeurd, and basis-ongekeurd word lists from the URL: http://www.opentaal.org/bestanden/doc_download/18-woordenlijst-v-210g-bronbestanden-.

  18. The foreign word list currently contains 9 entries: different, indeed, am, ever, field, indeed, more, none, or, wants.

  19. http://www.geonames.org/.

  20. Features that occur multiple times in a tweet are reduced to a single occurrence per tweet.

  21. We used the gaussian_kde method from SciPy v0.14.0 URL: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.

  22. https://www.facebook.com.

Acknowledgements

This research was supported by the Dutch national programme COMMIT as part of the Infiniti project.

Author information

Corresponding author

Correspondence to Ali Hürriyetoğlu.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Hürriyetoğlu, A., Oostdijk, N., van den Bosch, A. (2018). Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_5

  • DOI: https://doi.org/10.1007/978-3-319-67056-0_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67055-3

  • Online ISBN: 978-3-319-67056-0

  • eBook Packages: Engineering, Engineering (R0)