Abstract
Given a stream of Twitter messages about an event, we investigate the predictive power of features generated from words and temporal expressions in the messages to estimate the time to event (TTE). From labeled training data, the average TTE value of each predictive feature is learned, so that when such a feature occurs in an event-related tweet, a TTE estimate can be provided for that tweet. We use temporal logic rules and a historical context integration function to improve the precision of the TTE estimates. In experiments on football matches and music concerts, we show that the estimates of the method are off by 4 and 10 h on average, respectively, in terms of mean absolute error. We find that the type and size of the event affect the estimation quality. An out-of-domain test on music concerts shows that models and hyperparameters trained and optimized on football matches can be used to estimate the remaining time to concerts. Moreover, mixing concert events into the training data improves the precision of the average football event estimate.
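The abstract describes learning an average TTE per feature from labeled tweets and then estimating a tweet's TTE from the features it contains. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the toy Dutch features ("morgen" 'tomorrow', "vanavond" 'tonight', "ajax"), and the choice to combine a tweet's feature values with the median are all assumptions for illustration. Each feature is counted once per tweet, in line with note 20. The temporal logic rules and the historical context integration function are not modeled.

```python
from collections import defaultdict
from statistics import mean, median


def train_feature_tte(labeled_tweets):
    """Learn the average TTE (in hours) of every feature.

    labeled_tweets: iterable of (features, tte_hours) pairs (hypothetical format).
    """
    tte_per_feature = defaultdict(list)
    for features, tte_hours in labeled_tweets:
        # Count each feature at most once per tweet.
        for feature in set(features):
            tte_per_feature[feature].append(tte_hours)
    return {feature: mean(values) for feature, values in tte_per_feature.items()}


def estimate_tte(features, model):
    """Estimate a tweet's TTE from the learned values of its known features.

    ASSUMPTION: feature values are combined with the median. Returns None
    when the tweet contains no feature seen in training.
    """
    known = [model[f] for f in set(features) if f in model]
    return median(known) if known else None


def mean_absolute_error(estimates, gold):
    """MAE in hours over the tweets that received an estimate."""
    pairs = [(e, g) for e, g in zip(estimates, gold) if e is not None]
    return sum(abs(e - g) for e, g in pairs) / len(pairs)


# Toy usage with invented training data (features, TTE in hours).
train = [(["morgen", "ajax"], 24.0), (["morgen"], 20.0), (["vanavond", "ajax"], 4.0)]
model = train_feature_tte(train)
print(model["morgen"])                                # 22.0
print(estimate_tte(["vanavond", "ajax"], model))      # 9.0
print(mean_absolute_error([22.0, 9.0], [23.0, 5.0]))  # 2.5
```

The median is one robust way to combine several feature estimates. Under this sketch, a mean or a weighted mean would be equally plausible choices.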
Notes
- 8.
Ajax Amsterdam (aja), Feyenoord Rotterdam (fey), PSV Eindhoven (psv), FC Twente (twe), AZ Alkmaar (az), and FC Utrecht (utr).
- 9.
An analysis of the tweet distribution shows that the 8-day window captures about 98% of all tweets collected by means of the hashtags that we used.
- 12.
Not all temporal expressions generated by the rules will prove to be correct. Since incorrect expressions are unlikely to occur in the data and are therefore harmless, we refrained from manually checking the resulting set.
- 13.
Full dates, which specify the year, month, and day of the month, are not yet covered by our patterns but will be added in the future.
- 14.
Note that nog does occur on the list as part of various multi-word expressions. Examples are nog twee dagen ‘another two days’ and nog 10 min ‘10 more minutes’.
- 15.
We used HeidelTime tagger version 1.7 with the interval tagger enabled and the document type set to NEWS.
- 16.
This subset is also used to optimize the hyperparameters.
- 17.
We used the OpenTaal flexievormen, basis-gekeurd, and basis-ongekeurd word lists, available at http://www.opentaal.org/bestanden/doc_download/18-woordenlijst-v-210g-bronbestanden-.
- 18.
The foreign word list currently contains 9 entries: different, indeed, am, ever, field, more, none, or, wants.
- 20.
If a feature occurs multiple times in a tweet, only one occurrence of it is kept.
- 21.
We used the gaussian_kde class from SciPy v0.14.0, documented at http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.
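Note 14 mentions multi-word temporal expressions such as nog twee dagen ('another two days') and nog 10 min ('10 more minutes'). The Python sketch below is a hypothetical illustration, not the authors' rule set, of how a rule could map such an expression to a TTE in hours. NUMBER_WORDS, UNIT_HOURS, NOG_PATTERN and nog_tte_hours are invented names, and the word and unit coverage is deliberately small.

```python
import re

# Hypothetical lookup tables; the authors' actual rule set is not reproduced here.
NUMBER_WORDS = {"een": 1, "één": 1, "twee": 2, "drie": 3, "vier": 4, "vijf": 5,
                "zes": 6, "zeven": 7, "acht": 8, "negen": 9, "tien": 10}
UNIT_HOURS = {"minuten": 1 / 60, "minuut": 1 / 60, "min": 1 / 60, "uur": 1.0,
              "dagen": 24.0, "dag": 24.0, "weken": 168.0, "week": 168.0}

# Matches "nog <number> <unit>", e.g. "nog twee dagen" or "nog 10 min".
# Units are tried longest first; the trailing \b rejects partial words.
NOG_PATTERN = re.compile(
    r"\bnog\s+(\d+|" + "|".join(NUMBER_WORDS) + r")\s+("
    + "|".join(sorted(UNIT_HOURS, key=len, reverse=True)) + r")\b"
)


def nog_tte_hours(text):
    """Return the TTE in hours implied by the first 'nog <N> <unit>' match, else None."""
    match = NOG_PATTERN.search(text.lower())
    if match is None:
        return None
    number, unit = match.groups()
    count = int(number) if number.isdigit() else NUMBER_WORDS[number]
    return count * UNIT_HOURS[unit]


print(nog_tte_hours("Nog twee dagen tot de finale!"))  # 48.0
print(nog_tte_hours("nog steeds geen kaartjes"))       # None
```

Restricting both slots to known numbers and units lets the search skip non-temporal uses such as nog steeds ('still') and find a later match in the same tweet.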
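Note 21 states only that SciPy's gaussian_kde was used. As an assumption for illustration, the sketch below applies it to the training TTE values of a single feature and returns the value at which the estimated density peaks. The helper kde_peak_tte and the toy values are hypothetical, and the bandwidth is left at SciPy's default (Scott's rule).

```python
import numpy as np
from scipy.stats import gaussian_kde


def kde_peak_tte(tte_values, grid_size=1000):
    """Return the TTE (hours) at which the kernel density estimate peaks.

    Hypothetical helper; bandwidth follows SciPy's default (Scott's rule).
    """
    values = np.asarray(tte_values, dtype=float)
    kde = gaussian_kde(values)
    # Evaluate the density on an even grid spanning the observed values.
    grid = np.linspace(values.min(), values.max(), grid_size)
    density = kde(grid)
    return float(grid[np.argmax(density)])


# Toy TTE values (hours) for one feature, with a single outlier at 50 h.
toy_values = [10, 11, 12, 12, 13, 14, 50]
print(round(kde_peak_tte(toy_values), 1))
```

With these toy values, the peak should fall near 12 h. The plain mean (about 17.4 h), by contrast, is pulled toward the 50 h outlier.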
Acknowledgements
This research was supported by the Dutch national programme COMMIT as part of the Infiniti project.
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Hürriyetoğlu, A., Oostdijk, N., van den Bosch, A. (2018). Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_5
DOI: https://doi.org/10.1007/978-3-319-67056-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)