Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter

Hürriyetoǧlu, Ali; Oostdijk, Nelleke; van den Bosch, Antal

doi:10.1007/978-3-319-67056-0_5

Part of the book series: Studies in Computational Intelligence (SCI, volume 740)

Abstract

Given a stream of Twitter messages about an event, we investigate the predictive power of features generated from words and temporal expressions in the messages to estimate the time to event (TTE). From labeled training data, average TTE values of the predictive features are learned, so that when these features occur in an event-related tweet, a TTE estimate can be provided for that tweet. We utilize temporal logic rules and a historical context integration function to improve the TTE estimation precision. In experiments on football matches and music concerts, we show that the estimates of the method are off by 4 and 10 h on average, respectively, in terms of mean absolute error. We find that the type and size of the event affect the estimation quality. An out-of-domain test on music concerts shows that models and hyperparameters trained and optimized on football matches can be used to estimate the remaining time to concerts. Moreover, mixing concert events into the training data improves the precision of the average football event estimate.
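
The core idea of the abstract, learning an average TTE per feature and reusing it on new tweets, can be illustrated with a minimal sketch. It is not the authors' implementation: the whitespace tokenizer, the use of the mean to combine feature values within a tweet, the function names, and the toy data are all assumptions made for illustration, and the temporal logic rules and historical context integration are left out.

```python
from collections import defaultdict
from statistics import mean


def train_feature_tte(labeled_tweets):
    """Map each feature to the mean TTE (in hours) of the training tweets containing it."""
    values = defaultdict(list)
    for text, tte_hours in labeled_tweets:
        # Assumption: features are lowercased whitespace tokens,
        # and each feature counts once per tweet.
        for feature in set(text.lower().split()):
            values[feature].append(tte_hours)
    return {feature: mean(ttes) for feature, ttes in values.items()}


def estimate_tte(text, feature_tte):
    """Average the learned TTE values of the known features in a tweet.

    Returns None when the tweet contains no feature seen in training.
    """
    known = [feature_tte[f] for f in set(text.lower().split()) if f in feature_tte]
    return mean(known) if known else None


def mean_absolute_error(pairs):
    """Mean absolute difference over (estimated, actual) TTE pairs."""
    return mean(abs(estimated - actual) for estimated, actual in pairs)


# Hypothetical toy data: (tweet text, hours remaining until the event).
training = [
    ("match starts tomorrow", 24.0),
    ("tomorrow the big game", 20.0),
    ("kickoff tonight", 4.0),
]
model = train_feature_tte(training)
estimate = estimate_tte("tonight kickoff at home", model)
print(estimate)
```

Here the unseen words "at" and "home" are simply ignored, so the estimate rests only on the features observed in training.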

Notes

1. http://www.twitter.com.
2. http://www.en.wikipedia.org/wiki/Project_X_Haren.
3. http://www.zapaday.com.
4. http://www.daybees.com/.
5. https://www.songkick.com/.
6. https://www.recordedfuture.com.
7. http://www.twiqs.nl.
8. Ajax Amsterdam (aja), Feyenoord Rotterdam (fey), PSV Eindhoven (psv), FC Twente (twe), AZ Alkmaar (az), and FC Utrecht (utr).
9. An analysis of the tweet distribution shows that the 8-day window captures about 98% of all tweets retrieved by means of the hashtags that we used.
10. http://www.eredivisie.nl.
11. http://www.lastfm.com.
12. Not all temporal expressions generated by the rules will prove to be correct. Since incorrect items are unlikely to occur and are therefore harmless, we refrained from manually checking the resulting set.
13. Dates, which denote the complete year, month and day of the month, are presently not covered by our patterns but will be added in the future.
14. Note that nog does occur on the list as part of various multi-word expressions. Examples are nog twee dagen ‘another two days’ and nog 10 min ‘10 more minutes’.
15. We used the HeidelTime tagger, version 1.7, with the interval tagger enabled and NEWS configured as the genre.
16. This subset is used to optimize the hyper-parameters as well.
17. We used the OpenTaal flexievormen, basis-gekeurd, and basis-ongekeurd word lists from the URL: http://www.opentaal.org/bestanden/doc_download/18-woordenlijst-v-210g-bronbestanden-.
18. The foreign word list currently contains 9 entries: different, indeed, am, ever, field, more, none, or, wants.
19. http://www.geonames.org/.
20. Features that occur multiple times in a tweet are filtered so that each feature occurs just once per tweet.
21. We used the gaussian_kde method from SciPy v0.14.0; URL: http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.gaussian_kde.html.
22. https://www.facebook.com.

Acknowledgements

This research was supported by the Dutch national programme COMMIT as part of the Infiniti project.

Author information

Authors and Affiliations

Centre for Language Studies, Radboud University, P.O. Box 9103, 6500 HD, Nijmegen, The Netherlands
Ali Hürriyetoǧlu, Nelleke Oostdijk & Antal van den Bosch

Corresponding author

Correspondence to Ali Hürriyetoǧlu.

Editor information

Editors and Affiliations

The British University in Dubai, Dubai, United Arab Emirates
Khaled Shaalan
Faculty of Computers and Information Technology, Cairo University, Giza, Egypt
Aboul Ella Hassanien
Faculty of Computers and Information, Ain Shams University, Cairo, Egypt
Fahmy Tolba

About this chapter

Cite this chapter

Hürriyetoǧlu, A., Oostdijk, N., van den Bosch, A. (2018). Estimating Time to Event of Future Events Based on Linguistic Cues on Twitter. In: Shaalan, K., Hassanien, A., Tolba, F. (eds) Intelligent Natural Language Processing: Trends and Applications. Studies in Computational Intelligence, vol 740. Springer, Cham. https://doi.org/10.1007/978-3-319-67056-0_5

DOI: https://doi.org/10.1007/978-3-319-67056-0_5
Published: 18 November 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67055-3
Online ISBN: 978-3-319-67056-0
eBook Packages: Engineering, Engineering (R0)