Giveme5W: Main Event Retrieval from News Articles by Extraction of the Five Journalistic W Questions

  • Felix Hamborg
  • Soeren Lachnit
  • Moritz Schubotz
  • Thomas Hepp
  • Bela Gipp
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10766)


Extracting event descriptors from news articles is a common requirement for tasks such as clustering related articles, summarization, and news aggregation. Because generally usable, publicly available methods optimized for news are lacking, many researchers must redundantly implement such methods for their own projects. Answers to the five journalistic W questions (5Ws) describe the main event of a news article, i.e., who did what, when, where, and why. The main contribution of this paper is Giveme5W, the first open-source, syntax-based 5W extraction system for news articles. The system retrieves an article’s main event by extracting phrases that answer the journalistic 5Ws. In an evaluation with three assessors and 60 articles, we find that the extraction precision of 5W phrases is \( p = 0.7 \).


News event detection, 5W extraction, 5W question answering



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. University of Konstanz, Konstanz, Germany
