Self-attentive Model for Headline Generation

  • Daniil Gavrilov
  • Pavel Kalaidin
  • Valentin Malykh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11438)


Headline generation is a special type of text summarization task. Although the amount of available training data for this task is almost unlimited, it remains challenging, because generating headlines for news articles requires the model to reason well about natural language. To address this, we applied the recent Universal Transformer architecture paired with byte-pair encoding and achieved new state-of-the-art results on the New York Times Annotated corpus, with a ROUGE-L F1-score of 24.84 and a ROUGE-2 F1-score of 13.48. We also present the new RIA corpus, on which we reach a ROUGE-L F1-score of 36.81 and a ROUGE-2 F1-score of 22.15.
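To make the reported metrics concrete, the following is a minimal sketch of how ROUGE-L and ROUGE-2 F1-scores can be computed for one generated headline against one reference. It assumes lowercased whitespace tokenization and no stemming, which may differ from the evaluation setup used in the paper; the function names (`lcs_length`, `f1`, `rouge_l_f1`, `rouge_2_f1`) are illustrative and not taken from any library. The functions return fractions in [0, 1], whereas the scores above appear to be reported on a 0 to 100 scale.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            # Extend the diagonal on a match, otherwise carry the best neighbor.
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def f1(overlap, candidate_total, reference_total):
    """Harmonic mean of precision and recall; 0.0 when there is no overlap."""
    if overlap == 0 or candidate_total == 0 or reference_total == 0:
        return 0.0
    precision = overlap / candidate_total
    recall = overlap / reference_total
    return 2 * precision * recall / (precision + recall)


def rouge_l_f1(candidate, reference):
    """ROUGE-L F1 based on the longest common subsequence of tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    return f1(lcs_length(c, r), len(c), len(r))


def rouge_2_f1(candidate, reference):
    """ROUGE-2 F1 based on clipped bigram overlap."""
    def bigrams(tokens):
        return Counter(zip(tokens, tokens[1:]))

    c = bigrams(candidate.lower().split())
    r = bigrams(reference.lower().split())
    overlap = sum((c & r).values())  # Counter intersection clips repeated bigrams
    return f1(overlap, sum(c.values()), sum(r.values()))
```

For example, `rouge_l_f1("police arrest suspect in city", "police arrest a suspect in the city")` rewards the candidate for preserving word order even though two reference words are missing, while `rouge_2_f1` on the same pair is lower because only two bigrams survive intact.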


Keywords: Universal Transformer, headline generation, BPE, summarization



The authors thank Alexey Samarin for useful discussions, David Prince for proofreading, Madina Kabirova for proofreading and organizing the human evaluation, Anastasia Semenyuk and Maria Zaharova for help obtaining the New York Times Annotated corpus, and Alexey Filippovskii for providing the Rossiya Segodnya corpus.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Daniil Gavrilov (1)
  • Pavel Kalaidin (1)
  • Valentin Malykh (1)

  1. VK, Saint Petersburg, Russia
