Bringing Back Structure to Free Text Email Conversations with Recurrent Neural Networks

  • Tim Repke
  • Ralf Krestel
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10772)


Email communication is an integral part of everybody’s life nowadays. For business emails in particular, extracting and analysing the underlying communication networks can reveal interesting patterns of processes and decision making within a company. Fraud detection is another application area where precise detection of communication networks is essential. In this paper we present an approach based on recurrent neural networks to untangle email threads that originate from forward and reply behaviour. We further classify parts of emails into two or five zones to capture not only header and body information but also greetings and signatures. We show that our deep learning approach outperforms state-of-the-art systems based on traditional machine learning and hand-crafted rules. Besides using the well-known Enron email corpus for our experiments, we additionally created a new annotated email benchmark corpus from Apache mailing lists.
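
To make the zoning idea concrete, the following is a minimal sketch of how per-line zone labels could be used to untangle a thread once they are available. It is not the authors' implementation: the function name `split_thread`, the label strings "header" and "body", and the sample email are all assumptions made for illustration, and the labels are written by hand here where a trained classifier would normally supply them.

```python
def split_thread(lines, labels):
    """Split one email text into its individual messages.

    lines  -- the lines of the raw email text
    labels -- one zone label per line (hypothetical names: "header", "body")

    A new message starts at the first line and at every line where a
    header block begins, i.e. a "header" line not preceded by a "header" line.
    """
    if len(lines) != len(labels):
        raise ValueError("need exactly one label per line")

    messages = []
    current = None
    previous_label = None
    for line, label in zip(lines, labels):
        starts_header_block = label == "header" and previous_label != "header"
        if current is None or starts_header_block:
            current = {"header": [], "body": []}
            messages.append(current)
        # setdefault keeps the sketch usable with finer-grained zone labels
        current.setdefault(label, []).append(line)
        previous_label = label
    return messages


# Hand-labelled toy example (assumed data, not taken from any corpus).
lines = [
    "Sounds good, see you then.",
    "-----Original Message-----",
    "From: Alice",
    "Sent: Monday",
    "Can we meet on Tuesday?",
]
labels = ["body", "header", "header", "header", "body"]

messages = split_thread(lines, labels)
for index, message in enumerate(messages):
    print(index, len(message["header"]), "header lines,", len(message["body"]), "body lines")
```

In this sketch the newest reply has no header of its own in the text, so it forms the first message with an empty header, and the quoted original becomes the second message.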



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Hasso Plattner Institute, Potsdam, Germany
