The Complementary Nature of Different NLP Toolkits for Named Entity Recognition in Social Media

  • Filipe Batista
  • Álvaro Figueira
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10423)


In this paper we study the combined use of four different NLP toolkits (Stanford CoreNLP, GATE, OpenNLP and Twitter NLP tools) in the context of social media posts. Previous studies have compared the performance of these tools on both news and social media corpora. We go further by examining how differently these toolkits predict named entities, in terms of their precision and recall for three entity types, and how they can complement each other to achieve a combined performance superior to that of each individual toolkit. Experiments on two publicly available datasets, from the WNUT-2015 and #MSM2013 workshops, show that an ensemble of toolkits can improve the recognition of specific entity types: by up to 10.62% for the entity type Person, 1.97% for Location and 1.31% for Organization, depending on the dataset and the voting criteria. Our results also show improvements of 3.76% and 1.69%, in each dataset respectively, in the average performance over the three entity types.
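To make the ensemble idea concrete, the following is a minimal sketch of token-level voting across toolkits, together with a per-type precision and recall computation. It is an illustration only: the abstract does not specify the voting criteria actually used, so the `min_votes` threshold, the function names `vote` and `precision_recall`, the toolkit keys and the label strings (`PERSON`, `LOCATION`, `ORGANIZATION`, `O`) are all assumptions, and the toolkit outputs are hand-written rather than produced by the real tools.

```python
from collections import Counter
from typing import Dict, List, Tuple


def vote(predictions: Dict[str, List[str]], min_votes: int = 2) -> List[str]:
    """Combine per-token labels from several toolkits by voting.

    `predictions` maps a (hypothetical) toolkit name to one label per token;
    "O" means "not an entity". A token keeps an entity label only if at
    least `min_votes` toolkits agree on it.
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all toolkits must label the same tokens")

    combined = []
    for token_labels in zip(*predictions.values()):
        # Count only entity labels; "O" never competes for the vote.
        counts = Counter(label for label in token_labels if label != "O")
        if not counts:
            combined.append("O")
            continue
        # On a tie, the label seen first in toolkit order is chosen.
        label, votes = counts.most_common(1)[0]
        combined.append(label if votes >= min_votes else "O")
    return combined


def precision_recall(gold: List[str], predicted: List[str],
                     entity_type: str) -> Tuple[float, float]:
    """Token-level precision and recall for a single entity type."""
    true_pos = sum(1 for g, p in zip(gold, predicted)
                   if g == entity_type and p == entity_type)
    n_predicted = sum(1 for p in predicted if p == entity_type)
    n_gold = sum(1 for g in gold if g == entity_type)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return precision, recall


# Hand-written labels for the tokens: Obama visited Porto with Google
toolkit_labels = {
    "stanford":    ["PERSON", "O", "LOCATION", "O", "ORGANIZATION"],
    "gate":        ["PERSON", "O", "LOCATION", "O", "O"],
    "opennlp":     ["O",      "O", "LOCATION", "O", "O"],
    "twitter_nlp": ["PERSON", "O", "O",        "O", "ORGANIZATION"],
}
gold_labels = ["PERSON", "O", "LOCATION", "O", "ORGANIZATION"]

ensemble = vote(toolkit_labels, min_votes=2)
for entity_type in ("PERSON", "LOCATION", "ORGANIZATION"):
    print(entity_type, precision_recall(gold_labels, ensemble, entity_type))
```

Raising `min_votes` trades recall for precision: with a threshold of 3, the `ORGANIZATION` label in this toy example would be dropped because only two toolkits propose it.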


Named Entity Recognition · Social media · Ensemble of NLP toolkits · Text-mining · Machine learning



This work is supported by the ERDF European Regional Development Fund through the COMPETE Programme (operational programme for competitiveness) and by National Funds through the FCT (Portuguese Foundation for Science and Technology) within project “Reminds/UTAP-ICDT/EEI-CTP/0022/2014”.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. CRACS/INESC TEC and University of Porto, Porto, Portugal
