
Named Entity Recognition and Linking in Tweets Based on Linguistic Similarity

  • Arianna Pipitone
  • Giuseppe Tirone
  • Roberto Pirrone
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10640)

Abstract

This work proposes a novel approach to Named Entity rEcognition and Linking (NEEL) in tweets, applying the strategy that the same authors previously presented for Question Answering (QA). That earlier work describes a rule-based, ontology-based system that attempts to retrieve the correct answer to a query from the DBpedia ontology through a similarity measure between the query and the ontology labels. In this paper, a tweet is interpreted as a query for the QA system: both the text and the thread of a tweet form a sequence of statements that have been linked to the ontology. Because tweets make extensive use of informal language, the similarity measure and the underlying processes have been devised differently from the previous approach; the particular structure of a tweet, that is, the presence of mentions, hashtags, and partially structured statements, is also taken into account for linguistic insights. NEEL is thus achieved as the output of annotating a tweet with the names of the ontological entities retrieved by the system. The strategy is explained in detail along with the architecture and the implementation of the system; its performance is also reported and discussed in comparison with the systems presented at the #Microposts2016 workshop NEEL Challenge, co-located with the World Wide Web Conference 2016 (WWW ’16).

References

  1. Beaufort, R., Roekhaut, S., Cougnon, L.A., Fairon, C.: A hybrid rule/model-based finite-state framework for normalizing SMS messages. In: Hajic, J., Carberry, S., Clark, S. (eds.) ACL, pp. 770–779. The Association for Computer Linguistics (2010). http://dblp.uni-trier.de/db/conf/acl/acl2010.html#BeaufortRCF10
  2. Derczynski, L., Maynard, D., Rizzo, G., van Erp, M., Gorrell, G., Troncy, R., Petrak, J., Bontcheva, K.: Analysis of named entity recognition and linking for tweets. Inf. Process. Manag. 51(2), 32–49 (2015)
  3. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. The MIT Press, Cambridge; London (1998)
  4. Habib, M.B., van Keulen, M.: Need4tweet: a twitterbot for tweets named entity extraction and disambiguation. In: Proceedings of the System Demonstrations of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), Beijing, China. The Association for Computer Linguistics, Beijing, July 2015
  5. Habib, M., van Keulen, M.: A generic open world named entity disambiguation approach for tweets. In: 5th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2013. SciTePress, September 2013. http://doc.utwente.nl/86471/
  6. Han, B., Baldwin, T.: Lexical normalisation of short text messages: makn sens a #twitter. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, HLT 2011, vol. 1, pp. 368–378. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2002472.2002520
  7. Hoover, W.A., Gough, P.B.: The simple view of reading. Read. Writ. 2(2), 127–160 (1990). https://doi.org/10.1007/BF00401799
  8. Kaufmann, M., Kalita, J.: Syntactic normalization of Twitter messages. In: International Conference on Natural Language Processing, Kharagpur, India (2010)
  9. Kobus, C., Yvon, F., Damnati, G.: Normalizing SMS: are two metaphors better than one? In: Proceedings of the 22nd International Conference on Computational Linguistics, COLING 2008, vol. 1, pp. 441–448. Association for Computational Linguistics, Stroudsburg (2008). http://dl.acm.org/citation.cfm?id=1599081.1599137
  10. Li, C., Weng, J., He, Q., Yao, Y., Datta, A., Sun, A., Lee, B.S.: Twiner: named entity recognition in targeted Twitter stream. In: Proceedings of the 35th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR 2012, pp. 721–730. ACM, New York (2012). http://doi.acm.org/10.1145/2348283.2348380
  11. Liu, F., Weng, F., Wang, B., Liu, Y.: Insertion, deletion, or substitution?: normalizing text messages without pre-categorization nor supervision (2011)
  12. Nothman, J., Ringland, N., Radford, W., Murphy, T., Curran, J.R.: Learning multilingual named entity recognition from Wikipedia. Artif. Intell. 194, 151–175 (2013). https://doi.org/10.1016/j.artint.2012.03.006
  13. Pipitone, A., Campisi, M.C., Pirrone, R.: An A* based semantic tokenizer for increasing the performance of semantic applications. In: 2013 IEEE Seventh International Conference on Semantic Computing, Irvine, CA, USA, 16–18 September 2013, pp. 393–394. IEEE Computer Society (2013). https://doi.org/10.1109/ICSC.2013.75
  14. Pipitone, A., Tirone, G., Pirrone, R.: QuASIt: a cognitive inspired approach to question answering for the Italian language. In: Adorni, G., Cagnoni, S., Gori, M., Maratea, M. (eds.) AI*IA 2016. LNCS, vol. 10037, pp. 464–476. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-49130-1_34
  15. Plu, J., Rizzo, G., Troncy, R.: Enhancing entity linking by combining NER models. In: Sack, H., Dietze, S., Tordai, A., Lange, C. (eds.) SemWebEval 2016. CCIS, vol. 641, pp. 17–32. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46565-4_2
  16. Ritter, A., Clark, S., Mausam, Etzioni, O.: Named entity recognition in tweets: an experimental study. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP 2011, pp. 1524–1534. Association for Computational Linguistics, Stroudsburg (2011). http://dl.acm.org/citation.cfm?id=2145432.2145595
  17. Rizzo, G., van Erp, M., Plu, J., Troncy, R.: Making sense of microposts (#microposts2016) named entity recognition and linking (NEEL) challenge. In: Dadzie, A., Preotiuc-Pietro, D., Radovanovic, D., Basave, A.E.C., Weller, K. (eds.) Proceedings of the 6th Workshop on ‘Making Sense of Microposts’ co-located with the 25th International World Wide Web Conference (WWW 2016), Montréal, Canada, 11 April 2016. CEUR Workshop Proceedings, vol. 1691, pp. 50–59. CEUR-WS.org (2016). http://ceur-ws.org/Vol-1691/microposts2016_neel-challenge-report/
  18. Rupley, W.H., Blair, T.R., Nichols, W.D.: Effective reading instruction for struggling readers: the role of direct/explicit teaching. Read. Writ. Q. 25(2–3), 125–138 (2009). https://doi.org/10.1080/10573560802683523
  19. Wang, A., Chen, T., Kan, M.Y.: Re-tweeting from a linguistic perspective. In: Proceedings of the Second Workshop on Language in Social Media, LSM 2012, pp. 46–55. Association for Computational Linguistics, Stroudsburg (2012). http://dl.acm.org/citation.cfm?id=2390374.2390380

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Arianna Pipitone (1)
  • Giuseppe Tirone (1)
  • Roberto Pirrone (1)
  1. Dipartimento dell’Innovazione Industriale e Digitale (DIID), Università degli Studi di Palermo, Palermo, Italy
