Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text

  • Conference paper
Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8816)

Abstract

Social media content represents a large portion of all textual content appearing on the Internet. These streams of user-generated content (UGC) present both an opportunity and a challenge for media analysts: huge amounts of new data to analyze and to use for inferring and reasoning about new information. A main challenge of natural language is its ambiguity and vagueness. Automatic approaches typically rely on the grammatical structure of sentences to resolve ambiguity. However, in the informal language widely used in social media, that structure is often missing, so the text becomes more ambiguous and more challenging to understand automatically.

Information Extraction (IE) is the research field that enables the use of unstructured text in a structured way. Named Entity Extraction (NEE) is a subtask of IE that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations or locations, regardless of their type. Named Entity Disambiguation (NED) is the task of determining which specific person, place, event, etc. a mention refers to.

The goal of this paper is to provide an overview of some approaches that mimic the human way of recognizing and disambiguating named entities, especially for domains that lack formal sentence structure. The proposed methods open the door to more sophisticated applications based on users’ contributions on social media. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been shown to hold across languages and domains and to be independent of the selected extraction and disambiguation techniques. It is also shown to be robust against the informality of the language used. We have discovered a reinforcement effect and exploited it in a technique that improves extraction quality by feeding back disambiguation results. We also present a method of handling the uncertainty involved in extraction to improve the disambiguation results.
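To make the feedback idea in the abstract more concrete, the following Python sketch shows one possible shape of such a pipeline: a deliberately high-recall extraction step keeps an uncertain confidence for each candidate mention, a disambiguation step looks the candidate up in a knowledge base, and the disambiguation score is fed back to decide which candidates are kept. This is only an illustration under stated assumptions, not the authors' implementation: the toy knowledge base `TOY_KB`, the confidence values, the averaging rule and the threshold are all hypothetical. In particular, this toy version discards mentions that are absent from the knowledge base, which a real open-world approach would not necessarily do.

```python
from dataclasses import dataclass

# Hypothetical toy knowledge base: lower-cased surface form -> (entity id, prior).
TOY_KB = {
    "paris": [("Paris_(France)", 0.9), ("Paris_Hilton", 0.1)],
    "obama": [("Barack_Obama", 1.0)],
}


@dataclass
class Candidate:
    mention: str
    extraction_conf: float  # uncertainty kept from the extraction step


def extract_candidates(text):
    """High-recall extraction: every token is a candidate mention.

    Capitalised tokens get a higher (assumed) confidence, but lower-cased
    tokens are not discarded, since informal text often drops capitals.
    """
    candidates = []
    for token in text.split():
        word = token.strip(".,!?#@")
        if word:
            conf = 0.6 if word[0].isupper() else 0.3
            candidates.append(Candidate(word, conf))
    return candidates


def disambiguate(candidate):
    """Return the (entity, score) pair with the highest prior, or (None, 0.0)."""
    entries = TOY_KB.get(candidate.mention.lower(), [])
    if not entries:
        return None, 0.0
    return max(entries, key=lambda entry: entry[1])


def extract_with_feedback(text, threshold=0.5):
    """Keep a candidate only if extraction and disambiguation jointly support it."""
    results = []
    for candidate in extract_candidates(text):
        entity, score = disambiguate(candidate)
        # Assumed combination rule: plain average of the two scores.
        combined = 0.5 * candidate.extraction_conf + 0.5 * score
        if combined >= threshold:
            results.append((candidate.mention, entity, round(combined, 2)))
    return results


if __name__ == "__main__":
    print(extract_with_feedback("Lol paris is lovely says Obama"))
```

In this sketch the lower-cased "paris" survives because the knowledge base supports it, while the capitalised "Lol" is dropped because disambiguation finds nothing for it. A capitalisation-only extractor would have made the opposite decision on both tokens.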

Notes

  1. https://blog.twitter.com/2012/twitter-turns-six

  2. http://en.wikipedia.org/wiki/Project_X_Haren

  3. http://www.tec4se.nl/

  4. Some NER datasets consider nationalities as NEs [7].

  5. http://nlp.stanford.edu:8080/ner/process

  6. www.eurocottage.com

  7. We made use of the lingpipe toolkit for development: http://alias-i.com/lingpipe.

  8. http://wis.ewi.tudelft.nl/umap2011/ + TREC 2011 Microblog track collection.

References

  1. Social networking reaches nearly one in four around the world

  2. Chinchor, N.A.: Proceedings of the Seventh Message Understanding Conference (MUC-7) named entity task definition, Fairfax, VA, 21 p., April 1998. http://www.itl.nist.gov/iaui/894.02/related_projects/muc (version 3.5.)

  3. Abbasi, M.-A., Chai, S.-K., Liu, H., Sagoo, K.: Real-world behavior analysis through a social media lens. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds.) SBP 2012. LNCS, vol. 7227, pp. 18–26. Springer, Heidelberg (2012)

  4. Yu, S., Kak, S.: A survey of prediction using social media. CoRR, abs/1203.1647 (2012)

  5. Lin, T., Mausam, Etzioni, O.: Entity linking at web scale. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX), pp. 84–88 (2012)

  6. Hoffart, J., Suchanek, F., Berberich, K., Kelham, E., de Melo, G., Weikum, G.: Yago2: Exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of WWW 2011, pp. 229–232 (2011)

  7. Basave, A.E.C., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.-S.: Making sense of microposts (#msm2013) concept extraction challenge. In: Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, pp. 1–15 (2013)

  8. Ekbal, A., Bandyopadhyay, S.: A hidden Markov model based named entity recognition system: Bengali and Hindi as case studies. In: Ghosh, A., De, R.K., Pal, S.K. (eds.) PReMI 2007. LNCS, vol. 4815, pp. 545–552. Springer, Heidelberg (2007)

  9. Wallach, H.: Conditional random fields: An introduction. Technical Report MS-CIS-04-21, Department of Computer and Information Science, University of Pennsylvania (2004)

  10. Habib, M.B., van Keulen, M.: Named entity extraction and disambiguation: The reinforcement effect. In: Proceedings of MUD 2011, Seattle, USA, pp. 9–16 (2011)

  11. Cano, A.E., Rizzo, G., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.-S.: #microposts2014 neel challenge: Measuring the performance of entity linking systems in social streams. In: Proceedings of the #Microposts2014 NEEL Challenge (2014)

  12. Li, C., Weng, J., He, Q., Yao, Y., Datta, A., Sun, A., Lee, B.-S.: Twiner: named entity recognition in targeted twitter stream. In: SIGIR, pp. 721–730 (2012)

  13. Habib, M.B., van Keulen, M.: A generic open world named entity disambiguation approach for tweets. In: Proceedings of the 5th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2013, Vilamoura, Portugal, pp. 267–276, September 2013. SciTePress, Portugal (2013)

  14. Habib, M., Van Keulen, M., Zhu, Z.: Concept extraction challenge: University of Twente at #msm2013. In: Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, pp. 17–20 (2013)

  15. Yosef, M.A., Hoffart, J., Bordino, I., Spaniol, M., Weikum, G.: Aida: An online tool for accurate disambiguation of named entities in text and tables. In: PVLDB, pp. 1450–1453 (2011)

  16. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by gibbs sampling. In: ACL, pp. 363–370 (2005)

  17. van Keulen, M., de Keijzer, A.: Qualitative effects of knowledge rules and user feedback in probabilistic data integration. VLDB J. 18(5), 1191–1217 (2009)

  18. Huang, J., Antova, L., Koch, C., Olteanu, D.: MayBMS: A probabilistic database management system. In: Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, pp. 1071–1074 (2009)

  19. Koch, C., Olteanu, D.: Conditioning probabilistic databases. Proc. VLDB Endow. 1(1), 313–325 (2008)

  20. Sen, P., Deshpande, A., Getoor, L.: Exploiting shared correlations in probabilistic databases. Proc. VLDB Endow. 1(1), 809–820 (2008)

Author information

Corresponding author

Correspondence to Mena B. Habib.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

van Keulen, M., Habib, M.B. (2014). Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text. In: Bobillo, F., et al. Uncertainty Reasoning for the Semantic Web III. URSW 2011, URSW 2012, URSW 2013. Lecture Notes in Computer Science, vol 8816. Springer, Cham. https://doi.org/10.1007/978-3-319-13413-0_16

  • DOI: https://doi.org/10.1007/978-3-319-13413-0_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-13412-3

  • Online ISBN: 978-3-319-13413-0

  • eBook Packages: Computer Science, Computer Science (R0)
