Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text

van Keulen, Maurice; Habib, Mena B.

doi:10.1007/978-3-319-13413-0_16

Conference paper
First Online: 30 November 2014

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8816)

Abstract

Social media content represents a large portion of all textual content appearing on the Internet. These streams of user-generated content (UGC) present both an opportunity and a challenge for media analysts: to analyze huge amounts of new data and use them to infer and reason with new information. A main challenge of natural language is its ambiguity and vagueness. To automatically resolve ambiguity, the grammatical structure of sentences is used. However, when we move to the informal language widely used in social media, the language becomes more ambiguous and thus more challenging for automatic understanding.

Information Extraction (IE) is the research field that enables the use of unstructured text in a structured way. Named Entity Extraction (NEE) is a subtask of IE that aims to locate phrases (mentions) in the text that represent names of entities such as persons, organizations, or locations, regardless of their type. Named Entity Disambiguation (NED) is the task of determining which specific person, place, event, etc. a mention refers to.

The goal of this paper is to provide an overview of some approaches that mimic the human way of recognizing and disambiguating named entities, especially for domains that lack formal sentence structure. The proposed methods open the door to more sophisticated applications based on users’ contributions on social media. We propose a robust combined framework for NEE and NED in semi-formal and informal text. The achieved robustness has been shown to hold across languages and domains and to be independent of the selected extraction and disambiguation techniques. It is also shown to be robust against the informality of the language used. We have discovered a reinforcement effect and exploited it in a technique that improves extraction quality by feeding back disambiguation results. We present a method of handling the uncertainty involved in extraction to improve the disambiguation results.
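
As a rough illustration of the reinforcement effect described in the abstract, the following minimal sketch extracts candidate mentions with high recall, disambiguates each one, and then feeds the disambiguation score back to decide which candidates to keep. It is not the authors' implementation: the toy knowledge base `KB`, the stopword list, the flat extraction confidence of 0.5, the overlap-based score, and the threshold are all assumptions chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stopword list and toy knowledge base (assumptions for this sketch).
STOPWORDS = {"and", "the", "a", "in", "to"}
KB = {
    "paris": {
        "Paris_(France)": {"france", "eiffel", "louvre"},
        "Paris_Hilton": {"hilton", "hotel", "celebrity"},
    },
    "louvre": {"Louvre_Museum": {"paris", "museum", "art"}},
}


@dataclass
class Candidate:
    phrase: str
    extraction_conf: float            # uncertainty carried over from extraction
    entity: Optional[str] = None      # best entity found, if any
    disambiguation_score: float = 0.0


def extract_candidates(tokens):
    """High-recall extraction: every non-stopword token is an uncertain candidate."""
    return [Candidate(t, 0.5) for t in tokens if t not in STOPWORDS]


def disambiguate(cand, tokens):
    """Pick the KB entity whose clue words overlap most with the other tokens."""
    context = set(tokens) - {cand.phrase}
    for entity, clues in KB.get(cand.phrase, {}).items():
        score = len(clues & context) / len(clues)
        if score > cand.disambiguation_score:
            cand.entity, cand.disambiguation_score = entity, score
    return cand


def extract_with_feedback(text, threshold=0.1):
    """Feed disambiguation results back: keep only well-supported candidates."""
    tokens = text.lower().split()
    candidates = [disambiguate(c, tokens) for c in extract_candidates(tokens)]
    return [
        c for c in candidates
        if c.extraction_conf * c.disambiguation_score >= threshold
    ]


mentions = extract_with_feedback("visited paris and the louvre today")
print([(m.phrase, m.entity) for m in mentions])
```

In this sketch, candidates such as "visited" and "today" are extracted but later discarded because disambiguation finds no supporting entity, while "paris" and "louvre" reinforce each other through shared context. Keeping the extraction confidence instead of making a hard decision early is what lets the later phase correct the earlier one.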

Notes

1. https://blog.twitter.com/2012/twitter-turns-six
2. http://en.wikipedia.org/wiki/Project_X_Haren
3. http://www.tec4se.nl/
4. Some NER datasets consider nationalities as NEs [7].
5. http://nlp.stanford.edu:8080/ner/process
6. www.eurocottage.com
7. We made use of the lingpipe toolkit for development: http://alias-i.com/lingpipe.
8. http://wis.ewi.tudelft.nl/umap2011/ + TREC 2011 Microblog track collection.

References

1. Social networking reaches nearly one in four around the world
2. Chinchor, N.A.: Proceedings of the Seventh Message Understanding Conference (MUC-7) named entity task definition, Fairfax, VA, 21 p., April 1998. http://www.itl.nist.gov/iaui/894.02/related_projects/muc (version 3.5.)
3. Abbasi, M.-A., Chai, S.-K., Liu, H., Sagoo, K.: Real-world behavior analysis through a social media lens. In: Yang, S.J., Greenberg, A.M., Endsley, M. (eds.) SBP 2012. LNCS, vol. 7227, pp. 18–26. Springer, Heidelberg (2012)
4. Yu, S., Kak, S.: A survey of prediction using social media. CoRR, abs/1203.1647 (2012)
5. Lin, T., Mausam, Etzioni, O.: Entity linking at web scale. In: Proceedings of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction (AKBC-WEKEX), pp. 84–88 (2012)
6. Hoffart, J., Suchanek, F., Berberich, K., Kelham, E., de Melo, G., Weikum, G.: Yago2: Exploring and querying world knowledge in time, space, context, and many languages. In: Proceedings of WWW 2011, pp. 229–232 (2011)
7. Basave, A.E.C., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.-S.: Making sense of microposts (#msm2013) concept extraction challenge. In: Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, pp. 1–15 (2013)
8. Ekbal, A., Bandyopadhyay, S.: A hidden Markov model based named entity recognition system: Bengali and Hindi as case studies. In: Ghosh, A., De, R.K., Pal, S.K. (eds.) PReMI 2007. LNCS, vol. 4815, pp. 545–552. Springer, Heidelberg (2007)
9. Wallach, H.: Conditional random fields: An introduction. Technical Report MS-CIS-04-21, Department of Computer and Information Science, University of Pennsylvania (2004)
10. Habib, M.B., van Keulen, M.: Named entity extraction and disambiguation: The reinforcement effect. In: Proceedings of MUD 2011, Seattle, USA, pp. 9–16 (2011)
11. Cano, A.E., Rizzo, G., Varga, A., Rowe, M., Stankovic, M., Dadzie, A.-S.: #microposts2014 neel challenge: Measuring the performance of entity linking systems in social streams. In: Proceedings of the #Microposts2014 NEEL Challenge (2014)
12. Li, C., Weng, J., He, Q., Yao, Y., Datta, A., Sun, A., Lee, B.-S.: Twiner: named entity recognition in targeted twitter stream. In: SIGIR, pp. 721–730 (2012)
13. Habib, M.B., van Keulen, M.: A generic open world named entity disambiguation approach for tweets. In: Proceedings of the 5th International Conference on Knowledge Discovery and Information Retrieval, KDIR 2013, Vilamoura, Portugal, pp. 267–276, September 2013. SciTePress, Portugal (2013)
14. Habib, M., Van Keulen, M., Zhu, Z.: Concept extraction challenge: University of Twente at #msm2013. In: Making Sense of Microposts (#MSM2013) Concept Extraction Challenge, pp. 17–20 (2013)
15. Yosef, M.A., Hoffart, J., Bordino, I., Spaniol, M., Weikum, G.: Aida: An online tool for accurate disambiguation of named entities in text and tables. In: PVLDB, pp. 1450–1453 (2011)
16. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. In: ACL, pp. 363–370 (2005)
17. van Keulen, M., de Keijzer, A.: Qualitative effects of knowledge rules and user feedback in probabilistic data integration. VLDB J. 18(5), 1191–1217 (2009)
18. Huang, J., Antova, L., Koch, C., Olteanu, D.: MayBMS: A probabilistic database management system. In: Proceedings of the 35th SIGMOD International Conference on Management of Data, Providence, Rhode Island, pp. 1071–1074 (2009)
19. Koch, C., Olteanu, D.: Conditioning probabilistic databases. Proc. VLDB Endow. 1(1), 313–325 (2008)
20. Sen, P., Deshpande, A., Getoor, L.: Exploiting shared correlations in probabilistic databases. Proc. VLDB Endow. 1(1), 809–820 (2008)

Author information

Authors and Affiliations

Faculty of EEMCS, University of Twente, Enschede, The Netherlands
Maurice van Keulen & Mena B. Habib

Corresponding author

Correspondence to Mena B. Habib.

Editor information

Editors and Affiliations

University of Zaragoza, Zaragoza, Spain
Fernando Bobillo
Universidade de Brasília, Brasília, Brazil
Rommel N. Carvalho
George Mason University, Fairfax, Virginia, USA
Paulo C.G. Costa
Università degli Studi di Bari, Bari, Italy
Claudia d'Amato
Università degli Studi di Bari, Bari, Italy
Nicola Fanizzi
George Mason University, Fairfax, Virginia, USA
Kathryn B. Laskey
MITRE Corporation, McLean, Virginia, USA
Kenneth J. Laskey
University of Oxford, Oxford, United Kingdom
Thomas Lukasiewicz
National University of Ireland, Galway, Ireland
Matthias Nickles
Goldman Sachs, Washington, District of Columbia, USA
Michael Pool

About this paper

Cite this paper

van Keulen, M., Habib, M.B. (2014). Uncertainty Handling in Named Entity Extraction and Disambiguation for Informal Text. In: Bobillo, F., et al. Uncertainty Reasoning for the Semantic Web III. URSW 2011, URSW 2012, URSW 2013. Lecture Notes in Computer Science, vol 8816. Springer, Cham. https://doi.org/10.1007/978-3-319-13413-0_16

DOI: https://doi.org/10.1007/978-3-319-13413-0_16
Published: 30 November 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-13412-3
Online ISBN: 978-3-319-13413-0
eBook Packages: Computer Science, Computer Science (R0)
