Domain-General Versus Domain-Specific Named Entity Recognition: A Case Study Using TEXT

Lim, Cheng Yang; Tan, Ian K. T.; Selvaretnam, Bhawani

doi:10.1007/978-3-030-33709-4_21

Conference paper
First Online: 21 October 2019

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11909)

Abstract

Named entity recognition (NER) seeks to identify named entities within bodies of text and classify them into language categories, such as nouns, that reflect locations, organizations, and people. Because NER is language dependent, most NER systems take a domain-general approach: they are designed for a language rather than for a specific target domain. The current use of informal language on social media creates a need to compare the performance of domain-general and domain-specific NER systems. This paper describes TEXT, a domain-specific NER system for the vehicle traffic domain, and compares its performance against domain-general NER. The evaluation results show that the domain-specific NER significantly outperforms the domain-general NER, which performs adequately only in common scenarios.
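
The abstract does not state which metrics were used in the evaluation. As a minimal sketch of how such a comparison could be scored, the example below assumes entity-level precision, recall, and F1 over exact (text, label) matches. The function name `score`, the sample entities, and both prediction sets are hypothetical and invented purely for illustration; they are not data or results from the paper.

```python
from typing import Dict, Set, Tuple

# An entity is a (surface text, label) pair; exact match on both is assumed.
Entity = Tuple[str, str]


def score(gold: Set[Entity], predicted: Set[Entity]) -> Dict[str, float]:
    """Entity-level precision, recall, and F1 for one system's output."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical gold annotations for a single traffic-related post.
gold = {("Jalan Ampang", "LOCATION"), ("KLCC", "LOCATION")}

# Hypothetical outputs: a domain-general system that misses one location and
# mislabels an ordinary word, and a domain-specific system that finds both.
general_pred = {("KLCC", "LOCATION"), ("Jam", "PERSON")}
specific_pred = {("Jalan Ampang", "LOCATION"), ("KLCC", "LOCATION")}

general_scores = score(gold, general_pred)
specific_scores = score(gold, specific_pred)

print("domain-general:", general_scores)
print("domain-specific:", specific_scores)
```

Exact matching is the strictest choice; a real evaluation might instead credit partial span overlaps, which would change the scores.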

Author information

Authors and Affiliations

Faculty of Computing and Informatics, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Cheng Yang Lim
School of IT, Monash University Malaysia, Bandar Sunway, 47500, Subang Jaya, Selangor, Malaysia
Ian K. T. Tan
Valiantlytix Sdn Bhd, Pinnacle Petaling Jaya, Jalan 51a/223, PJS 52, 46100, Petaling Jaya, Selangor, Malaysia
Bhawani Selvaretnam

Corresponding author

Correspondence to Ian K. T. Tan.

Editor information

Editors and Affiliations

Mahasarakham University, Maha Sarakham, Thailand
Rapeeporn Chamchong
Murdoch University, Murdoch, WA, Australia
Kok Wai Wong

About this paper

Cite this paper

Lim, C.Y., Tan, I.K.T., Selvaretnam, B. (2019). Domain-General Versus Domain-Specific Named Entity Recognition: A Case Study Using TEXT. In: Chamchong, R., Wong, K. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2019. Lecture Notes in Computer Science, vol 11909. Springer, Cham. https://doi.org/10.1007/978-3-030-33709-4_21

DOI: https://doi.org/10.1007/978-3-030-33709-4_21
Published: 21 October 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-33708-7
Online ISBN: 978-3-030-33709-4
eBook Packages: Computer Science, Computer Science (R0)