Pattern Mining for Named Entity Recognition

Nouvel, Damien; Antoine, Jean-Yves; Friburger, Nathalie

doi:10.1007/978-3-319-08958-4_19

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8387)

Included in the following conference series:

Language and Technology Conference

Abstract

Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results during the Ester2 French-language evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is a difficult task. In this paper, we present a novel approach based on pattern mining, used both to perform NER and to supplement our system’s knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim to detect Named Entity boundaries. We assess their efficiency by using these patterns both in standalone mode and in combination with our existing system.

Notes

1. Each set of markers is mapped to a predetermined number of corresponding marker sequences, e.g. \(P(\{m_1, m_2\}) = P(\langle m_1, m_2 \rangle) = P(\langle m_2, m_1 \rangle) = P(\langle m_1, m_2, m_1 \rangle)\).
2. It limits the search space by considering, at any position, the N most probable solutions.
3. With regularization parameter C = 4.
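
The pruning described in note 2 (keeping only the N most probable solutions at each position) is commonly known as beam search. The sketch below is a generic illustration under stated assumptions, not the mXS implementation: the function names, the marker labels `<ent>` and `</ent>`, and the toy scoring function are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# A hypothesis is (cumulative log-probability, labels chosen so far).
Hypothesis = Tuple[float, Tuple[str, ...]]


def beam_search(
    n_positions: int,
    labels: Sequence[str],
    log_score: Callable[[Tuple[str, ...], str, int], float],
    beam_width: int,
) -> List[Hypothesis]:
    """Extend hypotheses position by position, keeping the best `beam_width`."""
    beam: List[Hypothesis] = [(0.0, ())]
    for position in range(n_positions):
        candidates = [
            (total + log_score(prefix, label, position), prefix + (label,))
            for total, prefix in beam
            for label in labels
        ]
        # Prune: only the N most probable partial solutions survive.
        candidates.sort(key=lambda h: h[0], reverse=True)
        beam = candidates[:beam_width]
    return beam


def toy_log_score(prefix: Tuple[str, ...], label: str, position: int) -> float:
    """Hypothetical scores: unbalanced open/close markers are very unlikely."""
    opened = prefix.count("<ent>") > prefix.count("</ent>")
    if label == "</ent>" and not opened:
        return math.log(0.01)
    if label == "<ent>" and opened:
        return math.log(0.01)
    return math.log(0.5) if label == "none" else math.log(0.3)


best = beam_search(3, ["none", "<ent>", "</ent>"], toy_log_score, beam_width=2)
for total, sequence in best:
    print(round(total, 3), sequence)
```

With a beam width of N, the number of hypotheses examined at each position is at most N times the number of labels, instead of growing exponentially with the sequence length. The trade-off is that the globally best solution can be pruned early.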

Author information

Authors and Affiliations

Laboratoire d’Informatique, Université François Rabelais Tours, 3, Place Jean Jaures, 41000, Blois, France
Damien Nouvel, Jean-Yves Antoine & Nathalie Friburger

Corresponding author

Correspondence to Damien Nouvel.

Editor information

Editors and Affiliations

Adam Mickiewicz University, Poznań, Poland
Zygmunt Vetulani
IMMI-CNRS, Orsay, France
Joseph Mariani

About this paper

Cite this paper

Nouvel, D., Antoine, JY., Friburger, N. (2014). Pattern Mining for Named Entity Recognition. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_19

DOI: https://doi.org/10.1007/978-3-319-08958-4_19
Published: 26 July 2014
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)
