Abstract
Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results during the Ester2 French-speaking evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is a difficult task. In this paper, we present a novel approach based on pattern mining, both to perform NER and to supplement our system’s knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim at detecting Named Entity boundaries. We assess their efficiency by using such patterns in standalone mode and in combination with our existing system.
Notes
1. Each set of markers is mapped to a predetermined number of corresponding marker sequences, e.g. \(P(\{m_1, m_2\}) = P(\langle m_1, m_2 \rangle) = P(\langle m_2, m_1 \rangle) = P(\langle m_1, m_2, m_1 \rangle)\).
2. It limits the search space by considering, at any position, only the N most probable solutions.
3. With regularization parameter C = 4.
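The pruning described in note 2 reads like a beam search. The following is a minimal sketch under that assumption, not the mXS implementation: the function name `beam_search`, the label names, and the per-position log-scores are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Hypothesis = Tuple[float, Tuple[str, ...]]  # (cumulative log-score, labels so far)


def beam_search(
    step_scores: Sequence[Dict[str, float]],
    beam_width: int,
) -> List[Hypothesis]:
    """Keep only the `beam_width` best partial solutions at each position.

    `step_scores[i]` maps each candidate label at position i to a log-score.
    Both the labels and the scores are illustrative assumptions.
    """
    beam: List[Hypothesis] = [(0.0, ())]
    for scores in step_scores:
        # Extend every surviving hypothesis with every label at this position.
        candidates = [
            (total + score, labels + (label,))
            for total, labels in beam
            for label, score in scores.items()
        ]
        # Prune: retain the N most probable partial solutions only.
        candidates.sort(key=lambda hyp: hyp[0], reverse=True)
        beam = candidates[:beam_width]
    return beam


if __name__ == "__main__":
    toy_scores = [
        {"BEGIN": -0.1, "NONE": -2.0},
        {"END": -0.2, "NONE": -1.5},
    ]
    for total, labels in beam_search(toy_scores, beam_width=2):
        print(labels, round(total, 3))
```

With a beam width of N, the number of hypotheses kept at each position is bounded by N, at the cost of possibly discarding a solution that would have scored best overall.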
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Nouvel, D., Antoine, JY., Friburger, N. (2014). Pattern Mining for Named Entity Recognition. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_19
DOI: https://doi.org/10.1007/978-3-319-08958-4_19
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-08957-7
Online ISBN: 978-3-319-08958-4
eBook Packages: Computer Science, Computer Science (R0)