Pattern Mining for Named Entity Recognition

  • Conference paper
Human Language Technology Challenges for Computer Science and Linguistics (LTC 2011)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 8387))

Abstract

Many evaluation campaigns have shown that knowledge-based and data-driven approaches remain equally competitive for Named Entity Recognition (NER). Our research team has developed CasEN, a symbolic system based on finite-state transducers, which achieved promising results in the Ester2 French-language evaluation campaign. Despite these encouraging results, manually extending the coverage of such a hand-crafted system is difficult. In this paper, we present a novel approach that uses pattern mining both to perform NER and to supplement our system’s knowledge base. The system, mXS, exhaustively searches for hierarchical sequential patterns that aim to detect Named Entity boundaries. We assess the efficiency of these patterns by using them in standalone mode and in combination with our existing system.

Notes

  1. Each set of markers is mapped to a predetermined number of corresponding marker sequences, e.g. \(P(\{m_1, m_2\}) = P(\langle m_1, m_2 \rangle) = P(\langle m_2, m_1 \rangle) = P(\langle m_1, m_2, m_1 \rangle)\).

  2. It limits the search space by considering, at any position, only the N most probable solutions.

  3. With regularization parameter C = 4.
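
The pruning described in note 2 (keeping only the N most probable solutions at each position) is commonly known as beam search. The following is a minimal sketch of that idea, not the mXS implementation: the function name `beam_search`, the label set, and the per-position log-probabilities are all hypothetical, and scores are assumed to be independent across positions purely to keep the example short.

```python
from typing import Dict, List, Tuple

Hypothesis = Tuple[List[str], float]  # (labels so far, cumulative log-probability)


def beam_search(step_scores: List[Dict[str, float]], n: int) -> List[Hypothesis]:
    """Return the n best label sequences, pruning to n hypotheses at every position.

    step_scores[i][label] is an assumed log-probability of `label` at position i.
    """
    beam: List[Hypothesis] = [([], 0.0)]
    for scores in step_scores:
        # Extend every surviving hypothesis with every candidate label.
        candidates = [
            (labels + [label], total + logp)
            for labels, total in beam
            for label, logp in scores.items()
        ]
        # Keep only the n most probable partial solutions.
        candidates.sort(key=lambda hyp: hyp[1], reverse=True)
        beam = candidates[:n]
    return beam


# Hypothetical scores for two positions: B = entity begins, I = inside, O = outside.
example_scores = [{"B": -0.1, "O": -2.0}, {"I": -0.2, "O": -1.5}]
best = beam_search(example_scores, n=2)
```

With `n=2`, only two hypotheses survive each position, so the number of candidates examined grows linearly with sequence length rather than exponentially.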

Author information

Corresponding author

Correspondence to Damien Nouvel.

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Nouvel, D., Antoine, JY., Friburger, N. (2014). Pattern Mining for Named Entity Recognition. In: Vetulani, Z., Mariani, J. (eds) Human Language Technology Challenges for Computer Science and Linguistics. LTC 2011. Lecture Notes in Computer Science, vol 8387. Springer, Cham. https://doi.org/10.1007/978-3-319-08958-4_19

  • DOI: https://doi.org/10.1007/978-3-319-08958-4_19

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-08957-7

  • Online ISBN: 978-3-319-08958-4

  • eBook Packages: Computer Science, Computer Science (R0)
