Study on Named Entity Recognition for Polish Based on Hidden Markov Models

  • Conference paper
Text, Speech and Dialogue (TSD 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6231)

Abstract

The accuracy of a Named Entity Recognition algorithm based on the Hidden Markov Model is investigated. The algorithm was limited to the recognition and classification of Named Entities representing persons. It was tested on two small Polish domain corpora: stock exchange reports and police reports. A comparison with baseline algorithms, one based on the case of the first letter and one on a gazetteer, is presented. The algorithm achieved 62% precision and 93% recall on the domain of the training data. Introducing simple hand-written post-processing rules increased precision up to 89%. We also discuss the portability of the method. A model of combined knowledge sources is sketched as a possible way to overcome the portability problem.
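To make the reported figures concrete, the following is a minimal sketch of a first-letter-case baseline and of span-level precision and recall. It is not the authors' implementation: the function names (`capitalised_runs`, `precision_recall`, `f1`), the exact-span matching criterion, and the example sentence are assumptions made for illustration, and the abstract does not say how the baseline treats sentence-initial words.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start token index, end token index, exclusive)


def capitalised_runs(tokens: List[str]) -> Set[Span]:
    """Hypothetical baseline: every maximal run of tokens whose first
    letter is upper case is proposed as a person name."""
    spans: Set[Span] = set()
    start = None
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tokens)))
    return spans


def precision_recall(predicted: Set[Span], gold: Set[Span]) -> Tuple[float, float]:
    """Span-level scores, counting a prediction as correct only on an
    exact boundary match (an assumed matching criterion)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up sentence: "Jan Kowalski became the president of the company."
tokens = ["Prezesem", "spółki", "został", "Jan", "Kowalski", "."]
gold = {(3, 5)}  # "Jan Kowalski"

predicted = capitalised_runs(tokens)
p, r = precision_recall(predicted, gold)
print(sorted(predicted), p, r)
```

In this toy sentence the baseline finds the name but also flags the sentence-initial word, which gives full recall at half the precision. Applying `f1` to the figures reported for the training domain, `f1(0.62, 0.93)`, gives roughly 0.74.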

Work financed by the European Union within the Innovative Economy Programme, project POIG.01.01.02-14-013/09.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Marcińczuk, M., Piasecki, M. (2010). Study on Named Entity Recognition for Polish Based on Hidden Markov Models. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds) Text, Speech and Dialogue. TSD 2010. Lecture Notes in Computer Science, vol 6231. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15760-8_19

  • DOI: https://doi.org/10.1007/978-3-642-15760-8_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-15759-2

  • Online ISBN: 978-3-642-15760-8

  • eBook Packages: Computer Science, Computer Science (R0)
