
Scaling an Irish FST Morphology Engine for Use on Unrestricted Text

  • Conference paper
Finite-State Methods and Natural Language Processing (FSMNLP 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4002)

Abstract

This paper details the steps involved in scaling up a lexicalised finite-state morphology transducer for use on unrestricted text. Our starting point was a baseline inflectional morphology engine [1] with 81% token coverage, measured against a 15-million-word corpus of Irish texts [2]. Manually scaling the lexicon component of a finite-state morphology transducer is time-consuming, expensive and rarely, if ever, complete. To scale up the engine, we combined several strategies: semi-automatic population of the finite-state lexicon from machine-readable dictionary resources and, via optical character recognition, from printed resources; the addition of derivational morphology; and the development of morphological guessers. We report the coverage increase contributed by each step. The full system achieves token coverage of 93%, which is extended to 100% through the use of morphological guessers.
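The token coverage figures quoted above can be read as the share of running corpus tokens for which the analyser returns at least one analysis. The following is a minimal sketch of that measurement, not the authors' evaluation code: the toy lexicon, the toy corpus, and the names `token_coverage`, `lexicon_lookup` and `guesser` are all assumptions made for illustration, and the set lookup stands in for a real finite-state transducer.

```python
from collections import Counter


def token_coverage(tokens, is_recognised):
    """Return the fraction of running tokens accepted by `is_recognised`."""
    counts = Counter(tokens)          # count each distinct word form once
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for form, n in counts.items() if is_recognised(form))
    return covered / total


# Hypothetical stand-ins: a set lookup replaces the real transducer.
lexicon = {"an", "fear", "bean", "agus"}
corpus = ["an", "fear", "agus", "an", "bhean"]


def lexicon_lookup(form):
    """Toy analyser: a form is recognised only if it is listed verbatim."""
    return form in lexicon


def guesser(form):
    """Toy guesser (assumption): accept any purely alphabetic form."""
    return form.isalpha()


base = token_coverage(corpus, lexicon_lookup)
with_guesser = token_coverage(
    corpus, lambda form: lexicon_lookup(form) or guesser(form)
)
print(base, with_guesser)
```

In this toy run the lenited form "bhean" is missed by the plain lookup and picked up only by the fallback, which mirrors how a guesser can close the gap left by a finite lexicon. Coverage measured this way says nothing about whether the returned analyses are correct.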


References

  1. Uí Dhonnchadha, E.: An analyser and generator for Irish inflectional morphology using finite state transducers. Master’s thesis, School of Computing, Dublin City University, Dublin, Ireland (2002)

  2. ITÉ (accessed, November 2005), http://www.ite.ie/corpus/

  3. Beesley, K.R., Karttunen, L.: Finite State Morphology. CSLI Studies in Computational Linguistics. CSLI Publications (2003)

  4. Karttunen, L., Beesley, K.R.: Two-level rule compiler. Technical report, Xerox PARC (1992)

  5. Oideachais, A.R.: Foclóir Póca English-Irish/Irish-English Dictionary. An Gúm, Baile Átha Cliath (1986)

  6. Symbols (accessed, November 2005), http://www.symbols.net/names/

  7. Uí Dhonnchadha, E., Nic Pháidín, C., Van Genabith, J.: Design, implementation and evaluation of an inflectional morphology finite-state transducer for Irish. MT - Machine Translation: Special Issue on Finite State Language Resources and Language Processing (in press)

  8. Críostaí, B.: Graiméar Gaeilge na mBráithre Críostaí. An Gúm, Baile Átha Cliath (1999)

  9. Ó Dónaill, N.: Foclóir Gaeilge Béarla. Oifig an tSoláthair, Baile Átha Cliath (1977)

  10. Ó Droighneáin, M.: An Sloinnteoir Gaeilge agus an tAinmneoir. Coiscéim, Baile Átha Cliath (1991)

  11. Ó Siochfhrada, N.: Foclóir Gaeilge/Béarla - Béarla/Gaeilge. An Comhlacht Oideachais, Baile Átha Cliath (1998)

  12. Grefenstette, G., Schiller, A., Ait-Mokhtar, S.: Recognizing lexical patterns in text. In: van Eynde, F., Gibbon, D. (eds.) Lexicon Development for Speech and Language Processing. Kluwer Academic Publishers, Dordrecht (2000)

  13. Kilgarriff, A., Rundell, M., Uí Dhonnchadha, E.: Efficient corpus creation for lexicography. Language Resources and Evaluation Journal (forthcoming)

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Dhonnchadha, E.U., Van Genabith, J. (2006). Scaling an Irish FST Morphology Engine for Use on Unrestricted Text. In: Yli-Jyrä, A., Karttunen, L., Karhumäki, J. (eds) Finite-State Methods and Natural Language Processing. FSMNLP 2005. Lecture Notes in Computer Science, vol 4002. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11780885_24

  • DOI: https://doi.org/10.1007/11780885_24

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-35467-3

  • Online ISBN: 978-3-540-35469-7

  • eBook Packages: Computer Science, Computer Science (R0)
