Automatic Construction of a Morphological Dictionary of Multi-Word Units

  • Conference paper
Advances in Natural Language Processing (NLP 2010)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6233)

Abstract

The development of a comprehensive morphological dictionary of multi-word units for Serbian is a very demanding task due to the complexity of Serbian morphology. Manual production of such a dictionary has proved to be extremely time-consuming. In this paper we present a procedure that automatically produces dictionary lemmas for a given list of multi-word units. To accomplish this task, the procedure relies on data in the e-dictionaries of Serbian simple words, which are already well developed. We also offer an evaluation of the proposed procedure on several different data sets. Finally, we discuss some implementation issues and describe how the same procedure is used for other languages.
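The abstract describes the procedure only at a high level. As a rough illustration of its central idea, reusing simple-word dictionary data to describe a multi-word unit, the following Python sketch annotates each component of a multi-word unit with the lemma, part of speech and grammatical codes found in a simple-word dictionary. Everything here is an assumption made for illustration: the toy dictionary entries, the grammatical codes, the function name `build_mwu_lemma`, and the output notation (loosely modeled on the component-annotation style of Unitex compound dictionaries). It is not the authors' implementation, and it ignores agreement rules, ambiguity resolution and the inflection of the resulting unit.

```python
"""Hypothetical sketch: derive a lemma description for a multi-word unit (MWU)
by looking up each of its components in a simple-word e-dictionary."""

from typing import Dict, List, Optional, Tuple

# Simple-word dictionary: inflected form -> list of (lemma, POS, grammatical codes).
SimpleDict = Dict[str, List[Tuple[str, str, str]]]

# Toy entries with illustrative codes ("ms1" = masculine, singular, case 1);
# they are NOT taken from the actual Serbian e-dictionaries.
TOY_DICT: SimpleDict = {
    "crveni": [("crven", "A", "ms1")],
    "krst": [("krst", "N", "ms1")],
}


def build_mwu_lemma(mwu: str, simple_dict: SimpleDict) -> Optional[str]:
    """Return an annotated entry for `mwu`, or None when a component is unknown
    or ambiguous (a real procedure would need rules to resolve such cases)."""
    parts: List[str] = []
    pos_sequence: List[str] = []
    for token in mwu.split():
        analyses = simple_dict.get(token.lower(), [])
        if len(analyses) != 1:
            return None  # unknown or ambiguous component: leave for manual review
        lemma, pos, codes = analyses[0]
        # Keep the surface form and attach its simple-word analysis.
        parts.append(f"{token}({lemma}.{pos}:{codes})")
        pos_sequence.append(pos)
    # Hypothetical class label built from the POS sequence, e.g. "NC_AN".
    return " ".join(parts) + ",NC_" + "".join(pos_sequence)


if __name__ == "__main__":
    print(build_mwu_lemma("Crveni krst", TOY_DICT))
    print(build_mwu_lemma("zeleni krst", TOY_DICT))
```

With the toy dictionary, "Crveni krst" yields `Crveni(crven.A:ms1) krst(krst.N:ms1),NC_AN`, while "zeleni krst" yields `None` because "zeleni" has no entry. Returning `None` rather than guessing keeps uncertain units for a human lexicographer, which seems a sensible default for any automatic dictionary-building step.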

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Krstev, C., Stanković, R., Obradović, I., Vitas, D., Utvić, M. (2010). Automatic Construction of a Morphological Dictionary of Multi-Word Units. In: Loftsson, H., Rögnvaldsson, E., Helgadóttir, S. (eds) Advances in Natural Language Processing. NLP 2010. Lecture Notes in Computer Science, vol 6233. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-14770-8_26

  • DOI: https://doi.org/10.1007/978-3-642-14770-8_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-14769-2

  • Online ISBN: 978-3-642-14770-8

  • eBook Packages: Computer Science, Computer Science (R0)