Detection of Verb Frames with NooJ

  • Conference paper

Part of the book series: Communications in Computer and Information Science (CCIS, volume 667)

Abstract

This paper deals with the semi-automatic extension of CroDeriV, a morphological database of Croatian verbs, with verb valency frames. In its present form, the database comprises 14 500 verbs in their infinitive forms. Each verb in CroDeriV is segmented into lexical and derivational morphemes, and verbs sharing the same root are mutually linked. To further enrich CroDeriV with semantic and syntactic information, we used the NooJ platform to recognize derivationally related verbs, to find their verb frames, and to speed up sentence processing.
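
As a rough illustration of the database structure described above, the following Python sketch models verbs segmented into morphemes and links verbs that share a root. The names `VerbEntry` and `link_by_root`, the field layout, and the sample segmentations are illustrative assumptions; they are not CroDeriV's actual schema or data, and the sketch does not involve NooJ itself.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class VerbEntry:
    """One verb in infinitive form with a hypothetical morpheme segmentation."""
    infinitive: str
    prefixes: Tuple[str, ...]   # derivational prefixes, e.g. ("na",)
    root: str                   # lexical morpheme shared by related verbs
    suffixes: Tuple[str, ...]   # thematic suffix and infinitive ending


def link_by_root(entries: List[VerbEntry]) -> Dict[str, List[str]]:
    """Group infinitives by root so that related verbs are mutually linked."""
    families = defaultdict(list)
    for entry in entries:
        families[entry.root].append(entry.infinitive)
    return dict(families)


# Sample entries chosen for illustration only (assumed segmentations).
ENTRIES = [
    VerbEntry("pisati", (), "pis", ("a", "ti")),
    VerbEntry("napisati", ("na",), "pis", ("a", "ti")),
    VerbEntry("prepisati", ("pre",), "pis", ("a", "ti")),
    VerbEntry("čitati", (), "čit", ("a", "ti")),
]

FAMILIES = link_by_root(ENTRIES)
```

Grouping on the root keeps the link symmetric: every verb in a family can reach all others through the shared key, without storing pairwise links.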


Notes

  1. CroDeriV resembles databases such as CatVar for English (http://clipdemos.umiacs.umd.edu/catvar) and Uni-morph for Russian (http://courses.washington.edu/unimorph).

  2. There are approximately 120 verbal compounds recorded in CroDeriV.

  3. A more detailed account is given in [6].

  4. It takes even longer if a new paradigm needs to be defined in the NOF file.

  5. Here, “existing” means previously built by the inflectional grammar of a dictionary entry.

  6. If we regard language as a living thing, it is to be expected that new derivations will appear over time.

  7. In this paper, the main verbs selected for this research are referred to as root verbs and are marked as TYPE = “ROOT”, while the derivations of these main verbs are marked as TYPE = “NOVIx”. All other verbs that may appear in a sentence are marked only as a <VP> chunk with no additional attributes.

References

  1. Agić, Ž., Tadić, M., Dovedan, Z.: Evaluating full lemmatization of Croatian texts. In: Recent Advances in Intelligent Information Systems, pp. 175–184. Academic Publishing House EXIT, Warsaw (2009)

  2. Anić, V.: Rječnik hrvatskoga jezika. Novi liber, Zagreb (2004)

  3. Bekavac, B., Šojat, K.: Syntactic patterns of verb definitions in Croatian WordNet. In: Vučković, K., Bekavac, B., Silberztein, M. (eds.) International Conference on Formalising Natural Languages with NooJ. Selected Papers from the NooJ 2011, pp. 112–121. Cambridge Scholars Publishing, Newcastle (2011)

  4. Ljubešić, N., Erjavec, T.: hrWaC and slWaC: compiling Web Corpora for Croatian and Slovene. In: Habernal, I., Matoušek, V. (eds.) TSD 2011. LNCS (LNAI), vol. 6836, pp. 395–402. Springer, Heidelberg (2011). doi:10.1007/978-3-642-23538-2_50

  5. Silberztein, M.: The NooJ Manual (2003). www.nooj4nlp.net

  6. Šojat, K., Srebačić, M., Štefanec, V.: CroDeriV i morfološka raščlamba hrvatskoga glagola. Suvremena lingvistika 39(75), 75–96 (2013). Zagreb

  7. Šojat, K., Srebačić, M., Pavelić, T., Tadić, M.: CroDeriV: a new resource for processing Croatian morphology. In: Calzolari, N., Choukri, K., Declerck, T., Loftsson, H., Maegaard, B., Mariani, J., Moreno, A., Odijk, J., Piperidis, S. (eds.) Proceedings of the 9th International Conference on Language Resources and Evaluation, pp. 3366–3370. Reykjavik, Iceland (2014)

  8. Šojat, K., Vučković, K., Tadić, M.: Extracting verb valency frames with NooJ. In: Ben Hamadou, A., Mesfar, S., Silberztein, M. (eds.) International Conference and Workshop on Finite State Language Engineering, NooJ 2009, pp. 231–242. Centre de Publication Universitaire, Sfax, Tunisia (2010)

  9. Šonje, J. (ed.): Rječnik hrvatskoga jezika. Leksikografski zavod Miroslav Krleža & Školska knjiga, Zagreb (2000)

  10. Tadić, M.: New version of the Croatian National Corpus. In: After Half a Century of Slavonic Natural Language Processing, pp. 199–209. Masaryk University, Brno (2009)

  11. Vučković, K., Tadić, M., Bekavac, B.: Croatian language resources for NooJ. CIT J. Comput. Inf. Technol. 18, 295–301 (2010). Zagreb

  12. Vučković, K.: Model parsera za hrvatski jezik. Ph.D. dissertation. Faculty of Humanities and Social Sciences, Zagreb (2009)

  13. Vučković, K., Mikelić Preradović, N., Dovedan, Z.: Verb valency enhanced Croatian Lexicon. In: Judit, K., Silberztein, M., Varadi, T. (eds.) International Conference on Selected Papers from the NooJ 2008, pp. 52–59. Cambridge Scholars Publishing, Newcastle (2010)


Corresponding author

Correspondence to Krešimir Šojat.

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Šojat, K., Bekavac, B., Kocijan, K. (2016). Detection of Verb Frames with NooJ. In: Barone, L., Monteleone, M., Silberztein, M. (eds) Automatic Processing of Natural-Language Electronic Texts with NooJ. NooJ 2016. Communications in Computer and Information Science, vol 667. Springer, Cham. https://doi.org/10.1007/978-3-319-55002-2_14

  • DOI: https://doi.org/10.1007/978-3-319-55002-2_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-55001-5

  • Online ISBN: 978-3-319-55002-2

  • eBook Packages: Computer Science, Computer Science (R0)
