Abstract
This paper presents a hybrid method for automatic stress prediction, which we apply to GLAFF-IT, a large-scale Italian lexicon that we extracted from GLAW-IT, a machine-readable dictionary based on Wikizionario. Our approach combines heuristic rules with a logistic model trained on each word's set of phonological features. The model reaches 98.1% accuracy. The resulting resource is a large lexicon of Italian that we release under a free licence. It includes morphological and phonological information for each of its 457,702 entries. At the time of writing, it is the only Italian lexicon that offers both large coverage and an indication of stress position.
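The rules-then-model combination described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single rule (a word-final accented vowel marks final stress, as in "città"), the two orthographic features standing in for phonological ones, and the weights and bias are all hypothetical values chosen for illustration, not trained parameters.

```python
import math

ACCENTED_VOWELS = "àèéìíòóù"
VOWELS = "aeiou"

# Hypothetical parameters, invented for this sketch (not trained values).
HYPOTHETICAL_WEIGHTS = {"ends_in_vowel": 0.6, "cluster_before_final_vowel": 1.5}
HYPOTHETICAL_BIAS = -1.0


def heuristic_stress(word):
    """Rule step: a written accent on the final vowel marks final stress.

    Returns 1 (stress on the last syllable) or None if the rule is silent.
    """
    if word and word[-1] in ACCENTED_VOWELS:
        return 1
    return None


def extract_features(word):
    """Toy orthographic features (assumed stand-ins for phonological ones)."""
    ends_in_vowel = word[-1] in VOWELS
    # Two consonant letters before the final vowel, e.g. "canto".
    cluster = (
        len(word) >= 3
        and ends_in_vowel
        and word[-2] not in VOWELS
        and word[-3] not in VOWELS
    )
    return {
        "ends_in_vowel": float(ends_in_vowel),
        "cluster_before_final_vowel": float(cluster),
    }


def penultimate_probability(word):
    """Logistic score: estimated probability of penultimate stress."""
    feats = extract_features(word)
    z = HYPOTHETICAL_BIAS + sum(
        HYPOTHETICAL_WEIGHTS[name] * value for name, value in feats.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def predict_stress(word):
    """Return (syllable position counted from the end, deciding component)."""
    if not word:
        raise ValueError("word must be non-empty")
    word = word.lower()
    rule = heuristic_stress(word)
    if rule is not None:
        return rule, "rule"
    # Fall back to the logistic model when no rule applies.
    position = 2 if penultimate_probability(word) >= 0.5 else 3
    return position, "model"


if __name__ == "__main__":
    for w in ("città", "canto", "tavolo"):
        print(w, predict_stress(w))
```

The point of the design is the dispatch order: deterministic rules settle the cases they cover, and the statistical model handles only the remainder. With these invented weights the toy model will misplace stress on many real words; a usable model would have to be trained on annotated data.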
Notes
- 1. GLAW-IT is available at: http://redac.univ-tlse2.fr/lexicons/glawit.html.
- 2. GLAFF-IT is freely available at: http://redac.univ-tlse2.fr/lexicons/glaffit.html.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Calderone, B., Pascoli, M., Sajous, F., Hathout, N. (2017). Hybrid Method for Stress Prediction Applied to GLAFF-IT, a Large-Scale Italian Lexicon. In: Gracia, J., Bond, F., McCrae, J., Buitelaar, P., Chiarcos, C., Hellmann, S. (eds) Language, Data, and Knowledge. LDK 2017. Lecture Notes in Computer Science, vol 10318. Springer, Cham. https://doi.org/10.1007/978-3-319-59888-8_3
DOI: https://doi.org/10.1007/978-3-319-59888-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59887-1
Online ISBN: 978-3-319-59888-8
eBook Packages: Computer Science, Computer Science (R0)