Abstract
The broken plural in Arabic is a canonical example of nonconcatenative morphology. We discuss the supervised and unsupervised learning of this type of transduction using techniques based on stochastic transducers trained with the Expectation-Maximisation algorithm. We first describe a basic method for supervised learning with these transducers, and then a more advanced memory-based learning technique that uses a distance derived from the Fisher kernel of the model. Finally, we discuss how these algorithms can be employed for unsupervised learning by modelling the alignment between the strings as a hidden variable.
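As a rough illustration of what it means to treat the alignment between two strings as a hidden variable, the sketch below computes the joint probability of a string pair by summing over every possible alignment with a forward-style dynamic programme. It assumes the simplest possible case, a single-state (memoryless) transducer with substitution, deletion, insertion and end-of-pair probabilities; the function name, parameter names and toy values are hypothetical and are not taken from the chapter, whose models may well have more states.

```python
def pair_probability(x, y, p_sub, p_del, p_ins, p_end):
    """Joint probability of the pair (x, y) under a one-state stochastic
    transducer, summed over all alignments (assumed toy model)."""
    n, m = len(x), len(y)
    # alpha[i][j]: total probability of all edit sequences that emit
    # x[:i] on the input side and y[:j] on the output side.
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # delete x[i-1]: input symbol, no output symbol
                alpha[i][j] += alpha[i - 1][j] * p_del.get(x[i - 1], 0.0)
            if j > 0:  # insert y[j-1]: output symbol, no input symbol
                alpha[i][j] += alpha[i][j - 1] * p_ins.get(y[j - 1], 0.0)
            if i > 0 and j > 0:  # emit the pair (x[i-1], y[j-1]) together
                alpha[i][j] += alpha[i - 1][j - 1] * p_sub.get(
                    (x[i - 1], y[j - 1]), 0.0
                )
    # The pair is complete once the end-of-pair event is generated.
    return alpha[n][m] * p_end


# Hypothetical toy parameters over a one-symbol input and output alphabet.
p_sub = {("a", "b"): 0.3}
p_del = {"a": 0.1}
p_ins = {"b": 0.1}
p_end = 0.5

print(pair_probability("a", "b", p_sub, p_del, p_ins, p_end))
```

For the pair ("a", "b") three alignments contribute: one substitution, a deletion followed by an insertion, and an insertion followed by a deletion. Expectation-Maximisation training would reuse the same table, together with a matching backward pass, to obtain expected counts for each edit operation; that step is omitted here.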
Copyright information
© 2007 Springer
About this chapter
Cite this chapter
Clark, A. (2007). Supervised and Unsupervised Learning of Arabic Morphology. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_10
DOI: https://doi.org/10.1007/978-1-4020-6046-5_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6045-8
Online ISBN: 978-1-4020-6046-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)