Supervised and Unsupervised Learning of Arabic Morphology

Clark, Alexander

doi:10.1007/978-1-4020-6046-5_10

Part of the book series: Text, Speech and Language Technology (TLTB, volume 38)

Abstract

The broken plural in Arabic is a canonical example of nonconcatenative morphology. We discuss the supervised and unsupervised learning of this type of transduction using several techniques based on stochastic transducers trained with the Expectation-Maximisation algorithm. We first present a basic method for supervised learning with these transducers, and then a more advanced memory-based learning technique that uses a distance derived from the Fisher kernel of the model. Finally, we discuss how these algorithms can be employed for unsupervised learning by modelling the alignment between the strings as a hidden variable.
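
The core idea of a stochastic transducer trained with EM, where the alignment between two strings is a hidden variable, can be illustrated with a minimal sketch. It assumes a single-state (memoryless) transducer in the style of Ristad and Yianilos's stochastic edit distance. This is a simplification and not necessarily the model used in the chapter. The function names, the uniform initialisation and the toy romanised singular/plural pairs (vowel length omitted) are illustrative assumptions.

```python
import math
from collections import defaultdict

EPS = ""    # empty symbol: (a, EPS) is a deletion, (EPS, b) an insertion
END = None  # key for the termination probability

def init_params(pairs):
    """Uniform distribution over all substitutions, deletions, insertions and END."""
    xs = sorted({c for x, _ in pairs for c in x})
    ys = sorted({c for _, y in pairs for c in y})
    ops = [(a, b) for a in xs for b in ys]
    ops += [(a, EPS) for a in xs] + [(EPS, b) for b in ys] + [END]
    return {op: 1.0 / len(ops) for op in ops}

def forward(x, y, d):
    """alpha[i][j] = probability of generating the prefixes x[:i] and y[:j]."""
    n, m = len(x), len(y)
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:
                alpha[i][j] += alpha[i - 1][j] * d[(x[i - 1], EPS)]
            if j > 0:
                alpha[i][j] += alpha[i][j - 1] * d[(EPS, y[j - 1])]
            if i > 0 and j > 0:
                alpha[i][j] += alpha[i - 1][j - 1] * d[(x[i - 1], y[j - 1])]
    return alpha

def backward(x, y, d):
    """beta[i][j] = probability of generating the suffixes x[i:], y[j:] and then END."""
    n, m = len(x), len(y)
    beta = [[0.0] * (m + 1) for _ in range(n + 1)]
    beta[n][m] = d[END]
    for i in range(n, -1, -1):
        for j in range(m, -1, -1):
            if i < n:
                beta[i][j] += d[(x[i], EPS)] * beta[i + 1][j]
            if j < m:
                beta[i][j] += d[(EPS, y[j])] * beta[i][j + 1]
            if i < n and j < m:
                beta[i][j] += d[(x[i], y[j])] * beta[i + 1][j + 1]
    return beta

def em_step(pairs, d):
    """One EM iteration; returns new parameters and the log-likelihood under d."""
    counts = defaultdict(float)
    loglik = 0.0
    for x, y in pairs:
        alpha, beta = forward(x, y, d), backward(x, y, d)
        n, m = len(x), len(y)
        p = alpha[n][m] * d[END]          # joint probability p(x, y)
        loglik += math.log(p)
        counts[END] += 1.0                # every pair terminates exactly once
        # E-step: posterior expected count of each edit operation,
        # summed over all hidden alignments of x and y.
        for i in range(n + 1):
            for j in range(m + 1):
                if i < n:
                    op = (x[i], EPS)
                    counts[op] += alpha[i][j] * d[op] * beta[i + 1][j] / p
                if j < m:
                    op = (EPS, y[j])
                    counts[op] += alpha[i][j] * d[op] * beta[i][j + 1] / p
                if i < n and j < m:
                    op = (x[i], y[j])
                    counts[op] += alpha[i][j] * d[op] * beta[i + 1][j + 1] / p
    # M-step: renormalise expected counts into a single distribution.
    total = sum(counts.values())
    return {op: counts[op] / total for op in d}, loglik

# Hypothetical toy data: romanised singular -> broken plural pairs.
pairs = [("kitab", "kutub"), ("qalb", "qulub"), ("walad", "awlad"), ("kalb", "kilab")]
d = init_params(pairs)
history = []
for _ in range(20):
    d, ll = em_step(pairs, d)
    history.append(ll)
```

Each call to `em_step` returns the log-likelihood of the parameters it was given, so the values in `history` should be non-decreasing, as EM guarantees. The Fisher-kernel distance and the memory-based classifier described in the abstract are not covered by this sketch.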

Author information

Authors and Affiliations

Department of Computer Science, Royal Holloway, University of London, Egham, Surrey TW20 0EX, United Kingdom
Alexander Clark

Editor information

Editors and Affiliations

Ecole Nationale de l’Industrie Minérale, Rabat, Morocco
Abdelhadi Soudi
Tilburg University, The Netherlands
Antal van den Bosch
Deutsches Forschungszentrum für Künstliche Intelligenz, Saarbrücken, Germany
Günter Neumann

About this chapter

Cite this chapter

Clark, A. (2007). Supervised and Unsupervised Learning of Arabic Morphology. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_10

DOI: https://doi.org/10.1007/978-1-4020-6046-5_10
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-6045-8
Online ISBN: 978-1-4020-6046-5
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)