
Supervised and Unsupervised Learning of Arabic Morphology

Arabic Computational Morphology

Part of the book series: Text, Speech and Language Technology (TLTB, volume 38)

Abstract

The broken plural in Arabic is a canonical example of nonconcatenative morphology. We discuss the supervised and unsupervised learning of this type of transduction using several techniques based on stochastic transducers trained with the Expectation-Maximisation algorithm. We first describe a basic method for supervised learning with these transducers, and then a more advanced memory-based learning technique that uses a distance derived from the Fisher kernel of the model. We then discuss how these algorithms can be employed for unsupervised learning by modelling the alignment between the strings as a hidden variable.
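To make the idea of treating the alignment as a hidden variable concrete, the following is a minimal sketch and not the chapter's actual model. It assumes a single-state (memoryless) stochastic transducer whose parameters are probabilities of edit operations: a pair of symbols, a deletion, an insertion, or a stop event. The names `joint_probability`, `probs`, `p_end` and the toy parameter values are hypothetical. The forward recursion sums the probability of every alignment of the two strings, which is the quantity an Expectation-Maximisation procedure would build on when computing expected counts.

```python
EPS = ""  # the empty string marks the missing side of an insertion or deletion


def joint_probability(x, y, probs, p_end):
    """Forward algorithm for a memoryless stochastic transducer.

    probs maps (input_symbol, output_symbol) pairs to probabilities, where
    either side may be EPS. p_end is the probability of stopping. Together
    they are assumed to sum to 1. Returns P(x, y) summed over all alignments.
    """
    n, m = len(x), len(y)
    # alpha[i][j] = total probability of emitting x[:i] and y[:j]
    alpha = [[0.0] * (m + 1) for _ in range(n + 1)]
    alpha[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # deletion: consume x[i-1], emit nothing
                alpha[i][j] += alpha[i - 1][j] * probs.get((x[i - 1], EPS), 0.0)
            if j > 0:  # insertion: consume nothing, emit y[j-1]
                alpha[i][j] += alpha[i][j - 1] * probs.get((EPS, y[j - 1]), 0.0)
            if i > 0 and j > 0:  # paired emission of x[i-1] and y[j-1]
                alpha[i][j] += alpha[i - 1][j - 1] * probs.get(
                    (x[i - 1], y[j - 1]), 0.0
                )
    return alpha[n][m] * p_end


# Toy parameters over a one-letter alphabet (hypothetical values, sum to 1).
toy_probs = {("a", "a"): 0.5, ("a", EPS): 0.1, (EPS, "a"): 0.1}
toy_p_end = 0.3
p = joint_probability("a", "a", toy_probs, toy_p_end)
```

In this toy case three alignments contribute to `p`: the paired emission, deletion followed by insertion, and insertion followed by deletion. A model with several states, such as a pair hidden Markov model, would add a state index to `alpha` but keep the same structure.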




Copyright information

© 2007 Springer

About this chapter

Cite this chapter

Clark, A. (2007). Supervised and Unsupervised Learning of Arabic Morphology. In: Soudi, A., Bosch, A.v., Neumann, G. (eds) Arabic Computational Morphology. Text, Speech and Language Technology, vol 38. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-6046-5_10
