Abstract
All models strive to represent reality, and efforts in language research are no exception. Computational models of language acquisition must begin and end as an integral part of the empirical study of child language.
Notes
1. We emphasize at the outset that the exploration of the input data addresses only part of the language acquisition problem. Linguistic studies have revealed many constraints on the syntactic system that are negative in nature, i.e., they specify the impossible forms of language. While theoretical formulations vary, the generalizations of island constraints, binding principles, etc. remain, and acquisition studies of these constraints have been largely successful; see O’Grady (1997), Crain and Thornton (1998), and Guasti (2002) for reviews.
2. For example, given the sentence “the cat chases the mouse”, its bigrams (n = 2) are “the cat”, “cat chases”, “chases the”, and “the mouse”, and its trigrams (n = 3) are “the cat chases”, “cat chases the”, and “chases the mouse”. When n = 1, we are simply dealing with words, or unigrams.
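The definition in this note can be sketched in a few lines of Python (a generic illustration, not code from the chapter):

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams (as tuples) over a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

sent = "the cat chases the mouse".split()
print(ngrams(sent, 2))
# [('the', 'cat'), ('cat', 'chases'), ('chases', 'the'), ('the', 'mouse')]
```

With n = 3 the same call yields the three trigrams listed above, and with n = 1 it returns the individual words.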
3. Certain rules have been collapsed together, as the Treebank frequently annotates rules involving distinct functional heads as separate rules.
4. We put aside the issue of whether (probabilistic) context-free grammars are the right representation for natural language; there are well-known arguments to the contrary.
5. While this may be empirically true in a given corpus, long sentences can be perfectly grammatical and ought to be part of the learner’s linguistic knowledge – and they are, as any child familiar with The House that Jack Built and other nursery rhymes knows well.
6. In an interesting manipulation, Redington et al. explore whether having complete knowledge of one category (e.g., nouns) contributes to the clustering of the other categories. This could correspond to an additional strategy by which the child could arrive at a syntactic category. Perhaps somewhat surprisingly, clustering quality actually diminishes. This suggests that even successful clustering of high-frequency words may not provide a sufficiently good bootstrapping device for the clustering of low-frequency words. By contrast, children appear to be capable of using known syntactic categories to determine those of unknown words (Valian and Coulson 1988).
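The distributional method at issue can be made concrete. The toy sketch below (a generic illustration with an invented mini-corpus, not Redington et al.’s actual system; the function names are my own) represents each word by counts of its immediate left and right neighbours and compares words by cosine similarity, so that words of the same category, which share contexts, come out more similar:

```python
from collections import Counter, defaultdict
from math import sqrt

def context_vectors(corpus, targets):
    """Map each target word to a Counter over its immediate neighbours."""
    vecs = defaultdict(Counter)
    for sent in corpus:
        for i, w in enumerate(sent):
            if w in targets:
                if i > 0:
                    vecs[w]["L:" + sent[i - 1]] += 1  # left neighbour
                if i < len(sent) - 1:
                    vecs[w]["R:" + sent[i + 1]] += 1  # right neighbour
    return vecs

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Invented child-directed-style mini-corpus, for demonstration only
corpus = [
    "the cat chases the mouse".split(),
    "a dog chases a cat".split(),
    "the dog sees a mouse".split(),
]
vecs = context_vectors(corpus, {"cat", "dog", "mouse", "chases", "sees"})
# Nouns share determiner contexts, so cat/dog are more similar to
# each other than cat is to the verb chases.
print(cosine(vecs["cat"], vecs["dog"]), cosine(vecs["cat"], vecs["chases"]))
```

Clustering the words by these similarities (e.g., agglomeratively) separates a noun-like from a verb-like group on this toy corpus; the manipulation described in the note would amount to fixing one cluster in advance before clustering the rest.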
7. Nevertheless, supervised learning has been used extensively in computational modeling of language acquisition, often without commentary on its suitability. For instance, virtually all models in the so-called past tense debate in morphology (e.g., Rumelhart and McClelland 1986) and those in the phonological learning of Optimality Theory (e.g., Tesar and Smolensky 2000) assume that the learner has simultaneous access to paired input–output forms (e.g., “walk → walked”, “drink → drank”, or /dæm/ → [dæ̃m], as in English vowel nasalization), though clearly the input to the child learner does not arrive in this pre-processed form. Learning the pairing is arguably the most challenging component of learning in these cases; see Chan (2008) for extensive discussion.
8. The OV order parameter, therefore, must be set independently of and prior to the V2 parameter by its own cue, e.g., a string in which the object is followed by the past participle form of the verb, which indicates its base position before any movement.
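The cue-based logic can be illustrated with a minimal sketch. Nothing below is the chapter’s own algorithm; the tag labels 'OBJ' and 'PART' are invented for the example. It simply shows how a learner that waits for an unambiguous surface cue, an object immediately followed by a past participle, might set the OV parameter independently of V2:

```python
def set_ov_parameter(tagged_sentences):
    """Return True (head-final OV order) once the dedicated cue,
    an object immediately followed by a past participle, is observed.

    Each sentence is a list of (word, tag) pairs; 'OBJ' and 'PART'
    are hypothetical tags used only for this illustration."""
    for sent in tagged_sentences:
        tags = [tag for _, tag in sent]
        for i in range(len(tags) - 1):
            if tags[i] == "OBJ" and tags[i + 1] == "PART":
                return True  # unambiguous cue: object precedes the verb's base position
    return False

# German-like input: "Maria hat das Buch gelesen" (object before participle)
german = [[("Maria", "SUBJ"), ("hat", "AUX"), ("das Buch", "OBJ"), ("gelesen", "PART")]]
print(set_ov_parameter(german))  # prints True: the OV cue is present
```

A VO string such as English "Mary has read the book" contains no such object-before-participle sequence, so the parameter would remain unset by this cue.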
9. Certainly, the admission that input plays a role in language acquisition – how else would the English-learning child learn English and the Chinese-learning child learn Chinese? – does not mean that the input can account for all aspects of the child’s linguistic knowledge. This point seems obvious, though the persistent failure to even assess the role of the input points to a methodological deficiency in current practice.
10.
References
Abney, S. 1996. Statistical methods and linguistics. In The balancing act, ed. J. Klavans and P. Resnik, 1–26. Cambridge, MA: MIT Press.
Angluin, D. 1980. Inductive inference of formal languages from positive data. Information and Control 45(2):117–135.
Angluin, D. 1982. Inference of reversible languages. Journal of the ACM 29(3):741–765.
Angluin, D. 1988. Identifying languages from stochastic examples. Technical Report 614. New Haven: Yale University.
Angluin, D. 1992. Computational learning theory: Survey and selected bibliography. In Proceedings of the Twenty-Fourth Annual ACM Symposium on Theory of Computing, Victoria, 351–369.
Baker, M. 2002. Atoms of language. New York: Basic Books.
Baroni, M. 2008. Distributions in text. In Corpus linguistics: An international handbook, ed. A. Lüdeling and M. Kytö. Berlin: Mouton de Gruyter.
Bates, E. 1976. Language and context: The acquisition of pragmatics. New York: Academic Press.
Berwick, R. 1985. The acquisition of syntactic knowledge. Cambridge, MA: MIT Press.
Berwick, R., and S. Pilato. 1987. Learning syntax by automata induction. Machine Learning 2(1):9–38.
Berwick, R., and P. Niyogi. 1996. Learning from triggers. Linguistic Inquiry 27:605–622.
Bikel, D. 2004. Intricacies of Collins’ parsing model. Computational Linguistics 30:479–511.
Bloom, P. 1993. Grammatical continuity in language development: The case of subjectless sentences. Linguistic Inquiry 24:721–734.
Blum, L., and M. Blum. 1975. Toward a mathematical theory of inductive inference. Information and Control 28(2):125–155.
Blumer, A., A. Ehrenfeucht, D. Haussler, and M. Warmuth. 1989. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM 36(4):929–965.
Bod, R., J. Hay, and S. Jannedy. 2003. Probabilistic linguistics. Cambridge, MA: MIT Press.
Brill, E. 1995. Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA, 1–13.
Brown, R. 1973. A first language. Cambridge, MA: Harvard University Press.
Bush, R., and F. Mosteller. 1951. A mathematical model for simple learning. Psychological Review 58:313–323.
Buttery, P., and A. Korhonen. 2005. Large-scale analysis of verb subcategorization differences between child directed speech and adult speech. In Proceedings of the Interdisciplinary Workshop on the Identification and Representation of Verb Features and Verb Classes. Saarbrücken: Saarland University.
Bybee, J. 2001. Phonology and language use. Cambridge: Cambridge University Press.
Chan, E. 2008. Structures and distributions in morphology learning. PhD diss., Department of Computer and Information Science, University of Pennsylvania, Philadelphia.
Charniak, E. 2000. A maximum-entropy-inspired parser. In Proceedings of the 1st Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL), Seattle, 132–139.
Chomsky, N. 1955/1975. The logical structure of linguistic theory. Manuscript, Harvard/MIT. Published in 1975 by New York: Plenum.
Chomsky, N. 1957. Syntactic structures. Berlin/New York: Mouton.
Chomsky, N. 1959. Review of Verbal behavior by B. F. Skinner. Language 35(1):26–58.
Chomsky, N. 1965. Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N. 1975. Reflections on language. New York: Pantheon.
Chomsky, N. 1981. Lectures on government and binding. Dordrecht: Foris.
Chomsky, N. 1995. The minimalist program. Cambridge, MA: MIT Press.
Cinque, G. 1999. Adverbs and functional heads. New York: Oxford University Press.
Clahsen, H. 1986. Verbal inflections in German child language: Acquisition of agreement markings and the functions they encode. Linguistics 24:79–121.
Clark, A. 2001. Unsupervised language acquisition: Theory and practice. PhD thesis, University of Sussex, UK.
Clark, A., and R. Eyraud. 2007. Polynomial identification in the limit of substitutable context-free languages. Journal of Machine Learning Research 8:1725–1745.
Collins, M. 2003. Head-driven statistical models for natural language processing. Computational Linguistics 29(4):589–637.
Crain, S., and M. Nakayama. 1987. Structure dependency in grammar formation. Language 63:522–543.
Crain, S., and R. Thornton. 1998. Investigations in universal grammar. Cambridge, MA: MIT Press.
Culicover, P. 1999. Syntactic nuts. New York: Oxford University Press.
Culicover, P., and R. Jackendoff. 2005. Simpler syntax. New York: Oxford University Press.
de Marcken, C. 1995. On the unsupervised induction of phrase-structure grammar. In Proceedings of the Third Workshop on Very Large Corpora, Cambridge, MA, 14–26.
Dresher, E. 1999. Charting the learning path: Cues to parameter setting. Linguistic Inquiry 30:27–67.
Dresher, E., and J. Kaye. 1990. A computational learning model for metrical phonology. Cognition 34:137–195.
Elman, J. 1990. Finding structure in time. Cognitive Science 14:179–211.
Feldman, J. 1997. The structure of perceptual categories. Journal of Mathematical Psychology 41:145–170.
Fodor, J. D. 1998. Unambiguous triggers. Linguistic Inquiry 29:1–36.
Fodor, J. D., and W. Sakas. 2005. The subset principle in syntax. Journal of Linguistics 41:513–569.
Fodor, J. D., and W. Sakas. 2009. Disambiguating syntactic triggers. Paper given at workshop on input and syntactic acquisition, Irvine.
Gibson, E., and K. Wexler. 1994. Triggers. Linguistic Inquiry 25:355–407.
Gold, M. 1967. Language identification in the limit. Information and Control 10:447–474.
Goldberg, A. 2003. Constructions. Trends in Cognitive Sciences 7:219–224.
Goldwater, S., and T. Griffiths. 2007. A fully Bayesian approach to unsupervised part-of-speech tagging. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, Prague.
Goodman, N. 1955. Fact, fiction and forecast. Cambridge, MA: Harvard University Press.
Grinstead, J. 1994. Consequences of the maturation of number morphology in Spanish and Catalan. M.A. thesis, UCLA, Los Angeles.
Gross, M. 1975. Méthodes en syntaxe. Paris: Hermann.
Guasti, M. T. 2002. Language acquisition: The growth of grammar. Cambridge, MA: MIT Press.
Ha, Le Quan, E. I. Sicilia-Garcia, J. I. Ming, and F. J. Smith. 2002. Extension of Zipf’s law to words and phrases. In Proceedings of the 19th International Conference on Computational Linguistics, 315–320. Taipei: Howard International House.
Haegeman, L. 1995. Root infinitives, tense, and truncated structures. Language Acquisition 4:205–255.
Harris, Z. 1951. Methods in structural linguistics. Chicago: University of Chicago Press.
Heckerman, D., D. Geiger, and M. Chickering. 1995. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning 20:197–243.
Horning, J. 1969. A study of grammatical inference. Doctoral dissertation, Department of Computer Science, Stanford University, Stanford.
Hudson Kam, C., and E. Newport. 2005. Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development 1(2):151–195.
Hyams, N. 1986. Language acquisition and the theory of parameters. Dordrecht: Reidel.
Hyams, N. 1991. A reanalysis of null subjects in child language. In Theoretical issues in language acquisition: Continuity and change in development, ed. J. Weissenborn, H. Goodluck, and T. Roeper, 249–267. Hillsdale: L. Erlbaum Associates.
Hyams, N., and K. Wexler. 1993. On the grammatical basis of null subjects in child language. Linguistic Inquiry 24:421–459.
Jackendoff, R. 1977. X syntax: A study in phrase structure. Cambridge, MA: MIT Press.
Kam, X., I. Stoyneshka, L. Tornyova, J. D. Fodor, and W. Sakas. 2008. Bigrams and the richness of the stimulus. Cognitive Science 32:771–787.
Kearns, M., and L. Valiant. 1994. Cryptographic limitations on learning Boolean formulae and finite automata. Journal of the ACM 41:67–95.
Kohl, K. 1999. An analysis of finite parameter learning in linguistic spaces. Master’s thesis, Massachusetts Institute of Technology.
Kučera, H., and N. Francis. 1967. Computational analysis of present-day English. Providence: Brown University Press.
Legate, J. A., and C. Yang. 2002. Empirical re-assessment of stimulus poverty arguments. The Linguistic Review 19:151–162.
Legate, J. A., and C. Yang. 2007. Morphosyntactic learning and the development of tense. Language Acquisition 14:315–344.
Lewis, J., and J. Elman. 2001. Learnability and the statistical structure of language: Poverty of stimulus arguments revisited. In Proceedings of the 26th Annual Boston University Conference on Language Development, 359–370. Somerville: Cascadilla.
Lightfoot, D. 1999. The development of language: Acquisition, change, and evolution. Oxford: Blackwell.
MacWhinney, B. 2000. The CHILDES project. Mahwah: Lawrence Erlbaum.
MacWhinney, B. 2004. A multiple process solution to the logical problem of language acquisition. Journal of Child Language 31:883–914.
Magerman, D., and M. Marcus. 1990. Parsing a natural language using mutual information statistics. Proceedings of the AAAI 2:984–989.
Manning, C., and H. Schütze. 1999. Foundations of statistical natural language processing. Cambridge, MA: MIT Press.
Manzini, M. R., and K. Wexler. 1987. Parameters, binding theory, and learnability. Linguistic Inquiry 18:413–444.
Maratsos, M., and M. A. Chalkley. 1980. The internal language of children’s syntax: The ontogenesis and representation of syntactic categories. In Children’s language, ed. K. Nelson, vol. 2. New York: Gardner Press.
Marcus, G. 1993. Negative evidence in language acquisition. Cognition 46:53–85.
Marcus, M., M. Marcinkiewicz, and B. Santorini. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics 19:313–330.
McClelland, J. 2009. The place of modeling in cognitive science. Topics in Cognitive Science 1:11–38.
Merialdo, B. 1994. Tagging English text with a probabilistic model. Computational Linguistics 20(2):155–172.
Mintz, T., E. Newport, and T. Bever. 2002. The distributional structure of grammatical categories in speech to young children. Cognitive Science 26:393–424.
Morgan, J., and K. Demuth. 1996. From signal to syntax. Mahwah: Lawrence Erlbaum.
Niyogi, P. 2006. The computational nature of language learning and evolution. Cambridge, MA: MIT Press.
Nowak, M., N. Komarova, and P. Niyogi. 2002. Computational and evolutionary aspects of language. Nature 417:611–617.
O’Grady, W. 1997. Syntactic development. Chicago: University of Chicago Press.
O’Neil, W., and M. Honda. 2004. Understanding first and second language acquisition (Handbook 10). In Awakening our languages, ed. Lizette Peter. Santa Fe: Indigenous Language Institute.
Osherson, D., M. Stob, and S. Weinstein. 1985. Systems that learn. Cambridge, MA: MIT Press.
Pereira, F., and Y. Schabes. 1992. Inside–outside reestimation from partially bracketed corpora. In Proceedings of the 30th Annual Meeting of the Association for Computational Linguistics, 128–135.
Perfors, A., J. Tenenbaum, and T. Regier. 2006. Poverty of the stimulus? A rational approach. In Proceedings of the 28th Annual Conference of the Cognitive Science Society, Vancouver.
Phillips, C. 1995. Syntax at age 2: Cross-linguistic differences. In MIT Working Papers In Linguistics, vol. 26, 325–382. Cambridge, MA: MITWPL.
Pierce, A. 1992. Language acquisition and syntactic theory. Boston: Kluwer.
Pinker, S. 1979. Formal models of language learning. Cognition 7(3):217–283.
Pinker, S. 1984. Language learnability and language development. Cambridge, MA: Harvard University Press.
Prince, A., and P. Smolensky. 1993. Optimality theory: Constraint interaction in generative grammar. Technical Report RuCCS-TR-2. New Brunswick: Rutgers University Center for Cognitive Science.
Rasetti, L. 2003. Optional categories in early French syntax: A developmental study of root infinitives and null arguments. Doctoral diss., Université de Genève, Switzerland.
Reali, F., and M. H. Christiansen. 2005. Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science 29:1007–1028.
Redington, M., N. Chater, and S. Finch. 1998. Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science 22(4):425–469.
Rissanen, J. 1989. Stochastic complexity in statistical inquiry. Singapore: World Scientific.
Roeper, T., and E. Williams. 1987. Parameter setting. Dordrecht: Kluwer.
Rumelhart, D., and J. McClelland. 1986. On learning the past tenses of English verbs: Implicit rules or parallel distributed processing? In Parallel distributed processing: Explorations in the microstructure of cognition, ed. J. McClelland, D. Rumelhart, and the PDP Research Group, 216–271. Cambridge, MA: MIT Press.
Saffran, J. 2001. The use of predictive dependencies in language learning. Journal of Memory and Language 44(4):493–515.
Saffran, J., R. Aslin, and E. Newport. 1996. Statistical learning by 8-month-olds. Science 274:1926–1928.
Sag, I. 2010. English filler-gap constructions. Language 86:486–545.
Sakas, W., and J. D. Fodor. 2001. Structural trigger learners. In Parametric linguistics and learnability: A self-contained tutorial for linguists, ed. S. Bertolo, 228–290. Oxford: Oxford University Press.
Schütze, H. 1995. Distributional part-of-speech tagging. In Proceedings of the European Chapter of the Association for Computational Linguistics (EACL), University College Dublin, Belfield.
Smith, N., and J. Eisner. 2005. Contrastive estimation: Training log-linear models on unlabeled data. In Proceedings of the Association for Computational Linguistics, University of Michigan, Ann Arbor.
Straus, K. 2008. Validations of a probabilistic model of language learning. Ph.D. Dissertation. Department of Mathematics, Northeastern University, Boston, MA.
Sutton, R., and A. Barto. 1998. Reinforcement learning. Cambridge, MA: MIT Press.
Teahan, W. J. 1997. Modeling English text. DPhil thesis, University of Waikato, New Zealand.
Tenenbaum, J., and T. Griffiths. 2001. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences 24:629–640.
Tesar, B., and P. Smolensky. 2000. Learnability in optimality theory. Cambridge, MA: MIT Press.
Thompson, S., and E. Newport. 2007. Statistical learning of syntax: The role of transitional probability. Language Learning and Development 3(1):1–42.
Thornton, R., and S. Crain. 1994. Successful cyclic movement. In Language acquisition studies in generative grammar, ed. T. Hoekstra and B. Schwartz, 215–253. Amsterdam/Philadelphia: John Benjamins.
Tomasello, M. 1992. First verbs: A case study of early grammatical development. Cambridge, MA: Harvard University Press.
Tomasello, M. 2003. Constructing a language. Cambridge, MA: Harvard University Press.
Valian, V. 1991. Syntactic subjects in the early speech of American and Italian children. Cognition 40:21–82.
Valian, V., and S. Coulson. 1988. Anchor points in language learning: The role of marker frequency. Journal of Memory and Language 27:71–86.
Valian, V., S. Solt, and J. Stewart. 2008. Abstract categories or limited-scope formulae? The case of children’s determiners. Journal of Child Language 35:1–36.
Valiant, L. 1984. A theory of the learnable. Communications of the ACM 27:1134–1142.
Vallabha, G., J. McClelland, F. Pons, J. Werker, and S. Amano. 2007. Unsupervised learning of vowel categories from infant-directed speech. Proceedings of the National Academy of Sciences 104(33):13273–13278.
Vapnik, V. 1995. The nature of statistical learning theory. Berlin: Springer.
Vapnik, V., and A. Chervonenkis. 1971. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications 16:264–280.
Wang, Q., D. Lillo-Martin, C. Best, and A. Levitt. 1992. Null subject vs. null object: Some evidence from the acquisition of Chinese and English. Language Acquisition 2:221–254.
Wexler, K. 1998. Very early parameter setting and the unique checking constraint: A new explanation of the optional infinitive stage. Lingua 106:23–79.
Wexler, K., and P. Culicover. 1980. Formal principles of language acquisition. Cambridge, MA: MIT Press.
Yang, C. 2002. Knowledge and learning in natural language. Oxford: Oxford University Press.
Yang, C. 2004. Universal grammar, statistics, or both. Trends in Cognitive Sciences 8:451–456.
Yang, C. 2006. The infinite gift. New York: Scribner.
Yang, C. 2011. A statistical test for grammar. In Proceedings of the Workshop on Cognitive Modeling and Computational Linguistics. Association for Computational Linguistics. Stroudsburg, PA.
Zipf, G. K. 1949. Human behavior and the principle of least effort: An introduction to human ecology. Cambridge, MA: Addison-Wesley.
Copyright information
© 2011 Springer Science+Business Media B.V.
Cite this chapter
Yang, C. (2011). Computational Models of Language Acquisition. In: de Villiers, J., Roeper, T. (eds) Handbook of Generative Approaches to Language Acquisition. Studies in Theoretical Psycholinguistics, vol 41. Springer, Dordrecht. https://doi.org/10.1007/978-94-007-1688-9_4
Print ISBN: 978-94-007-1687-2
Online ISBN: 978-94-007-1688-9