Abstract
A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Where do children learning their first languages begin in solving this problem? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, Latent BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to improve both an intermediate syntactic representation and its final semantic role classification. Using this system we show that it is possible for a simple learner in a plausible (noisy) setup to begin comprehending the meanings of simple sentences, when initialized with a small amount of concrete noun knowledge and some simple syntax-semantics mapping biases, before acquiring any specific verb knowledge.
Notes
- 1.
In our corpus the full set of role labels is: A0, A1, A2, A3, A4, AM-ADV, AM-CAU, AM-DIR, AM-DIS, AM-EXT, AM-LOC, AM-MNR, AM-MOD, AM-NEG, AM-PNC, AM-PRD, AM-PRP, AM-RCL, AM-TMP.
- 2.
Corpus, decision files, and additional annotation information are available at http://cogcomp.cs.illinois.edu/~connor2/babySRL/
- 3.
- 4.
We tuned the priors using the same set of 8 value pairs suggested by Gao and Johnson [36], using a held-out set of POS-tagged CDS to evaluate final performance. Our final values are an emission prior of 0.1 and a transition prior of 0.0001; as a Dirichlet prior approaches 0, the resulting multinomial becomes peakier, with most of the probability mass concentrated in a few points.
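The effect of the small prior can be illustrated empirically (this is an illustration, not the chapter's code, and the value `K = 10` is an arbitrary choice): multinomials drawn from a symmetric Dirichlet with a small concentration parameter put almost all of their mass on a single outcome, while draws under a concentration of 1 are comparatively flat.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # arbitrary number of outcomes, e.g. transitions out of an HMM state

def mean_max_prob(alpha, n_samples=200):
    """Average size of the largest entry in multinomials drawn from
    a symmetric Dirichlet(alpha) prior over K outcomes."""
    draws = rng.dirichlet([alpha] * K, size=n_samples)
    return draws.max(axis=1).mean()

flat = mean_max_prob(1.0)      # uniform prior: mass spread across outcomes
peaked = mean_max_prob(0.0001) # sparse prior: nearly all mass on one outcome
print(f"alpha=1.0: {flat:.3f}   alpha=0.0001: {peaked:.3f}")
```

With `alpha=0.0001` (the transition prior used here), the largest entry of a sampled multinomial is close to 1, i.e. each HMM state strongly prefers a handful of transitions.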
- 5.
We also include a small third class for punctuation, which is discarded.
- 6.
TO, IN, EX, POS, WDT, PDT, WRB, MD, CC, DT, RP, UH.
- 7.
Note that the data shown in Fig. 2 reflect HMM initialization and training that differed slightly from that described in Sect. 3.2.1 and used in the experiments reported here: In that previous work, the set of function words differed slightly (e.g., in the current version we added ‘not’ to the function word set, and removed ‘like’ and ‘have’), fewer states were allocated to punctuation (3 rather than 5), and the HMM was trained on a smaller sample of unlabeled text (up to 160,000 sentences rather than 320,000). The revised HMM parser used in the present experiments produced very similar results.
- 8.
This roughly represents phonological/distribution information that might be useful for clustering verbs together (e.g., [64]), but that is not exploited by our HMM because the HMM takes transcribed words as input.
- 9.
Because we focus on noun arguments, we miss those predicate arguments that do not include any nouns; the maximum SRL role F1 with only noun arguments correct is 0.8255.
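The ceiling follows directly from the F1 formula: a learner restricted to noun arguments can at best achieve perfect precision, while gold arguments containing no noun are unreachable and cap recall. A minimal sketch with made-up numbers (the recall value below is hypothetical, not the chapter's measurement):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical: perfect precision on noun arguments, but only 80% of gold
# arguments contain a noun, so recall cannot exceed 0.8.
ceiling = f1(1.0, 0.8)
print(round(ceiling, 4))
```

Even with perfect labeling of every reachable argument, F1 stays below 1; the chapter reports a ceiling of 0.8255 for its corpus.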
References
Alishahi, A., & Stevenson, S. (2010). A computational model of learning semantic roles from child-directed language. Language and Cognitive Processes, 25(1), 50–93.
Alishahi, A., & Stevenson, S. (2012). Gradual acquisition of verb selectional preferences in a Bayesian model. In A. Villavicencio, A. Alishahi, T. Poibeau, & A. Korhonen (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.
Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.
Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. Hayes (Ed.), Cognition and the development of language (pp. 279–362). New York: Wiley.
Bloom, B. H. (1970). Space/time trade-offs in Hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.
Bloom, L. (1973). One word at a time: The use of single-word utterances before syntax. The Hague: Mouton.
Bod, R. (2009). From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33(5), 752–793.
Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81, 31–44.
Brill, E. (1997). Unsupervised learning of disambiguation rules for part of speech tagging. In S. Armstrong, K. Church, P. Isabelle, S. Manzi, E. Tzoukermann, & D. Yarowsky (Eds.), Natural language processing using very large corpora. Dordrecht: Kluwer Academic Press.
Brown, R. (1973). A first language. Cambridge: Harvard University Press.
Brown, P., Pietra, V. D., deSouza, P., Lai, J., & Mercer, R. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.
Carreras, X., & Màrquez, L. (2004). Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004 (pp. 89–97), Boston.
Carreras, X., & Màrquez, L. (2005). Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor.
Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological review, 113(2), 234–272.
Chang, M., Goldwasser, D., Roth, D., & Srikumar, V. (2010). Discriminative learning over constrained latent representations. In Proceedings of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), Los Angeles.
Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Providence.
Cherry, C., & Quirk, C. (2008). Discriminative, syntactic language modeling through latent SVMs. In Proceedings of the Eighth Conference of AMTA, Honolulu.
Clark, E. V. (1978). Awareness of language: Some evidence from what children say and do. In R. J. A. Sinclair & W. Levelt (Eds.), The child’s conception of language. Berlin: Springer.
Clark, E. V. (1990). Speaker perspective in language acquisition. Linguistics, 28, 1201–1220.
Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, Philadelphia.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2008). Baby SRL: Modeling early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Manchester.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2009). Minimally supervised model of early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Boulder.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2010). Starting from scratch in semantic role labeling. In Proceedings of the Annual Meeting of the Association of Computational Linguistics (ACL), Uppsala.
Connor, M., Fisher, C., & Roth, D. (2011). Online latent structure training for language acquisition. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Barcelona.
Dale, P. S., & Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers, 28, 125–127.
Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of Child Language, 13, 275–292.
Demuth, K., Culbertson, J., & Alter, J. (2006). Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49, 137–174.
Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67, 547–619.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) Anchorage, Alaska.
Fisher, C. (1996). Structural limits on verb mapping: The role of analogy in children’s interpretation of sentences. Cognitive Psychology, 31, 41–81.
Fisher, C., & Tokura, H. (1996). Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development, 67, 3192–3218.
Fisher, C., Gleitman, H., & Gleitman, L. (1989). On the semantic content of subcategorization frames. Cognitive Psychology, 23, 331–392.
Fisher, C., Gertner, Y., Scott, R., & Yuan, S. (2010). Syntactic bootstrapping. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 143–149.
Gao, J., & Johnson, M. (2008). A comparison of bayesian estimators for unsupervised Hidden Markov Model POS taggers. In Proceedings of EMNLP-2008 (pp. 344–352), Honolulu.
Gentner, D. (2006). Why verbs are hard to learn. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs (pp. 544–564). Oxford/New York: Oxford University Press.
Gertner, Y., Fisher, C., & Eisengart, J. (2006). Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17, 684–691.
Gildea, D., & Palmer, M. (2002). The necessity of parsing for predicate argument recognition. In ACL (pp. 239–246), Philadelphia.
Gillette, J., Gleitman, H., Gleitman, L. R., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135–176.
Goldwater, S., & Griffiths, T. (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. In ACL (pp. 744–751), Prague.
Gomez, R., & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109–135.
Haghighi, A., & Klein, D. (2006). Prototype-driven learning for sequence models. In Proceedings of HLT-NAACL, New York.
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., & Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Boulder.
Harris, Z. (1951). Methods in structural linguistics. Chicago: Chicago University Press.
Hochmann, J., Endress, A. D., & Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition, 115, 444–457.
Huang, F., & Yates, A. (2009). Distributional representations for handling sparsity in supervised sequence-labeling. In ACL, Singapore.
Johnson, M. (2007). Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 296–305), Prague.
Johnson, M., Demuth, K., Frank, M. C., & Jones, B. (2010). Synergies in learning words and their meanings. In Advances in Neural Information Processing Systems 23, Vancouver.
Kazama, J., & Torisawa, K. (2007). A new perceptron algorithm for sequence labeling with non-local features. In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 315–324), Prague.
Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99, 349–364.
Kingsbury, P., & Palmer, M. (2002). From Treebank to PropBank. In Proceedings of LREC-2002, Spain.
Klein, D., & Manning, C. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the Association for Computational Linguistics (ACL), Barcelona.
Landau, B., & Gleitman, L. (1985). Language and experience. Cambridge: Harvard University Press.
Levin, B., & Rappaport-Hovav, M. (2005). Argument realization. Research surveys in linguistics series. Cambridge: Cambridge University Press.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Mahwah: Lawrence Erlbaum Associates.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.
Màrquez, L., Carreras, X., Litkowski, K., & Stevenson, S. (2008). Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34, 145–159.
Meilă, M. (2002). Comparing clusterings (Tech. Rep. 418). University of Washington Statistics Department.
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–312.
Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91–117.
Mintz, T., Newport, E., & Bever, T. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26, 393–424.
Monaghan, P., Chater, N., & Christiansen, M. (2005). The differential role of phonological and distributional cues in grammatical categorisation. Cognition, 96, 143–182.
Naigles, L. R. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.
Nappa, R., Wessel, A., McEldoon, K., Gleitman, L., & Trueswell, J. (2009). Use of speaker’s gaze and syntax in verb learning. Language Learning and Development, 5, 203–234.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.
Parisien, C., & Stevenson, S. (2010). Learning verb alternations in a usage-based Bayesian model. In Proceedings of the 32nd annual meeting of the Cognitive Science Society, Portland.
Perfors, A., Tenenbaum, J., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37, 607–642.
Pinker, S. (1984). Language learnability and language development. Cambridge: Harvard University Press.
Pinker, S. (1989). Learnability and cognition. Cambridge: MIT Press.
Punyakanok, V., Roth, D., & Yih, W. (2008). The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2), 257–287.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–285.
Ravi, S., & Knight, K. (2009). Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), Singapore.
Rispoli, M. (1989). Encounters with Japanese verbs: Caregiver sentences and the categorization of transitive and intransitive action verbs. First Language, 9, 57–80.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 906–914.
Shi, R., Morgan, J. L., & Allopenna, P. (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language, 25(01), 169–201.
Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72(2), B11–B21.
Smith, L., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106, 1558–1568.
Snedeker, J., & Gleitman, L. (2004). Why it is hard to label our concepts. In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon. Cambridge: MIT Press.
Solan, Z., Horn, D., Ruppin, E., & Edelman, S. (2005). Unsupervised learning of natural languages. Proceedings of the National Academy of Sciences, 102, 11629–11634.
Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., & Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, Manchester.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge: Harvard University Press.
Toutanova, K., & Johnson, M. (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of NIPS, Vancouver.
Waterfall, H., Sandbank, B., Onnis, L., & Edelman, S. (2010). An empirical generative framework for computational modeling of language acquisition. Journal of Child Language, 37, 671–703.
Yang, C. (2011). A statistical test for grammar. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland.
Yu, C., & Joachims, T. (2009). Learning structural SVMs with latent variables. In ICML, Montreal.
Yuan, S., Fisher, C., & Snedeker, J. (2012). Counting the nouns: Simple structural cues to verb meaning. Child Development, 83, 1382–1399.
Acknowledgements
We wish to thank Yael Gertner for insightful discussion that led up to this work as well as the various annotators who helped create the semantically tagged data. This research is supported by NSF grant BCS-0620257 and NIH grant R01-HD054448.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Connor, M., Fisher, C., Roth, D. (2013). Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_10
DOI: https://doi.org/10.1007/978-3-642-31863-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4
eBook Packages: Computer Science (R0)