Abstract
A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Where do children learning their first languages begin in solving this problem? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, Latent BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to improve both an intermediate syntactic representation and its final semantic role classification. Using this system we show that it is possible for a simple learner in a plausible (noisy) setup to begin comprehending the meanings of simple sentences, when initialized with a small amount of concrete noun knowledge and some simple syntax-semantics mapping biases, before acquiring any specific verb knowledge.
Notes
- 1.
In our corpus the full set of role labels is: A0, A1, A2, A3, A4, AM-ADV, AM-CAU, AM-DIR, AM-DIS, AM-EXT, AM-LOC, AM-MNR, AM-MOD, AM-NEG, AM-PNC, AM-PRD, AM-PRP, AM-RCL, AM-TMP.
- 2.
Corpus, decision files, and additional annotation information are available at http://cogcomp.cs.illinois.edu/~connor2/babySRL/
- 3.
- 4.
We tuned the priors using the same set of 8 value pairs suggested by Gao and Johnson [36], using a held-out set of POS-tagged CDS to evaluate final performance. Our final values are an emission prior of 0.1 and a transition prior of 0.0001; as a Dirichlet prior approaches 0, the resulting multinomial becomes peakier, with most of the probability mass concentrated in a few points.
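The effect of the small prior can be illustrated empirically (this is an illustration, not the chapter's code, and the value `K = 10` is an arbitrary choice): multinomials drawn from a symmetric Dirichlet with a small concentration parameter put almost all of their mass on a single outcome, while draws under a concentration of 1 are comparatively flat.

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10  # arbitrary number of outcomes, e.g. transitions out of an HMM state

def mean_max_prob(alpha, n_samples=200):
    """Average size of the largest entry in multinomials drawn from
    a symmetric Dirichlet(alpha) prior over K outcomes."""
    draws = rng.dirichlet([alpha] * K, size=n_samples)
    return draws.max(axis=1).mean()

flat = mean_max_prob(1.0)      # uniform prior: mass spread across outcomes
peaked = mean_max_prob(0.0001) # sparse prior: nearly all mass on one outcome
print(f"alpha=1.0: {flat:.3f}   alpha=0.0001: {peaked:.3f}")
```

With `alpha=0.0001` (the transition prior used here), the largest entry of a sampled multinomial is close to 1, i.e. each HMM state strongly prefers a handful of transitions.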
- 5.
We also include a small third class for punctuation, which is discarded.
- 6.
TO, IN, EX, POS, WDT, PDT, WRB, MD, CC, DT, RP, UH.
- 7.
Note that the data shown in Fig. 2 reflect HMM initialization and training that differed slightly from that described in Sect. 3.2.1 and used in the experiments reported here: In that previous work, the set of function words differed slightly (e.g., in the current version we added ‘not’ to the function word set, and removed ‘like’ and ‘have’), fewer states were allocated to punctuation (3 rather than 5), and the HMM was trained on a smaller sample of unlabeled text (up to 160,000 sentences rather than 320,000). The revised HMM parser used in the present experiments produced very similar results.
- 8.
This roughly represents phonological/distribution information that might be useful for clustering verbs together (e.g., [64]), but that is not exploited by our HMM because the HMM takes transcribed words as input.
- 9.
Because we focus on noun arguments, we miss those predicate arguments that do not include any nouns; the maximum SRL role F1 with only noun arguments correct is 0.8255.
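The ceiling follows directly from the F1 formula: a learner restricted to noun arguments can at best achieve perfect precision, while gold arguments containing no noun are unreachable and cap recall. A minimal sketch with made-up numbers (the recall value below is hypothetical, not the chapter's measurement):

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical: perfect precision on noun arguments, but only 80% of gold
# arguments contain a noun, so recall cannot exceed 0.8.
ceiling = f1(1.0, 0.8)
print(round(ceiling, 4))
```

Even with perfect labeling of every reachable argument, F1 stays below 1; the chapter reports a ceiling of 0.8255 for its corpus.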
References
Alishahi, A., & Stevenson, S. (2010). A computational model of learning semantic roles from child-directed language. Language and Cognitive Processes, 25(1), 50–93.
Alishahi, A., & Stevenson, S. (2012). Gradual acquisition of verb selectional preferences in a Bayesian model. In A. Villavicencio, A. Alishahi, T. Poibeau, & A. Korhonen (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.
Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.
Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. Hayes (Ed.), Cognition and the development of language (pp. 279–362). New York: Wiley.
Bloom, B. H. (1970). Space/time trade-offs in Hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.
Bloom, L. (1973). One word at a time: The use of single-word utterances before syntax. The Hague: Mouton.
Bod, R. (2009). From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33(5), 752–793.
Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81, 31–44.
Brill, E. (1997). Unsupervised learning of disambiguation rules for part of speech tagging. In S. Armstrong, K. Church, P. Isabelle, S. Manzi, E. Tzoukermann, & D. Yarowsky (Eds.), Natural language processing using very large corpora. Dordrecht: Kluwer Academic Press.
Brown, R. (1973). A first language. Cambridge: Harvard University Press.
Brown, P., Pietra, V. D., deSouza, P., Lai, J., & Mercer, R. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.
Carreras, X., & Màrquez, L. (2004). Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004 (pp. 89–97), Boston.
Carreras, X., & Màrquez, L. (2005). Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor.
Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological review, 113(2), 234–272.
Chang, M., Goldwasser, D., Roth, D., & Srikumar, V. (2010). Discriminative learning over constrained latent representations. In Proceedings of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), Los Angeles.
Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Providence.
Cherry, C., & Quirk, C. (2008). Discriminative, syntactic language modeling through latent SVMs. In Proceedings of the Eighth Conference of AMTA, Honolulu.
Clark, E. V. (1978). Awareness of language: Some evidence from what children say and do. In R. J. A. Sinclair & W. Levelt (Eds.), The child’s conception of language. Berlin: Springer.
Clark, E. V. (1990). Speaker perspective in language acquisition. Linguistics, 28, 1201–1220.
Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, Philadelphia.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2008). Baby SRL: Modeling early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Manchester.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2009). Minimally supervised model of early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Boulder.
Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2010). Starting from scratch in semantic role labeling. In Proceedings of the Annual Meeting of the Association of Computational Linguistics (ACL), Uppsala.
Connor, M., Fisher, C., & Roth, D. (2011). Online latent structure training for language acquisition. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Barcelona.
Dale, P. S., & Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers, 28, 125–127.
Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of Child Language, 13, 275–292.
Demuth, K., Culbertson, J., & Alter, J. (2006). Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49, 137–174.
Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67, 547–619.
Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.
Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.
Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR) Anchorage, Alaska.
Fisher, C. (1996). Structural limits on verb mapping: The role of analogy in children’s interpretation of sentences. Cognitive Psychology, 31, 41–81.
Fisher, C., & Tokura, H. (1996). Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development, 67, 3192–3218.
Fisher, C., Gleitman, H., & Gleitman, L. (1989). On the semantic content of subcategorization frames. Cognitive Psychology, 23, 331–392.
Fisher, C., Gertner, Y., Scott, R., & Yuan, S. (2010). Syntactic bootstrapping. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 143–149.
Gao, J., & Johnson, M. (2008). A comparison of bayesian estimators for unsupervised Hidden Markov Model POS taggers. In Proceedings of EMNLP-2008 (pp. 344–352), Honolulu.
Gentner, D. (2006). Why verbs are hard to learn. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs (pp. 544–564). Oxford/New York: Oxford University Press.
Gertner, Y., Fisher, C., & Eisengart, J. (2006). Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17, 684–691.
Gildea, D., & Palmer, M. (2002). The necessity of parsing for predicate argument recognition. In ACL (pp. 239–246), Philadelphia.
Gillette, J., Gleitman, H., Gleitman, L. R., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135–176.
Goldwater, S., & Griffiths, T. (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. In ACL (pp. 744–751), Prague.
Gomez, R., & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109–135.
Haghighi, A., & Klein, D. (2006). Prototype-driven learning for sequence models. In Proceedings of HLT-NAACL, New York.
Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., & Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Boulder.
Harris, Z. (1951). Methods in structural linguistics. Chicago: Chicago University Press.
Hochmann, J., Endress, A. D., & Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition, 115, 444–457.
Huang, F., & Yates, A. (2009). Distributional representations for handling sparsity in supervised sequence-labeling. In ACL, Singapore.
Johnson, M. (2007). Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 296–305), Prague.
Johnson, M., Demuth, K., Frank, M. C., & Jones, B. (2010). Synergies in learning words and their meanings. In Advances in Neural Information Processing Systems 23, Vancouver.
Kazama, J., & Torisawa, K. (2007). A new perceptron algorithm for sequence labeling with non-local features. In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 315–324), Prague.
Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99, 349–364.
Kingsbury, P., & Palmer, M. (2002). From Treebank to PropBank. In Proceedings of LREC-2002, Spain.
Klein, D., & Manning, C. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the Association for Computational Linguistics (ACL), Barcelona.
Landau, B., & Gleitman, L. (1985). Language and experience. Cambridge: Harvard University Press.
Levin, B., & Rappaport-Hovav, M. (2005). Argument realization. Research surveys in linguistics series. Cambridge: Cambridge University Press.
MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Mahwah: Lawrence Erlbaum Associates.
Marcus, M. P., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.
Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.
Màrquez, L., Carreras, X., Litkowski, K., & Stevenson, S. (2008). Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34, 145–159.
Meilă, M. (2002). Comparing clusterings (Tech. Rep. 418). University of Washington Statistics Department.
Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–312.
Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91–117.
Mintz, T., Newport, E., & Bever, T. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26, 393–424.
Monaghan, P., Chater, N., & Christiansen, M. (2005). The differential role of phonological and distributional cues in grammatical categorisation. Cognition, 96, 143–182.
Naigles, L. R. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.
Nappa, R., Wessel, A., McEldoon, K., Gleitman, L., & Trueswell, J. (2009). Use of speaker’s gaze and syntax in verb learning. Language Learning and Development, 5, 203–234.
Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.
Parisien, C., & Stevenson, S. (2010). Learning verb alternations in a usage-based Bayesian model. In Proceedings of the 32nd annual meeting of the Cognitive Science Society, Portland.
Perfors, A., Tenenbaum, J., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37, 607–642.
Pinker, S. (1984). Language learnability and language development. Cambridge: Harvard University Press.
Pinker, S. (1989). Learnability and cognition. Cambridge: MIT Press.
Punyakanok, V., Roth, D., & Yih, W. (2008). The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2), 257–287.
Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–285.
Ravi, S., & Knight, K. (2009). Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), Singapore.
Rispoli, M. (1989). Encounters with Japanese verbs: Caregiver sentences and the categorization of transitive and intransitive action verbs. First Language, 9, 57–80.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 906–914.
Shi, R., Morgan, J. L., & Allopenna, P. (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language, 25(01), 169–201.
Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72(2), B11–B21.
Smith, L., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106, 1558–1568.
Snedeker, J., & Gleitman, L. (2004). Why it is hard to label our concepts. In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon. Cambridge: MIT Press.
Solan, Z., Horn, D., Ruppin, E., & Edelman, S. (2005). Unsupervised learning of natural languages. Proceedings of the National Academy of Sciences, 102, 11629–11634.
Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., & Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, Manchester.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge: Harvard University Press.
Toutanova, K., & Johnson, M. (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of NIPS, Vancouver.
Waterfall, H., Sandbank, B., Onnis, L., & Edelman, S. (2010). An empirical generative framework for computational modeling of language acquisition. Journal of Child Language, 37, 671–703.
Yang, C. (2011). A statistical test for grammar. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland.
Yu, C., & Joachims, T. (2009). Learning structural SVMs with latent variables. In ICML, Montreal.
Yuan, S., Fisher, C., & Snedeker, J. (2012). Counting the nouns: Simple structural cues to verb meaning. Child Development, 83, 1382–1399.
Acknowledgements
We wish to thank Yael Gertner for insightful discussion that led up to this work as well as the various annotators who helped create the semantically tagged data. This research is supported by NSF grant BCS-0620257 and NIH grant R01-HD054448.
Copyright information
© 2013 Springer-Verlag Berlin Heidelberg
Cite this chapter
Connor, M., Fisher, C., Roth, D. (2013). Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_10
DOI: https://doi.org/10.1007/978-3-642-31863-4_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31862-7
Online ISBN: 978-3-642-31863-4
eBook Packages: Computer Science (R0)