Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision

Chapter in Cognitive Aspects of Computational Language Acquisition

Abstract

A fundamental step in sentence comprehension involves assigning semantic roles to sentence constituents. To accomplish this, the listener must parse the sentence, find constituents that are candidate arguments, and assign semantic roles to those constituents. Where do children learning their first languages begin in solving this problem? To experiment with different representations that children may use to begin understanding language, we have built a computational model for this early point in language acquisition. This system, Latent BabySRL, learns from transcriptions of natural child-directed speech and makes use of psycholinguistically plausible background knowledge and realistically noisy semantic feedback to improve both an intermediate syntactic representation and its final semantic role classification. Using this system we show that it is possible for a simple learner in a plausible (noisy) setup to begin comprehending the meanings of simple sentences, when initialized with a small amount of concrete noun knowledge and some simple syntax-semantics mapping biases, before acquiring any specific verb knowledge.
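
To make the architecture described above concrete, the following is a minimal, hypothetical sketch of a learner with this general shape: candidate noun arguments serve as a latent intermediate representation, the features are shallow positional cues, and perceptron-style updates are driven by partial, noisy role feedback. This is not the chapter's Latent BabySRL implementation; the class, feature names, and feedback format are invented for illustration.

```python
from collections import defaultdict


class TinyLatentSRL:
    """Toy two-level learner: latent argument slots plus a role classifier."""

    def __init__(self):
        # one sparse weight vector per semantic role, over string-valued features
        self.w = defaultdict(lambda: defaultdict(float))

    def features(self, sentence, idx, pred_idx):
        # structure-light cues a beginning learner might use: the word itself,
        # its position in the sentence, and its order relative to the predicate
        return {
            f"word={sentence[idx]}": 1.0,
            f"position={idx}": 1.0,
            f"side={'pre' if idx < pred_idx else 'post'}": 1.0,
        }

    def score(self, role, feats):
        return sum(self.w[role][f] * v for f, v in feats.items())

    def predict(self, feats, roles=("A0", "A1")):
        return max(roles, key=lambda r: self.score(r, feats))

    def update(self, sentence, noun_idxs, pred_idx, noisy_feedback):
        """noisy_feedback: partial word-to-role hints, e.g. {'girl': 'A0'}."""
        for idx in noun_idxs:  # candidate argument slots (the latent choice)
            feats = self.features(sentence, idx, pred_idx)
            gold = noisy_feedback.get(sentence[idx])
            if gold is None:
                continue  # no usable feedback for this candidate
            guess = self.predict(feats)
            if guess != gold:  # perceptron-style correction
                for f, v in feats.items():
                    self.w[gold][f] += v
                    self.w[guess][f] -= v


learner = TinyLatentSRL()
sentence = ["the", "girl", "tickles", "the", "boy"]
learner.update(sentence, noun_idxs=[1, 4], pred_idx=2, noisy_feedback={"girl": "A0"})
```

The division of labor is the point of the sketch: the learner commits to which slots are arguments before it knows anything about specific verbs, and role feedback arrives for at most a few words per sentence.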

Notes

  1. In our corpus the full set of role labels is: A0, A1, A2, A3, A4, AM-ADV, AM-CAU, AM-DIR, AM-DIS, AM-EXT, AM-LOC, AM-MNR, AM-MOD, AM-NEG, AM-PNC, AM-PRD, AM-PRP, AM-RCL, AM-TMP.

  2. Corpus, decision files, and additional annotation information are available at http://cogcomp.cs.illinois.edu/~connor2/babySRL/

  3. We used parts of the Bloom [5, 6], Brent [8], Brown [10], Clark [18], Cornell, MacWhinney [56], Post [26] and Providence [27] collections.

  4. We tuned the priors using the same set of 8 value pairs suggested by Gao and Johnson [36], using a held-out set of POS-tagged CDS to evaluate final performance. Our final values are an emission prior of 0.1 and a transition prior of 0.0001; as a Dirichlet prior approaches 0, the resulting multinomial becomes peakier, with most of the probability mass concentrated on a few points (see the short simulation sketch after these notes).

  5. We also include a small third class for punctuation, which is discarded.

  6. TO, IN, EX, POS, WDT, PDT, WRB, MD, CC, DT, RP, UH.

  7. Note that the data shown in Fig. 2 reflect HMM initialization and training that differed slightly from that described in Sect. 3.2.1 and used in the experiments reported here: in that previous work, the set of function words differed slightly (e.g., in the current version we added ‘not’ to the function word set, and removed ‘like’ and ‘have’), fewer states were allocated to punctuation (3 rather than 5), and the HMM was trained on a smaller sample of unlabeled text (up to 160,000 sentences rather than 320,000). The revised HMM parser used in the present experiments produced very similar results.

  8. This roughly represents phonological/distributional information that might be useful for clustering verbs together (e.g., [64]), but that is not exploited by our HMM because the HMM takes transcribed words as input.

  9. Because we focus on noun arguments, we miss those predicate arguments that do not include any nouns; the maximum SRL role F1 with only noun arguments correct is 0.8255.

References

  1. Alishahi, A., & Stevenson, S. (2010). A computational model of learning semantic roles from child-directed language. Language and Cognitive Processes, 25(1), 50–93.

  2. Alishahi, A., & Stevenson, S. (2012). Gradual acquisition of verb selectional preferences in a Bayesian model. In A. Villavicencio, A. Alishahi, T. Poibeau, & A. Korhonen (Eds.), Cognitive aspects of computational language acquisition. New York: Springer.

  3. Beal, M. J. (2003). Variational algorithms for approximate Bayesian inference. Ph.D. thesis, Gatsby Computational Neuroscience Unit, University College London.

  4. Bever, T. G. (1970). The cognitive basis for linguistic structures. In J. Hayes (Ed.), Cognition and the development of language (pp. 279–362). New York: Wiley.

  5. Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 422–426.

  6. Bloom, L. (1973). One word at a time: The use of single-word utterances before syntax. The Hague: Mouton.

  7. Bod, R. (2009). From exemplar to grammar: A probabilistic analogy-based model of language learning. Cognitive Science, 33(5), 752–793.

  8. Brent, M. R., & Siskind, J. M. (2001). The role of exposure to isolated words in early vocabulary development. Cognition, 81, 31–44.

  9. Brill, E. (1997). Unsupervised learning of disambiguation rules for part of speech tagging. In S. Armstrong, K. Church, P. Isabelle, S. Manzi, E. Tzoukermann, & D. Yarowsky (Eds.), Natural language processing using very large corpora. Dordrecht: Kluwer Academic Press.

  10. Brown, R. (1973). A first language. Cambridge: Harvard University Press.

  11. Brown, P., Pietra, V. D., deSouza, P., Lai, J., & Mercer, R. (1992). Class-based n-gram models of natural language. Computational Linguistics, 18(4), 467–479.

  12. Carreras, X., & Màrquez, L. (2004). Introduction to the CoNLL-2004 shared task: Semantic role labeling. In Proceedings of CoNLL-2004 (pp. 89–97), Boston.

  13. Carreras, X., & Màrquez, L. (2005). Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proceedings of the Ninth Conference on Computational Natural Language Learning (CoNLL-2005), Ann Arbor.

  14. Chang, F., Dell, G. S., & Bock, K. (2006). Becoming syntactic. Psychological Review, 113(2), 234–272.

  15. Chang, M., Goldwasser, D., Roth, D., & Srikumar, V. (2010). Discriminative learning over constrained latent representations. In Proceedings of the Annual Meeting of the North American Association of Computational Linguistics (NAACL), Los Angeles.

  16. Charniak, E. (1997). Statistical parsing with a context-free grammar and word statistics. In Proceedings of the National Conference on Artificial Intelligence (AAAI), Providence.

  17. Cherry, C., & Quirk, C. (2008). Discriminative, syntactic language modeling through latent SVMs. In Proceedings of the Eighth Conference of AMTA, Honolulu.

  18. Clark, E. V. (1978). Awareness of language: Some evidence from what children say and do. In R. J. A. Sinclair & W. Levelt (Eds.), The child’s conception of language. Berlin: Springer.

  19. Clark, E. V. (1990). Speaker perspective in language acquisition. Linguistics, 28, 1201–1220.

  20. Collins, M. (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In EMNLP, Philadelphia.

  21. Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2008). Baby srl: Modeling early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Manchester.

  22. Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2009). Minimally supervised model of early language acquisition. In Proceedings of the Annual Conference on Computational Natural Language Learning (CoNLL), Boulder.

  23. Connor, M., Gertner, Y., Fisher, C., & Roth, D. (2010). Starting from scratch in semantic role labeling. In Proceedings of the Annual Meeting of the Association of Computational Linguistics (ACL), Uppsala.

  24. Connor, M., Fisher, C., & Roth, D. (2011). Online latent structure training for language acquisition. In Proceedings of the International Joint Conference on Artificial Intelligence (IJCAI), Barcelona.

  25. Dale, P. S., & Fenson, L. (1996). Lexical development norms for young children. Behavior Research Methods, Instruments, & Computers, 28, 125–127.

  26. Demetras, M., Post, K., & Snow, C. (1986). Feedback to first-language learners. Journal of Child Language, 13, 275–292.

  27. Demuth, K., Culbertson, J., & Alter, J. (2006). Word-minimality, epenthesis, and coda licensing in the acquisition of English. Language & Speech, 49, 137–174.

  28. Dowty, D. (1991). Thematic proto-roles and argument selection. Language, 67, 547–619.

  29. Elman, J. (1990). Finding structure in time. Cognitive Science, 14, 179–211.

  30. Elman, J. (1991). Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7, 195–225.

  31. Felzenszwalb, P., McAllester, D., & Ramanan, D. (2008). A discriminatively trained, multiscale, deformable part model. In IEEE International Conference on Computer Vision and Pattern Recognition (CVPR), Anchorage, Alaska.

  32. Fisher, C. (1996). Structural limits on verb mapping: The role of analogy in children’s interpretation of sentences. Cognitive Psychology, 31, 41–81.

  33. Fisher, C., & Tokura, H. (1996). Acoustic cues to grammatical structure in infant-directed speech: Cross-linguistic evidence. Child Development, 67, 3192–3218.

  34. Fisher, C., Gleitman, H., & Gleitman, L. (1989). On the semantic content of subcategorization frames. Cognitive Psychology, 23, 331–392.

  35. Fisher, C., Gertner, Y., Scott, R., & Yuan, S. (2010). Syntactic bootstrapping. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 143–149.

  36. Gao, J., & Johnson, M. (2008). A comparison of Bayesian estimators for unsupervised hidden Markov model POS taggers. In Proceedings of EMNLP-2008 (pp. 344–352), Honolulu.

  37. Gentner, D. (2006). Why verbs are hard to learn. In K. Hirsh-Pasek & R. Golinkoff (Eds.), Action meets word: How children learn verbs (pp. 544–564). Oxford/New York: Oxford University Press.

  38. Gertner, Y., Fisher, C., & Eisengart, J. (2006). Learning words and rules: Abstract knowledge of word order in early sentence comprehension. Psychological Science, 17, 684–691.

  39. Gildea, D., & Palmer, M. (2002). The necessity of parsing for predicate argument recognition. In ACL (pp. 239–246), Philadelphia.

  40. Gillette, J., Gleitman, H., Gleitman, L. R., & Lederer, A. (1999). Human simulations of vocabulary learning. Cognition, 73, 135–176.

  41. Goldwater, S., & Griffiths, T. (2007). A fully Bayesian approach to unsupervised part-of-speech tagging. In ACL (pp. 744–751), Prague.

  42. Gomez, R., & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109–135.

  43. Haghighi, A., & Klein, D. (2006). Prototype-driven learning for sequence models. In Proceedings of HLT-NAACL, New York.

  44. Hajič, J., Ciaramita, M., Johansson, R., Kawahara, D., Martí, M., Màrquez, L., Meyers, A., Nivre, J., Padó, S., Štěpánek, J., Straňák, P., Surdeanu, M., Xue, N., & Zhang, Y. (2009). The CoNLL-2009 shared task: Syntactic and semantic dependencies in multiple languages. In Proceedings of the Thirteenth Conference on Computational Natural Language Learning (CoNLL 2009): Shared Task, Boulder.

  45. Harris, Z. (1951). Methods in structural linguistics. Chicago: Chicago University Press.

  46. Hochmann, J., Endress, A. D., & Mehler, J. (2010). Word frequency as a cue for identifying function words in infancy. Cognition, 115, 444–457.

  47. Huang, F., & Yates, A. (2009). Distributional representations for handling sparsity in supervised sequence-labeling. In ACL, Singapore.

  48. Johnson, M. (2007). Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 296–305), Prague.

  49. Johnson, M., Demuth, K., Frank, M. C., & Jones, B. (2010). Synergies in learning words and their meanings. In Neural Information Processing Systems, 23, Vancouver.

  50. Kazama, J., & Torisawa, K. (2007). A new perceptron algorithm for sequence labeling with non-local features. In Proceedings of the 2007 Joint Conference of EMNLP-CoNLL (pp. 315–324), Prague.

  51. Kelly, M. H. (1992). Using sound to solve syntactic problems: The role of phonology in grammatical category assignments. Psychological Review, 99, 349–364.

  52. Kingsbury, P., & Palmer, M. (2002). From Treebank to PropBank. In Proceedings of LREC-2002, Spain.

  53. Klein, D., & Manning, C. (2004). Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proceedings of the Association for Computational Linguistics (ACL), Barcelona.

  54. Landau, B., & Gleitman, L. (1985). Language and experience. Cambridge: Harvard University Press.

  55. Levin, B., & Rappaport-Hovav, M. (2005). Argument realization. Research surveys in linguistics series. Cambridge: Cambridge University Press.

  56. MacWhinney, B. (2000). The CHILDES project: Tools for analyzing talk (3rd ed.). Mahwah: Lawrence Erlbaum Associates.

  57. Marcus, M. P., Santorini, B., & Marcinkiewicz, M. (1993). Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2), 313–330.

  58. Marcus, G. F., Vijayan, S., Rao, S. B., & Vishton, P. M. (1999). Rule learning by seven-month-old infants. Science, 283, 77–80.

  59. Màrquez, L., Carreras, X., Litkowski, K., & Stevenson, S. (2008). Semantic role labeling: An introduction to the special issue. Computational Linguistics, 34, 145–159.

  60. Meilă, M. (2002). Comparing clusterings (Tech. Rep. 418). University of Washington Statistics Department.

  61. Miller, G., Beckwith, R., Fellbaum, C., Gross, D., & Miller, K. (1990). WordNet: An on-line lexical database. International Journal of Lexicography, 3(4), 235–312.

  62. Mintz, T. (2003). Frequent frames as a cue for grammatical categories in child directed speech. Cognition, 90, 91–117.

  63. Mintz, T., Newport, E., & Bever, T. (2002). The distributional structure of grammatical categories in speech to young children. Cognitive Science, 26, 393–424.

  64. Monaghan, P., Chater, N., & Christiansen, M. (2005). The differential role of phonological and distributional cues in grammatical categorisation. Cognition, 96, 143–182.

  65. Naigles, L. R. (1990). Children use syntax to learn verb meanings. Journal of Child Language, 17, 357–374.

  66. Nappa, R., Wessel, A., McEldoon, K., Gleitman, L., & Trueswell, J. (2009). Use of speaker’s gaze and syntax in verb learning. Language Learning and Development, 5, 203–234.

  67. Palmer, M., Gildea, D., & Kingsbury, P. (2005). The proposition bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1), 71–106.

  68. Parisien, C., & Stevenson, S. (2010). Learning verb alternations in a usage-based Bayesian model. In Proceedings of the 32nd Annual Meeting of the Cognitive Science Society, Portland.

  69. Perfors, A., Tenenbaum, J., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37, 607–642.

  70. Pinker, S. (1984). Language learnability and language development. Cambridge: Harvard University Press.

  71. Pinker, S. (1989). Learnability and cognition. Cambridge: MIT Press.

  72. Punyakanok, V., Roth, D., & Yih, W. (2008). The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2), 257–287.

  73. Rabiner, L. R. (1989). A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2), 257–285.

  74. Ravi, S., & Knight, K. (2009). Minimized models for unsupervised part-of-speech tagging. In Proceedings of the Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), Singapore.

  75. Rispoli, M. (1989). Encounters with Japanese verbs: Caregiver sentences and the categorization of transitive and intransitive action verbs. First Language, 9, 57–80.

  76. Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1, 906–914.

  77. Shi, R., Morgan, J. L., & Allopenna, P. (1998). Phonological and acoustic bases for earliest grammatical category assignment: A cross-linguistic perspective. Journal of Child Language, 25(1), 169–201.

  78. Shi, R., Werker, J. F., & Morgan, J. L. (1999). Newborn infants’ sensitivity to perceptual cues to lexical and grammatical words. Cognition, 72(2), B11–B21.

  79. Smith, L., & Yu, C. (2008). Infants rapidly learn word-referent mappings via cross-situational statistics. Cognition, 106, 1558–1568.

  80. Snedeker, J., & Gleitman, L. (2004). Why it is hard to label our concepts. In D. G. Hall & S. R. Waxman (Eds.), Weaving a lexicon. Cambridge: MIT Press.

  81. Solan, Z., Horn, D., Ruppin, E., & Edelman, S. (2005). Unsupervised learning of natural languages. Proceedings of the National Academy of Science, 102, 11629–11634.

  82. Surdeanu, M., Johansson, R., Meyers, A., Màrquez, L., & Nivre, J. (2008). The CoNLL-2008 shared task on joint parsing of syntactic and semantic dependencies. In CoNLL 2008: Proceedings of the Twelfth Conference on Computational Natural Language Learning, Manchester.

  83. Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge: Harvard University Press.

  84. Toutanova, K., & Johnson, M. (2007). A Bayesian LDA-based model for semi-supervised part-of-speech tagging. In Proceedings of NIPS, Vancouver.

  85. Waterfall, H., Sandbank, B., Onnis, L., & Edelman, S. (2010). An empirical generative framework for computational modeling of language acquisition. Journal of Child Language, 37, 671–703.

  86. Yang, C. (2011). A statistical test for grammar. In Proceedings of the 2nd Workshop on Cognitive Modeling and Computational Linguistics, Portland.

  87. Yu, C., & Joachims, T. (2009). Learning structural SVMs with latent variables. In ICML, Montreal.

  88. Yuan, S., Fisher, C., & Snedeker, J. (2012). Counting the nouns: Simple structural cues to verb meaning. Child Development, 83, 1382–1399.

Acknowledgements

We wish to thank Yael Gertner for insightful discussion that led up to this work, as well as the various annotators who helped create the semantically tagged data. This research is supported by NSF grant BCS-0620257 and NIH grant R01-HD054448.

Author information

Correspondence to Michael Connor.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Connor, M., Fisher, C., Roth, D. (2013). Starting from Scratch in Semantic Role Labeling: Early Indirect Supervision. In: Villavicencio, A., Poibeau, T., Korhonen, A., Alishahi, A. (eds) Cognitive Aspects of Computational Language Acquisition. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31863-4_10

  • DOI: https://doi.org/10.1007/978-3-642-31863-4_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31862-7

  • Online ISBN: 978-3-642-31863-4

  • eBook Packages: Computer Science (R0)
