Abstract
The infinitival marker to is optional in many instances of the do-be construction, exemplified by sentences like All I want to do is (to) go to work. However, the factors that govern speakers’ choice between using and omitting to have not previously been investigated. Here, we analyze nearly 10,000 such examples from the Corpus of Contemporary American English (COCA), using mixed-effects logistic regression to determine how much each of a range of factors (phrasal complexity, wordform frequency and predictability, and prosody) contributes to predicting to use. We find that the rate of to use increases as phrasal complexity increases and as wordform frequency and predictability decrease, consistent with established psycholinguistic theory and with data on the use of other optional function words. We also find the first quantitative corpus-based evidence for a role of prosody in governing optional function-word use: to is used more frequently when both the immediately preceding and the immediately following syllables carry some stress. This suggests that speakers use the intervening unstressed to to prevent stress clash. The result holds in writing as well as in speech, lending support to Janet Fodor’s proposal that implicit prosody plays a role in sentence processing.
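As a rough illustration of the model class named above, a mixed-effects logistic regression for to use can be written schematically as follows. This is a generic sketch, not the chapter’s exact specification: the symbols are introduced here for illustration, and the single random intercept per verb following the copula (the PCV) is an assumption.

$$
\log\frac{P(\mathit{to}_i = 1)}{1 - P(\mathit{to}_i = 1)} = \beta_0 + \sum_{k=1}^{K} \beta_k x_{ik} + u_{v[i]}, \qquad u_v \sim \mathcal{N}(0, \sigma_u^2)
$$

Here to_i = 1 if example i contains to, x_ik is the value of predictor k (for instance phrasal complexity, wordform frequency, predictability, or a stress-clash indicator) for example i, and u_v[i] is the random intercept of the verb v that follows the copula in example i. A positive β_k means that the predictor raises the log-odds of to use.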
Notes
- 1.
Except where otherwise noted, examples in this chapter are from the Corpus of Contemporary American English (http://corpus.byu.edu/coca/), or COCA for short. We have truncated many of the examples, keeping only what is needed to make our point. Hence, most of our examples are presented without initial capitalization or sentence-final punctuation. Invented examples begin with capital letters and end with periods.
- 2.
Flickinger and Wasow claim that if the form of do is a present participle (that is, doing), then the PCV also has to be a present participle, citing invented examples like the following, which they judge unacceptable:
(i) The thing I’m doing is (to) try to learn from my mistakes.
But the corpus studies we report here turned up enough real examples similar to (i) to convince us that Flickinger and Wasow were mistaken.
- 3.
Examples (b), (d), and (f) contained to in the original.
- 4.
The data in our statistical model were collected in the summer of 2012, when the corpus was somewhat smaller (425 million words) and did not yet include data from 2012.
- 5.
COCA has two distinct tags, verb.BASE and verb.INF, for uninflected nonfinite verbs. We have not been able to discern a consistent basis for this distinction, although verb.INF seems to appear after to at a considerably higher rate than verb.BASE. In all of our searches, we used the disjunction of these two tags to search for what we call base forms of verbs; that is, for the purposes of this chapter we treated the two COCA tags as interchangeable. When we say a verb’s form is base, we mean it is uninflected and not preceded by to; when we say a verb is infinitival, we mean it is preceded by to.
- 6.
The limitation of at most two intervening words was required for computational reasons.
- 7.
We used frequencies of these forms in our sample, rather than in the whole of COCA.
- 8.
Testing with corpus data whether people employ this UID strategy in actual usage has required computing information at critical points in utterances on the basis of very local context, usually the immediately preceding n-gram for some very small n.
- 9.
Interestingly, all of the cases discussed in van Draat’s chapter, except the complement of help, now strike us as categorically either requiring or prohibiting to.
- 10.
No sound files are available for this corpus, so our assignments of stress in these examples are based on our own intuitions.
- 11.
We say “nearly” because, in the infrequent cases where material such as an adverb intervenes between the copula and the PCV, the stress-clash and segmental-phonology predictors are determined by that material rather than by the PCV.
- 12.
Each major predictor statistically significant in Table 1 is also significant by a likelihood-ratio test in which the null hypothesis includes a random by-PCV slope for the predictor (results not shown).
- 13.
To give a better sense of the effect sizes in our regression model: a difference of one unit on the logit scale corresponds approximately to the difference between to use probabilities of, for example, 0.02 and 0.05, 0.05 and 0.12, 0.12 and 0.27, or 0.27 and 0.5.
- 14.
The weights for the best-fit line are the inverses of the squared standard errors of the corresponding parameter estimates.
- 15.
Frequency is measured as the number of occurrences of the form in question as the obligatory do of the DBC in our dataset.
- 16.
We discarded the single instance of the copula were, since one instance is insufficient to estimate that form’s effect.
- 17.
In speech, 40% of these one-word interveners are just; in writing, the figure is 31%.
- 18.
These assumptions need verification and are deliberately stated with hedges. Obviously, many verbs are not stress-initial. But more frequent words tend to be shorter, so a high percentage of the verb tokens will be monosyllabic and hence stress-initial; and many polysyllabic verbs are also stress-initial. The reasoning behind our predictions does not go through when the pronoun receives contrastive stress or when the form of help is helping. But we are confident that our assumptions hold for enough of the data to make this a meaningful preliminary test.
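The terminology of note 5 can be made concrete with a small classifier. This is a minimal sketch, assuming tokens arrive as (word, tag) pairs and that the tag strings are exactly verb.BASE and verb.INF as written in the note; the other tags in the sample sentence (det, pron, verb.PRES, verb.COP) are invented placeholders, not real COCA tags.

```python
# The two COCA tags from note 5, treated as one interchangeable category.
BASE_TAGS = frozenset({"verb.BASE", "verb.INF"})

def verb_form(tokens, i):
    """Classify the uninflected verb at position i as 'infinitival' or 'base'.

    Follows the chapter's terminology: infinitival = preceded by 'to';
    base = uninflected and not preceded by 'to'. Returns None when the
    token carries neither of the two tags.
    """
    _, tag = tokens[i]
    if tag not in BASE_TAGS:
        return None
    preceded_by_to = i > 0 and tokens[i - 1][0].lower() == "to"
    return "infinitival" if preceded_by_to else "base"

# Hypothetical tagged sentence; only verb.BASE / verb.INF come from the note.
sentence = [("all", "det"), ("I", "pron"), ("want", "verb.PRES"), ("to", "to"),
            ("do", "verb.INF"), ("is", "verb.COP"), ("go", "verb.BASE")]
print([verb_form(sentence, i) for i in range(len(sentence))])
# [None, None, None, None, 'infinitival', None, 'base']
```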
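The restriction in note 6 to at most two intervening words amounts to a bounded search window after the copula. The sketch below is hypothetical: the copula set, the tag names, and the choice to count every intervening token (including to) toward the limit are assumptions, not the authors’ actual query.

```python
COPULAS = frozenset({"is", "was"})                # assumed copula forms
BASE_TAGS = frozenset({"verb.BASE", "verb.INF"})  # tag names as in note 5
MAX_GAP = 2                                       # at most two intervening words

def match_copula_verb(tokens):
    """Return (copula_index, verb_index) for the first copula followed by a
    base-form verb with at most MAX_GAP tokens in between, else None."""
    for i, (word, _tag) in enumerate(tokens):
        if word.lower() not in COPULAS:
            continue
        # Positions i+1 .. i+MAX_GAP+1 allow 0 .. MAX_GAP intervening tokens.
        for j in range(i + 1, min(i + MAX_GAP + 2, len(tokens))):
            if tokens[j][1] in BASE_TAGS:
                return i, j
    return None

near = [("is", "verb.COP"), ("to", "to"), ("just", "adv"), ("go", "verb.BASE")]
far = [("is", "verb.COP"), ("to", "to"), ("really", "adv"),
       ("just", "adv"), ("go", "verb.BASE")]
print(match_copula_verb(near))  # (0, 3)
print(match_copula_verb(far))   # None
```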
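Note 8 refers to computing information at critical points from the immediately preceding n-gram, as corpus tests of uniform information density (UID) typically do. A minimal sketch for n = 2: the surprisal of a word is -log2 of its probability given the previous word, estimated here by unsmoothed maximum likelihood from a toy corpus. Real studies would use a large corpus and smoothing; the toy corpus and the function name are illustrative assumptions.

```python
import math
from collections import Counter

def bigram_surprisal(corpus, prev, word):
    """Surprisal in bits, -log2 P(word | prev), under an unsmoothed
    maximum-likelihood bigram model estimated from `corpus` (a token list)."""
    bigram_counts = Counter(zip(corpus, corpus[1:]))
    history_counts = Counter(corpus[:-1])  # tokens that have a successor
    if history_counts[prev] == 0 or bigram_counts[(prev, word)] == 0:
        return math.inf  # unseen history or bigram: infinitely surprising
    return -math.log2(bigram_counts[(prev, word)] / history_counts[prev])

corpus = "all i want to do is go home all i want to do is to go home".split()
print(bigram_surprisal(corpus, "is", "to"))    # 1.0
print(bigram_surprisal(corpus, "is", "home"))  # inf
```

Lower surprisal at the point after the copula means the material there is more predictable, which is the situation in which the UID account expects an optional function word to be omitted.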
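The coding rule in note 11 can be sketched as follows. The stress-clash indicator is true when the syllable before the potential site of to and the first syllable after it both carry some stress; the word after that site is the first word following the copula, which is an intervening adverb if there is one and otherwise the PCV. The small stress lexicon and the function names are hypothetical, and the stress marks reflect intuition, as in note 10.

```python
# Hypothetical lexicon: does the word's first syllable carry some stress?
INITIAL_STRESS = {
    "go": True,
    "try": True,
    "really": True,       # RE-al-ly
    "repair": False,      # re-PAIR
    "completely": False,  # com-PLETE-ly
}

def site_word(post_copula_words):
    """Word right after the potential 'to' site: the first word following the
    copula, i.e. an intervener such as an adverb if present, else the PCV."""
    return post_copula_words[0]

def stress_clash(prev_syllable_stressed, post_copula_words, lexicon=INITIAL_STRESS):
    """True if omitting 'to' would put two stressed syllables side by side."""
    return prev_syllable_stressed and lexicon[site_word(post_copula_words)]

print(stress_clash(True, ["go", "to", "work"]))        # True
print(stress_clash(True, ["really", "repair", "it"]))  # True: the adverb decides
print(stress_clash(True, ["repair", "it"]))            # False
print(stress_clash(False, ["go"]))                     # False
```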
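A likelihood-ratio test like the one in note 12 compares two nested models: a null model that includes the by-PCV random slope for a predictor but not its fixed effect, and an alternative that adds the fixed effect. With one extra parameter, the statistic G = 2(log L_alt - log L_null) is referred to a chi-square distribution with 1 degree of freedom. The sketch below assumes the two log-likelihoods have already been obtained from fitted models (for example with lme4, cited in the references); the numbers in the usage line are invented.

```python
import math

def lr_test_1df(loglik_null, loglik_alt):
    """Likelihood-ratio test for nested models that differ by one parameter.

    Returns (G, p), where G = 2 * (logL_alt - logL_null) and p is the upper
    tail of a chi-square distribution with 1 df. For 1 df that tail equals
    P(|Z| > sqrt(G)) = erfc(sqrt(G / 2)) for a standard normal Z.
    """
    g = max(0.0, 2.0 * (loglik_alt - loglik_null))  # guard tiny negative values
    return g, math.erfc(math.sqrt(g / 2.0))

g, p = lr_test_1df(-5000.00, -4998.08)  # invented log-likelihoods
print(f"G = {g:.2f}, p = {p:.3f}")      # G = 3.84, p = 0.050
```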
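The correspondences in note 13 can be checked directly: the logit (log-odds) of a probability p is ln(p / (1 - p)), and each listed pair of probabilities differs by roughly one unit on that scale. A minimal sketch:

```python
import math

def logit(p):
    """Log-odds of probability p."""
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Probability corresponding to log-odds x."""
    return 1.0 / (1.0 + math.exp(-x))

pairs = [(0.02, 0.05), (0.05, 0.12), (0.12, 0.27), (0.27, 0.5)]
for lo, hi in pairs:
    print(f"{lo:.2f} -> {hi:.2f}: logit difference {logit(hi) - logit(lo):.2f}")
# 0.02 -> 0.05: logit difference 0.95
# 0.05 -> 0.12: logit difference 0.95
# 0.12 -> 0.27: logit difference 1.00
# 0.27 -> 0.50: logit difference 0.99

# Going the other way: add one logit unit to a to use probability of 0.27.
print(f"{inv_logit(logit(0.27) + 1.0):.2f}")  # 0.50
```

Because the logistic curve is steepest at 0.5, a one-unit shift moves the probability by more in the middle of the range than near 0 or 1.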
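The weighting in note 14 is inverse-variance weighted least squares: each point’s weight is 1/SE², so precisely estimated parameters pull the line harder than noisy ones. The sketch below fits y = a + b·x with the closed-form weighted formulas; note 14 does not say which quantities play the roles of x and y, so the inputs here are placeholders.

```python
def weighted_line(xs, ys, ses):
    """Weighted least-squares fit of y = a + b * x, with weights 1 / se**2.

    xs, ys: coordinates of the points (e.g. one point per parameter estimate);
    ses: standard error of each y value. Returns (intercept a, slope b).
    """
    ws = [1.0 / se ** 2 for se in ses]
    total = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / total  # weighted mean of x
    ybar = sum(w * y for w, y in zip(ws, ys)) / total  # weighted mean of y
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

# Three points on y = 1 + 2x plus an outlier with a very large standard error.
a, b = weighted_line([0, 1, 2, 3], [1, 3, 5, 20], [0.1, 0.1, 0.1, 100.0])
print(f"a = {a:.2f}, b = {b:.2f}")  # a = 1.00, b = 2.00
```

With equal weights the outlier would drag the slope well above 2; its tiny weight leaves the fit essentially on the three precise points.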
References
Agresti, A. (2002). Categorical data analysis (2nd ed.). New Jersey: Wiley.
Anttila, A., Adams, M., & Speriosu, M. (2010). The role of prosody in the English dative alternation. Language and Cognitive Processes, 25(7/8/9), 946–981.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59(4), 390–412.
Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278.
Bates, D. M., Maechler, M., & Bolker, B. (2013). lme4: Linear mixed-effects models using S4 classes. R package version 0.999999-2. http://cran.r-project.org/web/packages/lme4.
Bresnan, J., Cueni, A., Nikitina, T., & Baayen, H. (2007). Predicting the dative alternation. In G. Bouma, I. Kraemer, & J. Zwarts (Eds.), Cognitive foundations of interpretation (pp. 69–95). Amsterdam: Royal Netherlands Academy of Science.
Cedergren, H. J., & Sankoff, D. (1974). Variable rules: Performance as a statistical reflection of competence. Language, 50(2), 333–355.
Chambers, J. M., & Hastie, T. J. (1991). Statistical models. In J. M. Chambers & T. J. Hastie (Eds.), Statistical models in S (Chap. 2, pp. 13–44). London: Chapman and Hall.
Cieri, C., Miller, D., & Walker, K. (2004). The Fisher corpus: A resource for the next generations of speech-to-text. LREC, 4, 69–71.
Clark, H. H. (1973). The language-as-fixed-effect fallacy: A critique of language statistics in psychological research. Journal of Verbal Learning and Verbal Behavior, 12, 335–359.
Clark, H. H., & Murphy, G. L. (1982). Audience design in meaning and reference. In J.-F. Le Ny & W. Kintsch (Eds.), Language and comprehension (Vol. 9, pp. 287–297). Amsterdam: North Holland Publishing.
Flickinger, D., & Wasow, T. (2013). A corpus-driven analysis of the do-be construction. In P. Hofmeister & E. Norcliffe (Eds.), The core and the periphery: Data-driven perspectives on syntax inspired by Ivan A. Sag (pp. 35–63). Stanford: CSLI Publications.
Fodor, J. D. (1998). Learning to parse? Journal of Psycholinguistic Research, 27, 285–319.
Fodor, J. D. (2002). Prosodic disambiguation in silent reading. In M. Hirotani (Ed.), Proceedings of NELS 32. Amherst: GLSA, University of Massachusetts.
Goldrick, M. (2006). Limited interaction in speech production: Chronometric, speech error, and neuropsychological evidence. Language & Cognitive Processes, 21(7–8), 817–855.
Green, P. J., & Silverman, B. W. (1994). Nonparametric regression and generalized linear models: A roughness penalty approach. London: Chapman & Hall.
Hawkins, J. A. (1994). A performance theory of order and constituency. Cambridge: Cambridge University Press.
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59(4), 434–446.
Jaeger, T. F. (2010). Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology, 61(1), 23–62.
Jaeger, T. F., Furth, K., & Hilliard, C. (2012). Phonological overlap affects lexical selection during sentence production. Journal of Experimental Psychology: Learning, Memory, & Cognition, 38(5), 1439–1449.
Klein, D., & Manning, C. D. (2003). Accurate unlexicalized parsing. Proceedings of the 41st Meeting of the Association for Computational Linguistics, pp. 423–430.
Lakoff, G. (1966). Stative adjectives and verbs in English. In A. G. Oettinger (Ed.), Mathematical linguistics and automatic translation. Cambridge: Harvard University. Report NSF 19, computation laboratory.
Levelt, W. J. M. (1993). Speaking: From intention to articulation. Cambridge: MIT Press.
Levy, R. P., & Jaeger, T. F. (2007). Speakers optimize information density through syntactic reduction. In J. Platt & T. Hoffman (Eds.), Advances in neural information processing systems (pp. 849–856). Cambridge: MIT Press.
Liberman, M., & Prince, A. (1977). On stress and linguistic rhythm. Linguistic Inquiry, 8, 249–336.
Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. Berlin: Springer.
Rohde, D. L. (2005). Tgrep2 user manual. http://tedlab.mit.edu/~dr/Tgrep2/tgrep2.pdf.
Shannon, C. E. (1948). A mathematical theory of communication. Bell System Technical Journal, 27, 623–656.
van Draat, P. F. (1910). Rhythm in English prose. Heidelberg: Carl Winter’s Universitätsbuchhandlung.
Warren, T., & Gibson, E. (2002). The influence of referential processing on sentence complexity. Cognition, 85(1), 79–112.
Wasow, T. (2002). Postverbal behavior. Stanford: CSLI Publications.
Wasow, T., Jaeger, T. F., & Orr, D. (2011). Lexical variation in relativizer frequency. In H. Simon & H. Wiese (Eds.), Expecting the unexpected: Exceptions in grammar (pp. 175–195). Berlin: De Gruyter.
Wasow, T., Greene, R., & Levy, R. (2012). Optional to and Prosody. Poster at the 25th annual CUNY Conference on Human Sentence Processing. New York, March 2012.
Zipf, G. (1936). The Psychobiology of Language. London: Routledge.
Acknowledgment
We are grateful to two anonymous reviewers for thoughtful comments on earlier versions of this chapter.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Wasow, T., Levy, R., Melnick, R., Zhu, H., Juzek, T. (2015). Processing, Prosody, and Optional to. In: Frazier, L., Gibson, E. (eds) Explicit and Implicit Prosody in Sentence Processing. Studies in Theoretical Psycholinguistics, vol 46. Springer, Cham. https://doi.org/10.1007/978-3-319-12961-7_8
DOI: https://doi.org/10.1007/978-3-319-12961-7_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-12960-0
Online ISBN: 978-3-319-12961-7
eBook Packages: Humanities, Social Sciences and Law; Social Sciences (R0)