Abstract
This paper aims to carve out a place for corpus research within theoretical linguistics and psycholinguistics. We argue that annotated corpora naturally complement native speaker intuitions and controlled psycholinguistic methods and thus can be powerful tools for developing and evaluating linguistic theories. We also review basic methods and best practices for moving from corpus annotations to hypothesis formation and testing, offering practical advice and technical guidance to researchers wishing to incorporate corpus methods into their work.
Our thanks to David Beaver, Philip Hofmeister, Nancy Ide, Dan Lassiter, Colin Phillips, and James Pustejovsky.
Notes
- 1.
Winston and Blais suggest that the underlying causes of these differences are complex, relating to the practices of sub-disciplines within these fields, the role of causal inference in building theories, and perceived needs to be rigorous (biology and physics textbooks and lab manuals are much more likely not to address these methodological questions at all).
- 2.
In general, one hopes that the speakers who contributed to the corpus were unconstrained by non-linguistic factors like editorial rules, censorship, and other performance limitations, but we can imagine studies where such factors actually serve the investigative goals.
- 3.
- 4.
- 5.
- 6.
- 7.
For phonetic analysis, all these languages still lag behind Praat [15].
References
Acton, E.K., Potts, C.: That straight talk: Sarah Palin and the sociolinguistics of demonstratives. J. Sociolinguist. 18(1), 3–31 (2014)
Allen, J.F., Miller, B.W., Ringger, E.K., Sikorski, T.: A robust system for natural spoken dialogue. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 62–70. ACL, Santa Cruz, CA (1996)
AnderBois, S., Brasoveanu, A., Henderson, R.: The pragmatics of quantifier scope: a corpus study. In: Aguilar-Guevara, A., Chernilovskaya, A., Nouwen, R. (eds.) Proceedings of Sinn und Bedeutung 16, MIT Linguistics, Cambridge, MA, MIT Working Papers in Linguistics, vol. 1, pp. 15–28 (2012)
Andor, J.: The master and his performance: an interview with Noam Chomsky. Intercult. Pragmat. 1(1), 93–111 (2004)
Baayen, R.H.: Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht (2001)
Baayen, R.H.: Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge University Press, Cambridge (2008)
Barton, S.B., Sanford, A.J.: A case study of anomaly detection: shallow semantic processing and cohesion establishment. Mem. Cognit. 21(4), 477–487 (1993)
Beaver, D.I.: The optimization of discourse anaphora. Linguist. Philos. 27(1), 3–56 (2004)
Beaver, D.I.: Corpus pragmatics: something old, something new. Paper presented at the annual meeting of the Texas Linguistic Society (2007)
Beaver, D.I., Francez, I., Levinson, D.: Bad subject! (Non)-canonicality and NP distribution in existentials. In: Georgala, E., Howell, J. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 15, pp. 19–43. CLC Publications, Ithaca, NY (2006)
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)
Blaylock, N., Allen, J.F.: Generating artificial corpora for plan recognition. In: Ardissono, L., Brna, P., Mitrovic, A. (eds.) User Modeling 2005. Lecture Notes in Artificial Intelligence, pp. 179–188. Springer, Berlin (2005)
Bock, K., Butterfield, S., Cutler, A., Cutting, J.C., Eberhard, K.M., Humphreys, K.R.: Number agreement in British and American English: disagreeing to agree collectively. Language 82(1), 64–113 (2006)
Bod, R., Hay, J., Jannedy, S. (eds.): Probabilistic Linguistics. MIT Press, Cambridge (2003)
Boersma, P., Weenink, D.: Praat: Doing phonetics by computer. Computer program; Version 5.3.60. http://www.praat.org/ (2013)
Bresnan, J., Nikitina, T.: The gradience of the dative alternation. In: Uyechi, L., Wee, L.H. (eds.) Reality Exploration and Discovery: Pattern Interaction in Language and Life, pp. 161–184. CSLI, Stanford (2010)
Burge, T.: Individualism and the mental. In: French, P., Uehling, T., Wettstein, H. (eds.) Midwest Studies in Philosophy. Studies in Metaphysics, vol. IV, pp. 73–121. University of Minnesota Press, Minneapolis (1979)
Callison-Burch, C.: Fast, cheap, and creative: evaluating translation quality using Amazon’s Mechanical Turk. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 286–295. ACL, Singapore (2009)
Chen, Ch., Härdle, W.K., Unwin, A. (eds.): Handbook of Data Visualization. Springer, Berlin (2008)
Chomsky, N.: A review of B. F. Skinner’s Verbal Behavior. Language 35(1), 26–58 (1959)
Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)
Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge (1965)
Chomsky, N.: Knowledge of Language. Praeger, New York (1986)
Clark, H.H.: Dogmas of understanding. Discourse Process. 23(3), 567–598 (1997)
Clarke, A.D.F., Elsner, M., Rohde, H.: Where’s Wally: The influence of visual salience on referring expression generation. Front. Psychol. (Percept. Sci.) 4(1), 1–10 (2013)
Cleveland, W.S.: The Elements of Graphing Data. Hobart Press, Summit (1985)
Constant, N., Davis, C., Potts, C., Schwarz, F.: The pragmatics of expressive content: evidence from large corpora. Sprache und Datenverarbeitung 33(1–2), 5–21 (2009)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
Culbertson, J., Gross, S.: Are linguists better subjects? Br. J. Philos. Sci. 60(4), 721–736 (2009)
de Marneffe, M.C., Rafferty, A.N., Manning, C.D.: Finding contradictions in text. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pp. 1039–1047. ACL, Columbus, OH (2008)
de Marneffe, M.C., Manning, C.D., Potts, C.: “Was it good? It was provocative.” Learning the meaning of scalar adjectives. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 167–176. ACL, Uppsala, Sweden (2010)
de Marneffe, M.C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)
de Marneffe, M.C., Connor, M., Silveira, N., Bowman, S.R., Dozat, T., Manning, C.D.: More constructions, more genres: extending Stanford dependencies. In: Hajičová, E., Gerdes, K., Wanner, L. (eds.) Proceedings of the Second International Conference on Dependency Linguistics, pp. 187–196. ACL, Prague (2013)
Degen, J.: A corpus-based study of Some (but not All) implicatures, ms., University of Rochester (2013)
Devitt, M.: Intuitions in linguistics. Br. J. Philos. Sci. 57(3), 481–513 (2006)
Dewey, G.: Relative Frequency of English Speech Sounds. Harvard University Press, Cambridge (1923)
Díaz-Negrillo, A., Fernández-Domínguez, J.: Error tagging systems for learner corpora. Revista Española de Lingüística Aplicada 19, 83–102 (2006)
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)
Duan, M., Elsner, M., de Marneffe, M.C.: Visual and linguistic predictors for the definiteness of referring expressions. In: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pp. 25–34 (2013)
Erlewine, M.Y.: The Constituency of Hyperlinks in a Hypertext Corpus, ms., MIT (2011)
Faye, J.: Copenhagen interpretation of quantum mechanics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, fall 2008 edn, CSLI. http://plato.stanford.edu/archives/fall2008/entries/qm-copenhagen/ (2008)
Fillmore, C.J.: “Corpus linguistics” or “computer-aided armchair linguistics”. In: Svartvik [144], pp. 35–66 (1992)
Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 363–370. ACL, Ann Arbor, MI (2005)
Francis, W.N., Kučera, H.: Manual of information to accompany a ‘standard sample of present-day edited American English, for use with digital computers’, Technical report. Brown University, Providence, RI (1979)
Francis, W.N., Kučera, H.: A standard sample of present-day English for use with digital computers. Report to the U. S. Office of Education on Cooperative Research Project E-007, Brown University, Providence, RI (1964)
Frank, A.F., Jaeger, T.F.: Speaking rationally: uniform information density as an optimal strategy for language production. In: Proceedings of the Cognitive Science Society, Washington, D.C., pp. 939–944 (2008)
Frazier, L.: Co-reference and adult language comprehension. Rev. Linguist. 8(2), 1–11 (2012)
Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly Media, Sebastopol (2006)
Gelman, A.: Review essay: causality and statistical learning. Am. J. Sociol. 117(3), 955–966 (2011)
Gelman, A., Stern, H.S.: The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60(4), 328–331 (2006)
Glass, L.: What does it mean for an implicit object to be recoverable? In: Proceedings of the Penn Linguistics Colloquium, Penn Linguistics Club, Philadelphia, PA (2013)
Godfrey, J.J., Holliman, E.: Switchboard-1 release 2. Linguistic Data Consortium, Catalog #LDC97S62 (1997)
Goldsmith, J.: Unsupervised learning of the morphology of a natural language. Comput. Linguist. 27(2), 153–198 (2001)
Goldwater, S., Johnson, M.: Learning OT constraint rankings using a maximum entropy model. In: Spenader, J., Eriksson, A., Dahl, Ö. (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory, pp. 111–120. Stockholm University, Stockholm (2003)
Goldwater, S., Griffiths, T.L., Johnson, M.: Contextual dependencies in unsupervised word segmentation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 673–680. ACL, Sydney, Australia (2006)
Goodman, N.D., Lassiter, D.: Probabilistic semantics and pragmatics: uncertainty in language and thought. In: Lappin, S., Fox, C. (eds.) The Handbook of Contemporary Semantic Theory, 2nd edn. Wiley-Blackwell, Oxford (2015)
Gordon, P.C., Hendrick, R.: Intuitive knowledge of linguistic co-reference. Cognition 62(3), 325–370 (1997)
Gordon, P.C., Grosz, B.J., Gilliom, L.A.: Pronouns, names and the centering of attention in discourse. Cognit. Sci. 17(3), 311–348 (1993)
Gries, S.T.: Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff. Corpus Linguist. Linguist. Theory 1(2), 277–294 (2005)
Gries, S.T.: Quantitative Corpus Linguistics with R: A Practical Introduction. Routledge, London (2009)
Grimm, S., McNally, L.: No ordered arguments needed for nouns. In: Aloni, M., Franke, M., Roelofsen, F. (eds.) Proceedings of the 19th Amsterdam Colloquium, pp. 123–130. ILLC, Amsterdam (2013)
Hacquard, V., Wellwood, A.: Embedding epistemic modals in English: a corpus-based study. Semant. Pragmat. 5(4), 1–29 (2012)
Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24(2), 8–12 (2009)
Harris, J.A., Potts, C.: Perspective-shifting with appositives and expressives. Linguist. Philos. 32(6), 523–552 (2009)
Harris, R.A.: The Linguistic Wars. Oxford University Press, Oxford (1993)
Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)
Hartshorne, J.K., Bonial, C., Palmer, M.: The VerbCorner project: toward an empirically-based semantic decomposition of verbs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1438–1442. Association for Computational Linguistics, Seattle (2013)
Hayes, B., Wilson, C.: A maximum entropy model of phonotactics and phonotactic learning. Linguist. Inq. 39(3), 379–440 (2008)
Heer, J., Bostock, M.: Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In: ACM Human Factors in Computing Systems, pp. 203–212 (2010)
Higgins, D., Sadock, J.M.: A machine learning approach to modeling scope preferences. Comput. Linguist. 29(1), 73–96 (2003)
Hockett, C.F.: A note on ‘structure’ [review of de Goeje by W. D. Preston]. Int. J. Am. Linguist. 14(4), 269–271 (1948)
Hockett, C.F.: Two models of grammatical description. Word 10(2), 210–234 (1954)
Hoeksema, J.: Corpus study of negative polarity items. University of Groningen. http://www.let.rug.nl/hoeksema/docs/barcelona.html (1997)
Hoeksema, J.: There is no number effect in the licensing of negative polarity items: a reply to Guerzoni and Sharvit. Linguist. Philos. 31(4), 397–407 (2008)
Horn, L.R.: Duplex negatio affirmat...: the economy of double negation. In: Dobrin, L.M., Nichols, L., Rodriguez, R.M. (eds.) Papers from the 27th Regional Meeting of the Chicago Linguistic Society, Chicago Linguistic Society, Chicago, vol. 2: The Parasession on Negation, pp. 80–106 (1991)
Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, pp. 27–35. ACL, Boulder, CO (2009)
Jackendoff, R.S.: Languages of the Mind. MIT Press, Cambridge (1992)
Jurafsky, D.: A probabilistic model of lexical and syntactic access and disambiguation. Cognit. Sci. 20(2), 137–194 (1996)
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall, Englewood Cliffs (2009)
Katz, J.J.: Language and Other Abstract Objects. Rowman and Littlefield, Totowa (1981)
Katz, J.J., Postal, P.M.: Realism vs. conceptualism in linguistics. Linguist. Philos. 14(5), 515–554 (1991)
Kilgarriff, A.: Language is never, ever, ever, random. Corpus Linguist. Linguist. Theory 1(2), 263–276 (2005)
Kilgarriff, A.: Googleology is bad science. Comput. Linguist. 33(1), 147–151 (2007)
Kilgarriff, A.: Getting to know your corpus. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech and Dialogue: 15th International Conference. Lecture Notes in Artificial Intelligence, vol. 7499, pp. 3–15. Springer, Berlin (2012)
Kilgarriff, A., Grefenstette, G.: Introduction to the special issue on the Web as corpus. Comput. Linguist. 29(3), 333–347 (2003)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, ACL, Sapporo, Japan, vol. 1, pp. 423–430 (2003)
Kučera, H., Francis, W.N.: Computational Analysis of Present-Day American English. Brown University Press, Providence (1967)
Kwiatkowski, T., Zettlemoyer, L.S., Goldwater, S., Steedman, M.: Lexical generalization in CCG grammar induction for semantic parsing. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1512–1523. ACL, Edinburgh (2011)
Lassiter, D.: Semantic externalism, language variation, and sociolinguistic accommodation. Mind Lang. 23(5), 607–633 (2008)
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. ACL, Portland, OR (2011)
Leech, G.N.: Corpora and theories of linguistic performance. In: Svartvik [144], pp. 105–122 (1992)
Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993)
Levy, R.: Expectation-based syntactic comprehension. Cognition 106(3), 1126–1177 (2008)
Levy, R., Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th Edition of the International Conference on Language Resources and Evaluation, pp. 2231–2234 (2006)
Levy, R., Jaeger, T.F.: Speakers optimize information density through syntactic reduction. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 849–856. MIT Press, Cambridge (2007)
Lewis, D.: Convention. Harvard University Press, Cambridge, MA, reprinted 2002 by Blackwell (1969)
Liang, P., Jordan, M.I., Klein, D.: Learning dependency-based compositional semantics. Comput. Linguist. 39(2), 389–446 (2013)
Liberman, M.: Questioning reality. Language Log, January 24. http://itre.cis.upenn.edu/~myl/languagelog/archives/001837.html (2005)
MacWhinney, B.: The CHILDES Project: Tools for Analyzing Talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah (2000)
Manning, C.D.: Probabilistic syntax. In: Bod et al. [14], pp. 289–341 (2003)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A., Taylor, A.: The Penn treebank 3. Linguistic Data Consortium, Catalog #LDC99T42 (1999)
McEnery, T., Wilson, A.: Corpus Linguistics: An Introduction. Edinburgh University Press, Edinburgh (2001)
McKinney, W.: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Sebastopol (2012)
Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., The Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., Aiden, E.L.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)
Monroe, B.L., Colaresi, M.P., Quinn, K.M.: Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Polit. Anal. 16(4), 372–403 (2009)
Muchnik, L., Aral, S., Taylor, S.J.: Social influence bias: a randomized experiment. Science 341(6146), 647–651 (2013)
Munro, R.: Processing short message communications in low-resource languages. PhD thesis, Stanford University, Stanford, CA (2012)
Munro, R., Bethard, S., Kuperman, V., Lai, V.T., Melnick, R., Potts, C., Schnoebelen, T., Tily, H.: Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 122–130. ACL, Los Angeles (2010)
Norvig, P.: Natural language corpus data. In: Segaran, T., Hammerbacher, J. (eds.) Beautiful Data, pp. 219–242. O’Reilly Media (2009)
Norvig, P.: On Chomsky and the two cultures of statistical learning. http://norvig.com/chomsky.html, Google, Inc (2011)
Odersky, M., Spoon, L., Venners, B.: Programming in Scala, 2nd edn. Artima, Walnut Creek (2010)
Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 380–390. ACL, Atlanta, GA (2013)
Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 115–124. ACL, Ann Arbor, MI (2005)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1), 1–135 (2008)
Pereira, F.C.N.: Formal grammar and information theory: together again? Philos. Trans. R. Soc. 358(1769), 1239–1253 (2000)
Phillips, C.: Some arguments and nonarguments for reductionist accounts of syntactic phenomena. Lang. Cognit. Process. 28(1–2), 156–187 (2013)
Potts, C.: On the negativity of negation. In: Li, N., Lutz, D. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 20, pp. 636–659. CLC Publications, Ithaca, NY (2011)
Potts, C.: Conventional implicature and expressive content. In: Maienborn, C., von Heusinger, K., Portner, P. (eds.) Semantics: An International Handbook of Natural Language Meaning, vol. 3, pp. 2516–2536. Mouton de Gruyter, Berlin (2012a)
Potts, C.: Goal-driven answers in the cards dialogue corpus. In: Arnett, N., Bennett, R. (eds.) Proceedings of the 30th West Coast Conference on Formal Linguistics, pp. 1–20. Cascadilla Press, Somerville, MA (2012b)
Potts, C., Schwarz, F.: Affective ‘this’. Linguist. Issues Lang. Technol. 3(5), 1–30 (2010)
Putnam, H.: Mind, Language, and Reality: Philosophical Papers, vol. 2. Cambridge University Press, Cambridge (1975)
Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2011)
Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 627–633. ACL, Atlanta, Georgia (2013)
Ring, N., Uitdenbogerd, A.L.: Finding ‘Lucy in disguise’: the misheard lyric matching problem. In: Lee, G.G., Song, D., Lin, C.Y., Aizawa, A., Kuriyama, K., Yoshioka, M., Sakai, T. (eds.) Information Retrieval Technology: 5th Asia Information Retrieval Symposium. Lecture Notes in Computer Science, vol. 5839, pp. 157–167. Springer, Berlin (2009)
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named entity recognition in tweets: An experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534. ACL, Edinburgh (2011)
Roark, B., Sproat, R.: Computational Approaches to Morphology and Syntax. Oxford University Press, Oxford (2007)
Sag, I.A., Wasow, T.: Performance-compatible competence grammar. In: Borsley, R., Börjars, K. (eds.) Non-Transformational Syntax: Formal and Explicit Models of Grammar, pp. 359–377. Wiley-Blackwell, Oxford (2011)
Saurí, R.: A factuality profiler for eventualities in text. Ph.D. thesis, Computer Science Department, Brandeis University (2008)
Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Resour. Eval. 43(3), 227–268 (2009)
Scholz, B.C., Pelletier, F.J., Pullum, G.K.: Philosophy of linguistics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, winter 2011 edn, CSLI, Stanford, CA. http://plato.stanford.edu/archives/win2011/entries/linguistics/ (2011)
Schütze, C.T.: The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago (1996)
Schütze, C.T.: Web searches should supplement judgements, not supplant them. Zeitschrift für Sprachwissenschaft 28(1), 151–156 (2009)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423, 623–656 (1948)
Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)
Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 254–263. ACL, Honolulu, Hawaii (2008)
Snyder, W.: An experimental investigation of syntactic satiation effects. Linguist. Inq. 31(3), 575–582 (2000)
Spencer, N.J.: Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability. J. Psycholinguist. Res. 2(2), 83–98 (1973)
Spitkovsky, V.I., Jurafsky, D., Alshawi, H.: Profiting from mark-up: Hyper-text annotations for guided parsing. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1278–1287. ACL, Uppsala, Sweden (2010)
Sproat, R., Shih, C.: The cross-linguistic distribution of adjective ordering restrictions. In: Georgopoulos, C., Ishihara, R. (eds.) Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, pp. 565–59. Springer, Berlin (1991)
Sprouse, J.: A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behav. Res. Methods 43(1), 155–167 (2010)
Sprouse, J., Schütze, C.T., Almeida, D.: A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134, 219–248 (2013)
Stoia, L., Shockley, D.M., Byron, D.K., Fosler-Lussier, E.: SCARE: A situated corpus with annotated referring expressions. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, European Language Resources Association, Marrakesh, Morocco (2008)
Svartvik, J. (ed.): Directions in Corpus Linguistics: Proceedings of Nobel Symposium, vol. 82. Mouton de Gruyter, Berlin (1992)
Thomas, M., Pang, B., Lee, L.: Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 327–335. ACL, Sydney, Australia (2006)
Thompson, H.S., Anderson, A., Bard, E.G., Doherty-Sneddon, G., Newlands, A., Sotillo, C.: The HCRC map task corpus: Natural dialogue for speech recognition. HLT ’93: Proceedings of the workshop on Human Language Technology, pp. 25–30. ACL, Princeton (1993)
Thuilier, J., Abeille, A., Crabbé, B.: Ordering preferences for postverbal complements in French. In: Tyne, H., André, V., Boulton, A., Benzitoun, C. (eds.) Ecological and Data-Driven Perspectives in French Language Studies. Cambridge Scholars Publishing, Cambridge (2013)
Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics, vol. 1, pp. 173–180. ACL, Edmonton, Canada, NAACL ’03 (2003)
Tufte, E.R.: The Visual Display of Quantitative Information, 2nd edn. Graphics Press, Cheshire (2001)
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
Vickers, J.: The problem of induction. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, spring 2013 edn, CSLI. http://plato.stanford.edu/entries/induction-problem/ (2013)
Vitevitch, M.S.: Naturalistic and experimental analyses of word frequency and neighborhood density effects in slips of the ear. Lang. Speech 45(4), 407–434 (2002)
Walker, M.A., Joshi, A.K., Prince, E.F. (eds.): Centering in Discourse. Oxford University Press, Oxford (1997)
Wason, P.C., Reich, S.S.: A verbal illusion. Q. J. Exp. Psychol. 31(4), 591–597 (1979)
Wierzbicka, A.: English Speech Act Verbs: A semantic dictionary. Academic Press, New York (1987)
Winston, A.S., Blais, D.J.: What counts as an experiment? A transdisciplinary analysis of textbooks, 1930–1970. Am. J. Psychol. 109(4), 599–616 (1996)
Wong, Y.W., Mooney, R.: Learning synchronous grammars for semantic parsing with lambda calculus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 960–967. ACL, Prague, Czech Republic (2007)
Wu, F., Huberman, B.A.: How public opinion forms. In: Papadimitriou, C., Zhang, S. (eds.) Internet and Network Economics. Lecture Notes in Computer Science, vol. 5385, pp. 334–341. Springer, Berlin (2008)
Zettlemoyer, L.S.: Learning to map sentences to logical form. Ph.D. thesis, MIT, Cambridge, MA (2009)
Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge (1949)
© 2017 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
de Marneffe, MC., Potts, C. (2017). Developing Linguistic Theories Using Annotated Corpora. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_16
DOI: https://doi.org/10.1007/978-94-024-0881-2_16
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)