Developing Linguistic Theories Using Annotated Corpora

de Marneffe, Marie-Catherine; Potts, Christopher

doi:10.1007/978-94-024-0881-2_16

Abstract

This paper aims to carve out a place for corpus research within theoretical linguistics and psycholinguistics. We argue that annotated corpora naturally complement native speaker intuitions and controlled psycholinguistic methods and thus can be powerful tools for developing and evaluating linguistic theories. We also review basic methods and best practices for moving from corpus annotations to hypothesis formation and testing, offering practical advice and technical guidance to researchers wishing to incorporate corpus methods into their work.

Our thanks to David Beaver, Philip Hofmeister, Nancy Ide, Dan Lassiter, Colin Phillips, and James Pustejovsky.

Notes

1. Winston and Blais suggest that the underlying causes of these differences are complex, relating to the practices of sub-disciplines within these fields, the role of causal inference in building theories, and perceived needs to be rigorous (biology and physics textbooks and lab manuals are much more likely not to address these methodological questions at all).
2. In general, one hopes that the speakers who contributed to the corpus were unconstrained by non-linguistic factors like editorial rules, censorship, and other performance limitations, but we can imagine studies where such factors actually serve the investigative goals.
3. For recent attempts to build tagging and parsing models that are better-suited to informal Web data, see [33, 113, 126].
4. http://java.com/.
5. http://www.python.org.
6. http://www.r-project.org.
7. For phonetic analysis, all these languages still lag behind Praat [15].

References

Acton, E.K., Potts, C.: That straight talk: Sarah Palin and the sociolinguistics of demonstratives. J. Sociolinguist. 18(1), 3–31 (2014)
Allen, J.F., Miller, B.W., Ringger, E.K., Sikorski, T.: A robust system for natural spoken dialogue. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 62–70. ACL, Santa Cruz, CA (1996)
AnderBois, S., Brasoveanu, A., Henderson, R.: The pragmatics of quantifier scope: a corpus study. In: Aguilar-Guevara, A., Chernilovskaya, A., Nouwen, R. (eds.) Proceedings of Sinn und Bedeutung 16, MIT Linguistics, Cambridge, MA, MIT Working Papers in Linguistics, vol. 1, pp. 15–28 (2012)
Andor, J.: The master and his performance: an interview with Noam Chomsky. Intercult. Pragmat. 1(1), 93–111 (2004)
Baayen, R.H.: Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht (2001)
Baayen, R.H.: Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge University Press, Cambridge (2008)
Barton, S.B., Sanford, A.J.: A case study of anomaly detection: shallow semantic processing and cohesion establishment. Mem. Cognit. 21(4), 477–487 (1993)
Beaver, D.I.: The optimization of discourse anaphora. Linguist. Philos. 27(1), 3–56 (2004)
Beaver, D.I.: Corpus pragmatics: Something old, something new, paper presented at the annual meeting of the Texas Linguistic Society (2007)
Beaver, D.I., Francez, I., Levinson, D.: Bad subject! (Non)-canonicality and NP distribution in existentials. In: Georgala, E., Howell, J. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 15, pp. 19–43. CLC Publications, Ithaca, NY (2006)
Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)
Blaylock, N., Allen, J.F.: Generating artificial corpora for plan recognition. In: Ardissono, L., Brna, P., Mitrovic, A. (eds.) User Modeling 2005. Lecture Notes in Artificial Intelligence, pp. 179–188. Springer, Berlin (2005)
Bock, K., Butterfield, S., Cutler, A., Cutting, J.C., Eberhard, K.M., Humphreys, K.R.: Number agreement in British and American English: disagreeing to agree collectively. Language 82(1), 64–113 (2006)
Bod, R., Hay, J., Jannedy, S. (eds.): Probabilistic Linguistics. MIT Press, Cambridge (2003)
Boersma, P., Weenink, D.: Praat: Doing phonetics by computer. Computer program; Version 5.3.60. http://www.praat.org/ (2013)
Bresnan, J., Nikitina, T.: The gradience of the dative alternation. In: Uyechi, L., Wee, L.H. (eds.) Reality Exploration and Discovery: Pattern Interaction in Language and Life, pp. 161–184. CSLI, Stanford (2010)
Burge, T.: Individualism and the mental. In: French, P., Uehling, T., Wettstein, H. (eds.) Midwest Studies in Philosophy. Studies in Metaphysics, vol. IV, pp. 73–121. University of Minnesota Press, Minneapolis (1979)
Callison-Burch, C.: Fast, cheap, and creative: evaluating translation quality using Amazon’s Mechanical Turk. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 286–295. ACL, Singapore (2009)
Chen, Ch., Härdle, W.K., Unwin, A. (eds.): Handbook of Data Visualization. Springer, Berlin (2008)
Chomsky, N.: A review of B. F. Skinner’s Verbal Behavior. Language 35(1), 26–58 (1959)
Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)
Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge (1965)
Chomsky, N.: Knowledge of Language. Praeger, New York (1986)
Clark, H.H.: Dogmas of understanding. Discourse Process. 23(3), 567–598 (1997)
Clarke, A.D.F., Elsner, M., Rohde, H.: Where’s Wally: The influence of visual salience on referring expression generation. Front. Psychol. (Percept. Sci.) 4(1), 1–10 (2013)
Cleveland, W.S.: The Elements of Graphing Data. Hobart Press, Summit (1985)
Constant, N., Davis, C., Potts, C., Schwarz, F.: The pragmatics of expressive content: evidence from large corpora. Sprache und Datenverarbeitung 33(1–2), 5–21 (2009)
Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)
Culbertson, J., Gross, S.: Are linguists better subjects? Br. J. Philos. Sci. 60(4), 721–736 (2009)
de Marneffe, M.C., Rafferty, A.N., Manning, C.D.: Finding contradictions in text. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pp. 1039–1047. ACL, Columbus, OH (2008)
de Marneffe, M.C., Manning, C.D., Potts, C.: “Was it good? It was provocative.” Learning the meaning of scalar adjectives. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 167–176. ACL, Uppsala, Sweden (2010)
de Marneffe, M.C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)
de Marneffe, M.C., Connor, M., Silveira, N., Bowman, S.R., Dozat, T., Manning, C.D.: More constructions, more genres: extending Stanford dependencies. In: Hajičová, E., Gerdes, K., Wanner, L. (eds.) Proceedings of the Second International Conference on Dependency Linguistics, pp. 187–196. ACL, Prague (2013)
Degen, J.: A corpus-based study of Some (but not All) implicatures, ms., University of Rochester (2013)
Devitt, M.: Intuitions in linguistics. Br. J. Philos. Sci. 57(3), 481–513 (2006)
Dewey, G.: Relative Frequency of English Speech Sounds. Harvard University Press, Cambridge (1923)
Díaz-Negrillo, A., Fernández-Domínguez, J.: Error tagging systems for learner corpora. Revista Española de Lingüística Aplicada 19, 83–102 (2006)
Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)
Duan, M., Elsner, M., de Marneffe, M.C.: Visual and linguistic predictors for the definiteness of referring expressions. In: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pp. 25–34 (2013)
Erlewine, M.Y.: The Constituency of Hyperlinks in a Hypertext Corpus, ms., MIT (2011)
Faye, J.: Copenhagen interpretation of quantum mechanics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, fall 2008 edn, CSLI. http://plato.stanford.edu/archives/fall2008/entries/qm-copenhagen/ (2008)
Fillmore, C.J.: “Corpus linguistics” or “computer-aided armchair linguistics”. In: Svartvik [144], pp. 35–66 (1992)
Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 363–370. ACL, Ann Arbor, MI (2005)
Francis, W.N., Kučera, H.: Manual of information to accompany a ‘standard sample of present-day edited American English, for use with digital computers’, Technical report. Brown University, Providence, RI (1979)
Francis, W.N., Kučera, H.: A standard sample of present-day English for use with digital computers. Report to the U. S. Office of Education on Cooperative Research Project E-007, Brown University, Providence, RI (1964)
Frank, A.F., Jaeger, T.F.: Speaking rationally: uniform information density as an optimal strategy for language production. In: Proceedings of the Cognitive Science Society, Washington, D.C., pp. 939–944 (2008)
Frazier, L.: Co-reference and adult language comprehension. Rev. Linguist. 8(2), 1–11 (2012)
Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly Media, Sebastopol (2006)
Gelman, A.: Review essay: causality and statistical learning. Am. J. Sociol. 117(3), 955–966 (2011)
Gelman, A., Stern, H.S.: The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60(4), 328–331 (2006)
Glass, L.: What does it mean for an implicit object to be recoverable? In: Proceedings of the Penn Linguistics Colloquium, Penn Linguistics Club, Philadelphia, PA (2013)
Godfrey, J.J., Holliman, E.: Switchboard-1 release 2. Linguistic Data Consortium, Catalog #LDC97S62 (1997)
Goldsmith, J.: Unsupervised learning of the morphology of a natural language. Comput. Linguist. 27(2), 153–198 (2001)
Goldwater, S., Johnson, M.: Learning OT constraint rankings using a maximum entropy model. In: Spenader, J., Eriksson, A., Dahl, Ö. (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory, pp. 111–120. Stockholm University, Stockholm (2003)
Goldwater, S., Griffiths, T.L., Johnson, M.: Contextual dependencies in unsupervised word segmentation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 673–680. ACL, Sydney, Australia (2006)
Goodman, N.D., Lassiter, D.: Probabilistic semantics and pragmatics: uncertainty in language and thought. In: Lappin, S., Fox, C. (eds.) The Handbook of Contemporary Semantic Theory, 2nd edn. Wiley-Blackwell, Oxford (2015)
Gordon, P.C., Hendrick, R.: Intuitive knowledge of linguistic co-reference. Cognition 62(3), 325–370 (1997)
Gordon, P.C., Grosz, B.J., Gilliom, L.A.: Pronouns, names and the centering of attention in discourse. Cognit. Sci. 17(3), 311–348 (1993)
Gries, S.T.: Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff. Corpus Linguist. Linguist. Theory 1(2), 277–294 (2005)
Gries, S.T.: Quantitative Corpus Linguistics with R: A Practical Introduction. Routledge, London (2009)
Grimm, S., McNally, L.: No ordered arguments needed for nouns. In: Aloni, M., Franke, M., Roelofsen, F. (eds.) Proceedings of the 19th Amsterdam Colloquium, pp. 123–130. ILLC, Amsterdam (2013)
Hacquard, V., Wellwood, A.: Embedding epistemic modals in English: a corpus-based study. Semant. Pragmat. 5(4), 1–29 (2012)
Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24(2), 8–12 (2009)
Harris, J.A., Potts, C.: Perspective-shifting with appositives and expressives. Linguist. Philos. 32(6), 523–552 (2009)
Harris, R.A.: The Linguistic Wars. Oxford University Press, Oxford (1993)
Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)
Hartshorne, J.K., Bonial, C., Palmer, M.: The VerbCorner project: toward an empirically-based semantic decomposition of verbs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1438–1442. Association for Computational Linguistics, Seattle (2013)
Hayes, B., Wilson, C.: A maximum entropy model of phonotactics and phonotactic learning. Linguist. Inq. 39(3), 379–440 (2008)
Heer, J., Bostock, M.: Crowdsourcing graphical perception: using Mechanical Turk to assess visualization design. In: ACM Human Factors in Computing Systems, pp. 203–212 (2010)
Higgins, D., Sadock, J.M.: A machine learning approach to modeling scope preferences. Comput. Linguist. 29(1), 73–96 (2003)
Hockett, C.F.: A note on ‘structure’ [review of de Goeje by W. D. Preston]. Int. J. Am. Linguist. 14(4), 269–271 (1948)
Hockett, C.F.: Two models of grammatical description. Word 10(2), 210–234 (1954)
Hoeksema, J.: Corpus study of negative polarity items. University of Groningen. http://www.let.rug.nl/hoeksema/docs/barcelona.html (1997)
Hoeksema, J.: There is no number effect in the licensing of negative polarity items: a reply to Guerzoni and Sharvit. Linguist. Philos. 31(4), 397–407 (2008)
Horn, L.R.: Duplex negatio affirmat...: the economy of double negation. In: Dobrin, L.M., Nichols, L., Rodriguez, R.M. (eds.) Papers from the 27th Regional Meeting of the Chicago Linguistic Society, Chicago Linguistic Society, Chicago, vol. 2: The Parasession on Negation, pp. 80–106 (1991)
Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, pp. 27–35. ACL, Boulder, CO (2009)
Jackendoff, R.S.: Languages of the Mind. MIT Press, Cambridge (1992)
Jurafsky, D.: A probabilistic model of lexical and syntactic access and disambiguation. Cognit. Sci. 20(2), 137–194 (1996)
Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall, Englewood Cliffs (2009)
Katz, J.J.: Language and Other Abstract Objects. Rowman and Littlefield, Totowa (1981)
Katz, J.J., Postal, P.M.: Realism vs. conceptualism in linguistics. Linguist. Philos. 14(5), 515–554 (1991)
Kilgarriff, A.: Language is never, ever, ever, random. Corpus Linguist. Linguist. Theory 1(2), 263–276 (2005)
Kilgarriff, A.: Googleology is bad science. Comput. Linguist. 33(1), 147–151 (2007)
Kilgarriff, A.: Getting to know your corpus. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech and Dialogue: 15th International Conference. Lecture Notes in Artificial Intelligence, vol. 7499, pp. 3–15. Springer, Berlin (2012)
Kilgarriff, A., Grefenstette, G.: Introduction to the special issue on the Web as corpus. Comput. Linguist. 29(3), 333–347 (2003)
Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, ACL, Sapporo, Japan, vol. 1, pp. 423–430 (2003)
Kučera, H., Francis, W.N.: Computational Analysis of Present-Day American English. Brown University Press, Providence (1967)
Kwiatkowski, T., Zettlemoyer, L.S., Goldwater, S., Steedman, M.: Lexical generalization in CCG grammar induction for semantic parsing. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1512–1523. ACL, Edinburgh (2011)
Lassiter, D.: Semantic externalism, language variation, and sociolinguistic accommodation. Mind Lang. 23(5), 607–633 (2008)
Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. ACL, Portland, OR (2011)
Leech, G.N.: Corpora and theories of linguistic performance. In: Svartvik [144], pp. 105–122 (1992)
Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993)
Levy, R.: Expectation-based syntactic comprehension. Cognition 106(3), 1126–1177 (2008)
Levy, R., Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th Edition of the International Conference on Language Resources and Evaluation, pp. 2231–2234 (2006)
Levy, R., Jaeger, T.F.: Speakers optimize information density through syntactic reduction. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 849–856. MIT Press, Cambridge (2007)
Lewis, D.: Convention. Harvard University Press, Cambridge, MA, reprinted 2002 by Blackwell (1969)
Liang, P., Jordan, M.I., Klein, D.: Learning dependency-based compositional semantics. Comput. Linguist. 39(2), 389–446 (2013)
Liberman, M.: Questioning reality. Language Log, January 24. http://itre.cis.upenn.edu/~myl/languagelog/archives/001837.html (2005)
MacWhinney, B.: The CHILDES Project: Tools for Analyzing Talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah (2000)
Manning, C.D.: Probabilistic syntax. In: Bod et al. [14], pp. 289–341 (2003)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A., Taylor, A.: The Penn treebank 3. Linguistic Data Consortium, Catalog #LDC99T42 (1999)
McEnery, T., Wilson, A.: Corpus Linguistics: An Introduction. Edinburgh University Press, Edinburgh (2001)
McKinney, W.: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Sebastopol (2012)
Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., The Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., Aiden, E.L.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)
Monroe, B.L., Colaresi, M.P., Quinn, K.M.: Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Polit. Anal. 16(4), 372–403 (2008)
Muchnik, L., Aral, S., Taylor, S.J.: Social influence bias: a randomized experiment. Science 341(6146), 647–651 (2013)
Munro, R.: Processing short message communications in low-resource languages. PhD thesis, Stanford University, Stanford, CA (2012)
Munro, R., Bethard, S., Kuperman, V., Lai, V.T., Melnick, R., Potts, C., Schnoebelen, T., Tily, H.: Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 122–130. ACL, Los Angeles (2010)
Norvig, P.: Natural language corpus data. In: Segaran, T., Hammerbacher, J. (eds.) Beautiful Data, pp. 219–242. O’Reilly Media (2009)
Norvig, P.: On Chomsky and the two cultures of statistical learning. Google, Inc. http://norvig.com/chomsky.html (2011)
Odersky, M., Spoon, L., Venners, B.: Programming in Scala, 2nd edn. Artima, Walnut Creek (2010)
Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 380–390. ACL, Atlanta, GA (2013)
Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 115–124. ACL, Ann Arbor, MI (2005)
Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1), 1–135 (2008)
Pereira, F.C.N.: Formal grammar and information theory: together again? Philos. Trans. R. Soc. 358(1769), 1239–1253 (2000)
Phillips, C.: Some arguments and nonarguments for reductionist accounts of syntactic phenomena. Lang. Cognit. Process. 28(1–2), 156–187 (2013)
Potts, C.: On the negativity of negation. In: Li, N., Lutz, D. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 20, pp. 636–659. CLC Publications, Ithaca, NY (2011)
Potts, C.: Conventional implicature and expressive content. In: Maienborn, C., von Heusinger, K., Portner, P. (eds.) Semantics: An International Handbook of Natural Language Meaning, vol. 3, pp. 2516–2536. Mouton de Gruyter, Berlin (2012a)
Potts, C.: Goal-driven answers in the cards dialogue corpus. In: Arnett, N., Bennett, R. (eds.) Proceedings of the 30th West Coast Conference on Formal Linguistics, pp. 1–20. Cascadilla Press, Somerville, MA (2012b)
Potts, C., Schwarz, F.: Affective ‘this’. Linguist. Issues Lang. Technol. 3(5), 1–30 (2010)
Putnam, H.: Mind, Language, and Reality: Philosophical Papers, vol. 2. Cambridge University Press, Cambridge (1975)
Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2011)
Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 627–633. ACL, Atlanta, Georgia (2013)
Ring, N., Uitdenbogerd, A.L.: Finding ‘Lucy in disguise’: the misheard lyric matching problem. In: Lee, G.G., Song, D., Lin, C.Y., Aizawa, A., Kuriyama, K., Yoshioka, M., Sakai, T. (eds.) Information Retrieval Technology: 5th Asia Information Retrieval Symposium. Lecture Notes in Computer Science, vol. 5839, pp. 157–167. Springer, Berlin (2009)
Ritter, A., Clark, S., Mausam, Etzioni, O.: Named entity recognition in tweets: An experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534. ACL, Edinburgh (2011)
Roark, B., Sproat, R.: Computational Approaches to Morphology and Syntax. Oxford University Press, Oxford (2007)
Sag, I.A., Wasow, T.: Performance-compatible competence grammar. In: Borsley, R., Börjars, K. (eds.) Non-Transformational Syntax: Formal and Explicit Models of Grammar, pp. 359–377. Wiley-Blackwell, Oxford (2011)
Saurí, R.: A factuality profiler for eventualities in text. Ph.D. thesis, Computer Science Department, Brandeis University (2008)
Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Resour. Eval. 43(3), 227–268 (2009)
Scholz, B.C., Pelletier, F.J., Pullum, G.K.: Philosophy of linguistics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, winter 2011 edn, CSLI, Stanford, CA. http://plato.stanford.edu/archives/win2011/entries/linguistics/ (2011)
Schütze, C.T.: The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago (1996)
Schütze, C.T.: Web searches should supplement judgements, not supplant them. Zeitschrift für Sprachwissenschaft 28(1), 151–156 (2009)
Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423, 623–656 (1948)
Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)
Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 254–263. ACL, Honolulu, Hawaii (2008)
Snyder, W.: An experimental investigation of syntactic satiation effects. Linguist. Inq. 31(3), 575–582 (2000)
Spencer, N.J.: Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability. J. Psycholinguist. Res. 2(2), 83–98 (1973)
Spitkovsky, V.I., Jurafsky, D., Alshawi, H.: Profiting from mark-up: Hyper-text annotations for guided parsing. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1278–1287. ACL, Uppsala, Sweden (2010)
Sproat, R., Shih, C.: The cross-linguistic distribution of adjective ordering restrictions. In: Georgopoulos, C., Ishihara, R. (eds.) Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, pp. 565–59. Springer, Berlin (1991)
Sprouse, J.: A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behav. Res. Methods 43(1), 155–167 (2010)
Sprouse, J., Schütze, C.T., Almeida, D.: A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134, 219–248 (2013)
Stoia, L., Shockley, D.M., Byron, D.K., Fosler-Lussier, E.: SCARE: A situated corpus with annotated referring expressions. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, European Language Resources Association, Marrakesh, Morocco (2008)
Svartvik, J. (ed.): Directions in Corpus Linguistics: Proceedings of Nobel Symposium, vol. 82. Mouton de Gruyter, Berlin (1992)
Thomas, M., Pang, B., Lee, L.: Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 327–335. ACL, Sydney, Australia (2006)
Thompson, H.S., Anderson, A., Bard, E.G., Doherty-Sneddon, G., Newlands, A., Sotillo, C.: The HCRC map task corpus: Natural dialogue for speech recognition. HLT ’93: Proceedings of the workshop on Human Language Technology, pp. 25–30. ACL, Princeton (1993)
Thuilier, J., Abeillé, A., Crabbé, B.: Ordering preferences for postverbal complements in French. In: Tyne, H., André, V., Boulton, A., Benzitoun, C. (eds.) Ecological and Data-Driven Perspectives in French Language Studies. Cambridge Scholars Publishing, Cambridge (2013)
Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics, vol. 1, pp. 173–180. ACL, Edmonton, Canada, NAACL ’03 (2003)
Tufte, E.R.: The Visual Display of Quantitative Information, 2nd edn. Graphics Press, Cheshire (2001)
Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)
Vickers, J.: The problem of induction. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, spring 2013 edn, CSLI. http://plato.stanford.edu/entries/induction-problem/ (2013)
Vitevitch, M.S.: Naturalistic and experimental analyses of word frequency and neighborhood density effects in slips of the ear. Lang. Speech 45(4), 407–434 (2002)
Walker, M.A., Joshi, A.K., Prince, E.F. (eds.): Centering in Discourse. Oxford University Press, Oxford (1997)
Wason, P.C., Reich, S.S.: A verbal illusion. Q. J. Exp. Psychol. 31(4), 591–597 (1979)
Wierzbicka, A.: English Speech Act Verbs: A semantic dictionary. Academic Press, New York (1987)
Winston, A.S., Blais, D.J.: What counts as an experiment? A transdisciplinary analysis of textbooks, 1930–1970. Am. J. Psychol. 109(4), 599–616 (1996)
Wong, Y.W., Mooney, R.: Learning synchronous grammars for semantic parsing with lambda calculus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 960–967. ACL, Prague, Czech Republic (2007)
Wu, F., Huberman, B.A.: How public opinion forms. In: Papadimitriou, C., Zhang, S. (eds.) Internet and Network Economics. Lecture Notes in Computer Science, vol. 5385, pp. 334–341. Springer, Berlin (2008)
Zettlemoyer, L.S.: Learning to map sentences to logical form. Ph.D. thesis, MIT, Cambridge, MA (2009)
Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge (1949)

Author information

Authors and Affiliations

Department of Linguistics, The Ohio State University, Columbus, OH, USA
Marie-Catherine de Marneffe
Department of Linguistics, Stanford University, Stanford, CA, USA
Christopher Potts

Corresponding author

Correspondence to Marie-Catherine de Marneffe.

Editor information

Editors and Affiliations

Department of Computer Science, Vassar College, Poughkeepsie, New York, USA
Nancy Ide
Department of Computer Science, Volen Center for Complex Systems, Brandeis University, Waltham, Massachusetts, USA
James Pustejovsky

About this chapter

Cite this chapter

de Marneffe, MC., Potts, C. (2017). Developing Linguistic Theories Using Annotated Corpora. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_16

DOI: https://doi.org/10.1007/978-94-024-0881-2_16
Published: 17 June 2017
Publisher Name: Springer, Dordrecht
Print ISBN: 978-94-024-0879-9
Online ISBN: 978-94-024-0881-2
eBook Packages: Social Sciences, Social Sciences (R0)
