Developing Linguistic Theories Using Annotated Corpora

Handbook of Linguistic Annotation

Abstract

This paper aims to carve out a place for corpus research within theoretical linguistics and psycholinguistics. We argue that annotated corpora naturally complement native speaker intuitions and controlled psycholinguistic methods and thus can be powerful tools for developing and evaluating linguistic theories. We also review basic methods and best practices for moving from corpus annotations to hypothesis formation and testing, offering practical advice and technical guidance to researchers wishing to incorporate corpus methods into their work.

Our thanks to David Beaver, Philip Hofmeister, Nancy Ide, Dan Lassiter, Colin Phillips, and James Pustejovsky.

Notes

  1. Winston and Blais suggest that the underlying causes of these differences are complex, relating to the practices of sub-disciplines within these fields, the role of causal inference in building theories, and perceived needs to be rigorous (biology and physics textbooks and lab manuals are much more likely not to address these methodological questions at all).

  2. In general, one hopes that the speakers who contributed to the corpus were unconstrained by non-linguistic factors like editorial rules, censorship, and other performance limitations, but we can imagine studies where such factors actually serve the investigative goals.

  3. For recent attempts to build tagging and parsing models that are better-suited to informal Web data, see [33, 113, 126].

  4. http://java.com/.

  5. http://www.python.org.

  6. http://www.r-project.org.

  7. For phonetic analysis, all these languages still lag behind Praat [15].

References

  1. Acton, E.K., Potts, C.: That straight talk: Sarah Palin and the sociolinguistics of demonstratives. J. Sociolinguist. 18(1), 3–31 (2014)

  2. Allen, J.F., Miller, B.W., Ringger, E.K., Sikorski, T.: A robust system for natural spoken dialogue. Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pp. 62–70. ACL, Santa Cruz, CA (1996)

  3. AnderBois, S., Brasoveanu, A., Henderson, R.: The pragmatics of quantifier scope: a corpus study. In: Aguilar-Guevara, A., Chernilovskaya, A., Nouwen, R. (eds.) Proceedings of Sinn und Bedeutung 16, MIT Linguistics, Cambridge, MA, MIT Working Papers in Linguistics, vol. 1, pp. 15–28 (2012)

  4. Andor, J.: The master and his performance: an interview with Noam Chomsky. Intercult. Pragmat. 1(1), 93–111 (2004)

  5. Baayen, R.H.: Word Frequency Distributions. Kluwer Academic Publishers, Dordrecht (2001)

  6. Baayen, R.H.: Analyzing Linguistic Data: A Practical Introduction to Statistics. Cambridge University Press, Cambridge (2008)

  7. Barton, S.B., Sanford, A.J.: A case study of anomaly detection: shallow semantic processing and cohesion establishment. Mem. Cognit. 21(4), 477–487 (1993)

  8. Beaver, D.I.: The optimization of discourse anaphora. Linguist. Philos. 27(1), 3–56 (2004)

  9. Beaver, D.I.: Corpus pragmatics: Something old, something new, paper presented at the annual meeting of the Texas Linguistic Society (2007)

  10. Beaver, D.I., Francez, I., Levinson, D.: Bad subject! (Non)-canonicality and NP distribution in existentials. In: Georgala, E., Howell, J. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 15, pp. 19–43. CLC Publications, Ithaca, NY (2006)

  11. Bird, S., Klein, E., Loper, E.: Natural Language Processing with Python. O’Reilly Media, Sebastopol (2009)

  12. Blaylock, N., Allen, J.F.: Generating artificial corpora for plan recognition. In: Ardissono, L., Brna, P., Mitrovic, A. (eds.) User Modeling 2005. Lecture Notes in Artificial Intelligence, pp. 179–188. Springer, Berlin (2005)

  13. Bock, K., Butterfield, S., Cutler, A., Cutting, J.C., Eberhard, K.M., Humphreys, K.R.: Number agreement in British and American English: disagreeing to agree collectively. Language 82(1), 64–113 (2006)

  14. Bod, R., Hay, J., Jannedy, S. (eds.): Probabilistic Linguistics. MIT Press, Cambridge (2003)

  15. Boersma, P., Weenink, D.: Praat: Doing phonetics by computer. Computer program; Version 5.3.60. http://www.praat.org/ (2013)

  16. Bresnan, J., Nikitina, T.: The gradience of the dative alternation. In: Uyechi, L., Wee, L.H. (eds.) Reality Exploration and Discovery: Pattern Interaction in Language and Life, pp. 161–184. CSLI, Stanford (2010)

  17. Burge, T.: Individualism and the mental. In: French, P., Uehling, T., Wettstein, H. (eds.) Midwest Studies in Philosophy. Studies in Metaphysics, vol. IV, pp. 73–121. University of Minnesota Press, Minneapolis (1979)

  18. Callison-Burch, C.: Fast, cheap, and creative: evaluating translation quality using Amazon’s mechanical turk. Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pp. 286–295. ACL, Singapore (2009)

  19. Chen, Ch., Härdle, W.K., Unwin, A. (eds.): Handbook of Data Visualization. Springer, Berlin (2008)

  20. Chomsky, N.: A review of B. F. Skinner’s Verbal Behavior. Language 35(1), 26–58 (1959)

  21. Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)

  22. Chomsky, N.: Aspects of the Theory of Syntax. MIT Press, Cambridge (1965)

  23. Chomsky, N.: Knowledge of Language. Praeger, New York (1986)

  24. Clark, H.H.: Dogmas of understanding. Discourse Process. 23(3), 567–598 (1997)

  25. Clarke, A.D.F., Elsner, M., Rohde, H.: Where’s Wally: The influence of visual salience on referring expression generation. Front. Psychol. (Percept. Sci.) 4(1), 1–10 (2013)

  26. Cleveland, W.S.: The Elements of Graphing Data. Hobart Press, Summit (1985)

  27. Constant, N., Davis, C., Potts, C., Schwarz, F.: The pragmatics of expressive content: evidence from large corpora. Sprache und Datenverarbeitung 33(1–2), 5–21 (2009)

  28. Cover, T.M., Thomas, J.A.: Elements of Information Theory. Wiley, New York (1991)

  29. Culbertson, J., Gross, S.: Are linguists better subjects? Br. J. Philos. Sci. 60(4), 721–736 (2009)

  30. de Marneffe, M.C., Rafferty, A.N., Manning, C.D.: Finding contradictions in text. Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pp. 1039–1047. ACL, Columbus, OH (2008)

  31. de Marneffe, M.C., Manning, C.D., Potts, C.: “Was it good? It was provocative.” Learning the meaning of scalar adjectives. In: Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 167–176. ACL, Uppsala, Sweden (2010)

  32. de Marneffe, M.C., Manning, C.D., Potts, C.: Did it happen? The pragmatic complexity of veridicality assessment. Comput. Linguist. 38(2), 301–333 (2012)

  33. de Marneffe, M.C., Connor, M., Silveira, N., Bowman, S.R., Dozat, T., Manning, C.D.: More constructions, more genres: extending stanford dependencies. In: Hajičová, E., Gerdes, K., Wanner, L. (eds.) Proceedings of the Second International Conference on Dependency Linguistics, pp. 187–196. ACL, Prague (2013)

  34. Degen, J.: A corpus-based study of Some (but not All) implicatures, ms., University of Rochester (2013)

  35. Devitt, M.: Intuitions in linguistics. Br. J. Philos. Sci. 57(3), 481–513 (2006)

  36. Dewey, G.: Relative Frequency of English Speech Sounds. Harvard University Press, Cambridge (1923)

  37. Díaz-Negrillo, A., Fernández-Domínguez, J.: Error tagging systems for learner corpora. Revista Espanola de Linguistica Aplicada 19, 83–102 (2006)

  38. Domingos, P.: A few useful things to know about machine learning. Commun. ACM 55(10), 78–87 (2012)

  39. Duan, M., Elsner, M., de Marneffe, M.C.: Visual and linguistic predictors for the definiteness of referring expressions. In: Proceedings of the 17th Workshop on the Semantics and Pragmatics of Dialogue, pp. 25–34 (2013)

  40. Erlewine, M.Y.: The Constituency of Hyperlinks in a Hypertext Corpus, ms., MIT (2011)

  41. Faye, J.: Copenhagen interpretation of quantum mechanics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, fall 2008 edn, CSLI. http://plato.stanford.edu/archives/fall2008/entries/qm-copenhagen/ (2008)

  42. Fillmore, C.J.: “Corpus linguistics” or “computer-aided armchair linguistics”. In: Svartvik [144], pp. 35–66 (1992)

  43. Finkel, J.R., Grenager, T., Manning, C.: Incorporating non-local information into information extraction systems by Gibbs sampling. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 363–370. ACL, Ann Arbor, MI (2005)

  44. Francis, W.N., Kučera, H.: Manual of information to accompany a ‘standard sample of present-day edited American English, for use with digital computers’, Technical report. Brown University, Providence, RI (1979)

  45. Francis, W.N., Kučera, H.: A standard sample of present-day English for use with digital computers. Report to the U. S. Office of Education on Cooperative Research Project E-007, Brown University, Providence, RI (1964)

  46. Frank, A.F., Jaeger, T.F.: Speaking rationally: uniform information density as an optimal strategy for language production. In: Proceedings of the Cognitive Science Society, Washington, D.C., pp. 939–944 (2008)

  47. Frazier, L.: Co-reference and adult language comprehension. Rev. Linguist. 8(2), 1–11 (2012)

  48. Friedl, J.E.F.: Mastering Regular Expressions, 3rd edn. O’Reilly Media, Sebastopol (2006)

  49. Gelman, A.: Review essay: causality and statistical learning. Am. J. Sociol. 117(3), 955–966 (2011)

  50. Gelman, A., Stern, H.S.: The difference between “significant” and “not significant” is not itself statistically significant. Am. Stat. 60(4), 328–331 (2006)

  51. Glass, L.: What does it mean for an implicit object to be recoverable? In: Proceedings of the Penn Linguistics Colloquium, Penn Linguistics Club, Philadelphia, PA (2013)

  52. Godfrey, J.J., Holliman, E.: Switchboard-1 release 2. Linguistic Data Consortium, Catalog #LDC97S62 (1997)

  53. Goldsmith, J.: Unsupervised learning of the morphology of a natural language. Comput. Linguist. 27(2), 153–198 (2001)

  54. Goldwater, S., Johnson, M.: Learning OT constraint rankings using a maximum entropy model. In: Spenader, J., Eriksson, A., Dahl, Ö. (eds.) Proceedings of the Stockholm Workshop on Variation within Optimality Theory, pp. 111–120. Stockholm University, Stockholm (2003)

  55. Goldwater, S., Griffiths, T.L., Johnson, M.: Contextual dependencies in unsupervised word segmentation. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pp. 673–680. ACL, Sydney, Australia (2006)

  56. Goodman, N.D., Lassiter, D.: Probabilistic semantics and pragmatics: uncertainty in language and thought. In: Lappin, S., Fox, C. (eds.) The Handbook of Contemporary Semantic Theory, 2nd edn. Wiley-Blackwell, Oxford (2015)

  57. Gordon, P.C., Hendrick, R.: Intuitive knowledge of linguistic co-reference. Cognition 62(3), 325–370 (1997)

  58. Gordon, P.C., Grosz, B.J., Gilliom, L.A.: Pronouns, names and the centering of attention in discourse. Cognit. Sci. 17(3), 311–348 (1993)

  59. Gries, S.T.: Null-hypothesis significance testing of word frequencies: a follow-up on Kilgarriff. Corpus Linguist. Linguist. Theory 1(2), 277–294 (2005)

  60. Gries, S.T.: Quantitative Corpus Linguistics with R: A Practical Introduction. Routledge, London (2009)

  61. Grimm, S., McNally, L.: No ordered arguments needed for nouns. In: Aloni, M., Franke, M., Roelofsen, F. (eds.) Proceedings of the 19th Amsterdam Colloquium, pp. 123–130. ILLC, Amsterdam (2013)

  62. Hacquard, V., Wellwood, A.: Embedding epistemic modals in English: a corpus-based study. Semant. Pragmat. 5(4), 1–29 (2012)

  63. Halevy, A., Norvig, P., Pereira, F.: The unreasonable effectiveness of data. IEEE Intell. Syst. 24(2), 8–12 (2009)

  64. Harris, J.A., Potts, C.: Perspective-shifting with appositives and expressives. Linguist. Philos. 32(6), 523–552 (2009)

  65. Harris, R.A.: The Linguistic Wars. Oxford University Press, Oxford (1993)

  66. Harris, Z.: Distributional structure. Word 10(2–3), 146–162 (1954)

  67. Hartshorne, J.K., Bonial, C., Palmer, M.: The VerbCorner project: toward an empirically-based semantic decomposition of verbs. Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pp. 1438–1442. Association for Computational Linguistics, Seattle (2013)

  68. Hayes, B., Wilson, C.: A maximum entropy model of phonotactics and phonotactic learning. Linguist. Inq. 39(3), 379–440 (2008)

  69. Heer, J., Bostock, M.: Crowdsourcing graphical perception: using mechanical turk to assess visualization design. In: ACM Human Factors in Computing Systems, pp. 203–212 (2010)

  70. Higgins, D., Sadock, J.M.: A machine learning approach to modeling scope preferences. Comput. Linguist. 29(1), 73–96 (2003)

  71. Hockett, C.F.: A note on ‘structure’ [review of de Goeje by W. D. Preston]. Int. J. Am. Linguist. 14(4), 269–271 (1948)

  72. Hockett, C.F.: Two models of grammatical description. Word 10(2), 210–234 (1954)

  73. Hoeksema, J.: Corpus study of negative polarity items. University of Groningen. http://www.let.rug.nl/hoeksema/docs/barcelona.html (1997)

  74. Hoeksema, J.: There is no number effect in the licensing of negative polarity items: a reply to Guerzoni and Sharvit. Linguist. Philos. 31(4), 397–407 (2008)

  75. Horn, L.R.: Duplex negatio affirmat...: the economy of double negation. In: Dobrin, L.M., Nichols, L., Rodriguez, R.M. (eds.) Papers from the 27th Regional Meeting of the Chicago Linguistic Society, Chicago Linguistic Society, Chicago, vol. 2: The Parasession on Negation, pp. 80–106 (1991)

  76. Hsueh, P.Y., Melville, P., Sindhwani, V.: Data quality from crowdsourcing: a study of annotation selection criteria. Proceedings of the NAACL HLT 2009 Workshop on Active Learning for Natural Language Processing, pp. 27–35. ACL, Boulder, CO (2009)

  77. Jackendoff, R.S.: Languages of the Mind. MIT Press, Cambridge (1992)

  78. Jurafsky, D.: A probabilistic model of lexical and syntactic access and disambiguation. Cognit. Sci. 20(2), 137–194 (1996)

  79. Jurafsky, D., Martin, J.H.: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edn. Prentice-Hall, Englewood Cliffs (2009)

  80. Katz, J.J.: Language and Other Abstract Objects. Rowman and Littlefield, Totowa (1981)

  81. Katz, J.J., Postal, P.M.: Realism vs. conceptualism in linguistics. Linguist. Philos. 14(5), 515–554 (1991)

  82. Kilgarriff, A.: Language is never, ever, ever, random. Corpus Linguist. Linguist. Theory 1(2), 263–276 (2005)

  83. Kilgarriff, A.: Googleology is bad science. Comput. Linguist. 33(1), 147–151 (2007)

  84. Kilgarriff, A.: Getting to know your corpus. In: Sojka, P., Horák, A., Kopeček, I., Pala, K. (eds.) Text, Speech and Dialogue: 15th International Conference. Lecture Notes in Artificial Intelligence, vol. 7499, pp. 3–15. Springer, Berlin (2012)

  85. Kilgarriff, A., Grefenstette, G.: Introduction to the special issue on the Web as corpus. Comput. Linguist. 29(3), 333–347 (2003)

  86. Klein, D., Manning, C.D.: Accurate unlexicalized parsing. In: Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, ACL, Sapporo, Japan, vol. 1, pp. 423–430 (2003)

  87. Kučera, H., Francis, W.N.: Computational Analysis of Present-Day American English. Brown University Press, Providence (1967)

  88. Kwiatkowski, T., Zettlemoyer, L.S., Goldwater, S., Steedman, M.: Lexical generalization in CCG grammar induction for semantic parsing. Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1512–1523. ACL, Edinburgh (2011)

  89. Lassiter, D.: Semantic externalism, language variation, and sociolinguistic accommodation. Mind Lang. 23(5), 607–633 (2008)

  90. Lee, H., Peirsman, Y., Chang, A., Chambers, N., Surdeanu, M., Jurafsky, D.: Stanford’s multi-pass sieve coreference resolution system at the CoNLL-2011 shared task. Proceedings of the 15th Conference on Computational Natural Language Learning: Shared Task, pp. 28–34. ACL, Portland, OR (2011)

  91. Leech, G.N.: Corpora and theories of linguistic performance. In: Svartvik [144], pp. 105–122 (1992)

  92. Levin, B.: English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, Chicago (1993)

  93. Levy, R.: Expectation-based syntactic comprehension. Cognition 106(3), 1126–1177 (2008)

  94. Levy, R., Andrew, G.: Tregex and Tsurgeon: tools for querying and manipulating tree data structures. In: Proceedings of the 5th Edition of the International Conference on Language Resources and Evaluation, pp. 2231–2234 (2006)

  95. Levy, R., Jaeger, T.F.: Speakers optimize information density through syntactic reduction. In: Schölkopf, B., Platt, J., Hoffman, T. (eds.) Advances in Neural Information Processing Systems, vol. 19, pp. 849–856. MIT Press, Cambridge (2007)

  96. Lewis, D.: Convention. Harvard University Press, Cambridge, MA, reprinted 2002 by Blackwell (1969)

  97. Liang, P., Jordan, M.I., Klein, D.: Learning dependency-based compositional semantics. Comput. Linguist. 39(2), 389–446 (2013)

  98. Liberman, M.: Questioning reality. Language Log, January 24. http://itre.cis.upenn.edu/~myl/languagelog/archives/001837.html (2005)

  99. MacWhinney, B.: The CHILDES Project: Tools for Analyzing Talk, 3rd edn. Lawrence Erlbaum Associates, Mahwah (2000)

  100. Manning, C.D.: Probabilistic syntax. In: Bod et al. [14], pp. 289–341 (2003)

  101. Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. MIT Press, Cambridge (1999)

  102. Marcus, M.P., Santorini, B., Marcinkiewicz, M.A., Taylor, A.: The Penn treebank 3. Linguistic Data Consortium, Catalog #LDC99T42 (1999)

  103. McEnery, T., Wilson, A.: Corpus Linguistics: An Introduction. Edinburgh University Press, Edinburgh (2001)

  104. McKinney, W.: Python for Data Analysis: Data Wrangling with Pandas, NumPy, and IPython. O’Reilly Media, Sebastopol (2012)

  105. Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., The Google Books Team, Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., Pinker, S., Nowak, M.A., Aiden, E.L.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)

  106. Monroe, B.L., Colaresi, M.P., Quinn, K.M.: Fightin’ words: lexical feature selection and evaluation for identifying the content of political conflict. Polit. Anal. 16(4), 372–403 (2009)

  107. Muchnik, L., Aral, S., Taylor, S.J.: Social influence bias: a randomized experiment. Science 341(6146), 647–651 (2013)

  108. Munro, R.: Processing short message communications in low-resource languages. PhD thesis, Stanford University, Stanford, CA (2012)

  109. Munro, R., Bethard, S., Kuperman, V., Lai, V.T., Melnick, R., Potts, C., Schnoebelen, T., Tily, H.: Crowdsourcing and language studies: the new generation of linguistic data. Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk, pp. 122–130. ACL, Los Angeles (2010)

  110. Norvig, P.: Natural language corpus data. In: Segaran, T., Hammerbacher, J. (eds.) Beautiful Data, pp. 219–242. O’Reilly Media (2009)

  111. Norvig, P.: On Chomsky and the two cultures of statistical learning. Google, Inc. http://norvig.com/chomsky.html (2011)

  112. Odersky, M., Spoon, L., Venners, B.: Programming in Scala, 2nd edn. Artima, Walnut Creek (2010)

  113. Owoputi, O., O’Connor, B., Dyer, C., Gimpel, K., Schneider, N., Smith, N.A.: Improved part-of-speech tagging for online conversational text with word clusters. Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 380–390. ACL, Atlanta, GA (2013)

  114. Pang, B., Lee, L.: Seeing stars: exploiting class relationships for sentiment categorization with respect to rating scales. Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics, pp. 115–124. ACL, Ann Arbor, MI (2005)

  115. Pang, B., Lee, L.: Opinion mining and sentiment analysis. Found. Trends Inf. Retr. 2(1), 1–135 (2008)

  116. Pereira, F.C.N.: Formal grammar and information theory: together again? Philos. Trans. R. Soc. 358(1769), 1239–1253 (2000)

  117. Phillips, C.: Some arguments and nonarguments for reductionist accounts of syntactic phenomena. Lang. Cognit. Process. 28(1–2), 156–187 (2013)

  118. Potts, C.: On the negativity of negation. In: Li, N., Lutz, D. (eds.) Proceedings of Semantics and Linguistic Theory, vol. 20, pp. 636–659. CLC Publications, Ithaca, NY (2011)

  119. Potts, C.: Conventional implicature and expressive content. In: Maienborn, C., von Heusinger, K., Portner, P. (eds.) Semantics: An International Handbook of Natural Language Meaning, vol. 3, pp. 2516–2536. Mouton de Gruyter, Berlin (2012a)

  120. Potts, C.: Goal-driven answers in the cards dialogue corpus. In: Arnett, N., Bennett, R. (eds.) Proceedings of the 30th West Coast Conference on Formal Linguistics, pp. 1–20. Cascadilla Press, Somerville, MA (2012b)

  121. Potts, C., Schwarz, F.: Affective ‘this’. Linguist. Issues Lang. Technol. 3(5), 1–30 (2010)

  122. Putnam, H.: Mind, Language, and Reality: Philosophical Papers, vol. 2. Cambridge University Press, Cambridge (1975)

  123. Rajaraman, A., Ullman, J.D.: Mining of Massive Datasets. Cambridge University Press, Cambridge (2011)

  124. Recasens, M., de Marneffe, M.C., Potts, C.: The life and death of discourse entities: identifying singleton mentions. Human Language Technologies: The 2013 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 627–633. ACL, Atlanta, Georgia (2013)

  125. Ring, N., Uitdenbogerd, A.L.: Finding ‘Lucy in disguise’: the misheard lyric matching problem. In: Lee, G.G., Song, D., Lin, C.Y., Aizawa, A., Kuriyama, K., Yoshioka, M., Sakai, T. (eds.) Information Retrieval Technology: 5th Asia Information Retrieval Symposium. Lecture Notes in Computer Science, vol. 5839, pp. 157–167. Springer, Berlin (2009)

  126. Ritter, A., Clark, S., Mausam, Etzioni, O.: Named entity recognition in tweets: An experimental study. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pp. 1524–1534. ACL, Edinburgh (2011)

  127. Roark, B., Sproat, R.: Computational Approaches to Morphology and Syntax. Oxford University Press, Oxford (2007)

  128. Sag, I.A., Wasow, T.: Performance-compatible competence grammar. In: Borsley, R., Börjar, K. (eds.) Non-Transformational Syntax: Formal and Explicit Models of Grammar, pp. 359–377. Wiley-Blackwell, Oxford (2011)

  129. Saurí, R.: A factuality profiler for eventualities in text. Ph.D. thesis, Computer Science Department, Brandeis University (2008)

  130. Saurí, R., Pustejovsky, J.: FactBank: a corpus annotated with event factuality. Lang. Resour. Eval. 43(3), 227–268 (2009)

  131. Scholz, B.C., Pelletier, F.J., Pullum, G.K.: Philosophy of linguistics. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, winter 2011 edn, CSLI, Stanford, CA. http://plato.stanford.edu/archives/win2011/entries/linguistics/ (2011)

  132. Schütze, C.T.: The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology. University of Chicago Press, Chicago (1996)

  133. Schütze, C.T.: Web searches should supplement judgements, not supplant them. Zeitschrift für Sprachwissenschaft 28(1), 151–156 (2009)

  134. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27(3), 379–423, 623–656 (1948)

  135. Simmons, J.P., Nelson, L.D., Simonsohn, U.: False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11), 1359–1366 (2011)

  136. Snow, R., O’Connor, B., Jurafsky, D., Ng, A.Y.: Cheap and fast – but is it good? Evaluating non-expert annotations for natural language tasks. Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing, pp. 254–263. ACL, Honolulu, Hawaii (2008)

  137. Snyder, W.: An experimental investigation of syntactic satiation effects. Linguist. Inq. 31(3), 575–582 (2000)

  138. Spencer, N.J.: Differences between linguists and nonlinguists in intuitions of grammaticality-acceptability. J. Psycholinguist. Res. 2(2), 83–98 (1973)

  139. Spitkovsky, V.I., Jurafsky, D., Alshawi, H.: Profiting from mark-up: Hyper-text annotations for guided parsing. Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pp. 1278–1287. ACL, Uppsala, Sweden (2010)

  140. Sproat, R., Shih, C.: The cross-linguistic distribution of adjective ordering restrictions. In: Georgopoulos, C., Ishihara, R. (eds.) Interdisciplinary Approaches to Language: Essays in Honor of S.-Y. Kuroda, pp. 565–59. Springer, Berlin (1991)

  141. Sprouse, J.: A validation of Amazon Mechanical Turk for the collection of acceptability judgments in linguistic theory. Behav. Res. Methods 43(1), 155–167 (2010)

  142. Sprouse, J., Schütze, C.T., Almeida, D.: A comparison of informal and formal acceptability judgments using a random sample from Linguistic Inquiry 2001–2010. Lingua 134, 219–248 (2013)

  143. Stoia, L., Shockley, D.M., Byron, D.K., Fosler-Lussier, E.: SCARE: A situated corpus with annotated referring expressions. In: Proceedings of the 6th International Conference on Language Resources and Evaluation, European Language Resources Association, Marrakesh, Morocco (2008)

  144. Svartvik, J. (ed.): Directions in Corpus Linguistics: Proceedings of Nobel Symposium, vol. 82. Mouton de Gruyter, Berlin (1992)

  145. Thomas, M., Pang, B., Lee, L.: Get out the vote: Determining support or opposition from Congressional floor-debate transcripts. Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pp. 327–335. ACL, Sydney, Australia (2006)

  146. Thompson, H.S., Anderson, A., Bard, E.G., Doherty-Sneddon, G., Newlands, A., Sotillo, C.: The HCRC map task corpus: Natural dialogue for speech recognition. HLT ’93: Proceedings of the workshop on Human Language Technology, pp. 25–30. ACL, Princeton (1993)

  147. Thuilier, J., Abeille, A., Crabbé, B.: Ordering preferences for postverbal complements in French. In: Tyne, H., André, V., Boulton, A., Benzitoun, C. (eds.) Ecological and Data-Driven Perspectives in French Language Studies. Cambridge Scholars Publishing, Cambridge (2013)

  148. Toutanova, K., Klein, D., Manning, C.D., Singer, Y.: Feature-rich part-of-speech tagging with a cyclic dependency network. In: Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics, vol. 1, pp. 173–180. ACL, Edmonton, Canada, NAACL ’03 (2003)

  149. Tufte, E.R.: The Visual Display of Quantitative Information, 2nd edn. Graphics Press, Cheshire (2001)

  150. Turney, P.D., Pantel, P.: From frequency to meaning: vector space models of semantics. J. Artif. Intell. Res. 37, 141–188 (2010)

  151. Vickers, J.: The problem of induction. In: Zalta, E.N. (ed.) The Stanford Encyclopedia of Philosophy, spring 2013 edn, CSLI. http://plato.stanford.edu/entries/induction-problem/ (2013)

  152. Vitevitch, M.S.: Naturalistic and experimental analyses of word frequency and neighborhood density effects in slips of the ear. Lang. Speech 45(4), 407–434 (2002)

  153. Walker, M.A., Joshi, A.K., Prince, E.F. (eds.): Centering in Discourse. Oxford University Press, Oxford (1997)

  154. Wason, P.C., Reich, S.S.: A verbal illusion. Q. J. Exp. Psychol. 31(4), 591–597 (1979)

  155. Wierzbicka, A.: English Speech Act Verbs: A semantic dictionary. Academic Press, New York (1987)

  156. Winston, A.S., Blais, D.J.: What counts as an experiment? A transdisciplinary analysis of textbooks, 1930–1970. Am. J. Psychol. 109(4), 599–616 (1996)

  157. Wong, Y.W., Mooney, R.: Learning synchronous grammars for semantic parsing with lambda calculus. Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics, pp. 960–967. ACL, Prague, Czech Republic (2007)

  158. Wu, F., Huberman, B.A.: How public opinion forms. In: Papadimitriou, C., Zhang, S. (eds.) Internet and Network Economics. Lecture Notes in Computer Science, vol. 5385, pp. 334–341. Springer, Berlin (2008)

  159. Zettlemoyer, L.S.: Learning to map sentences to logical form. Ph.D. thesis, MIT, Cambridge, MA (2009)

  160. Zipf, G.K.: Human Behavior and the Principle of Least Effort. Addison-Wesley, Cambridge (1949)

Author information

Corresponding author

Correspondence to Marie-Catherine de Marneffe.

Copyright information

© 2017 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

de Marneffe, MC., Potts, C. (2017). Developing Linguistic Theories Using Annotated Corpora. In: Ide, N., Pustejovsky, J. (eds) Handbook of Linguistic Annotation. Springer, Dordrecht. https://doi.org/10.1007/978-94-024-0881-2_16

  • DOI: https://doi.org/10.1007/978-94-024-0881-2_16

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-94-024-0879-9

  • Online ISBN: 978-94-024-0881-2

  • eBook Packages: Social Sciences (R0)
