NADA: A Robust System for Non-referential Pronoun Detection

  • Shane Bergsma
  • David Yarowsky
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7099)


We present NADA: the Non-Anaphoric Detection Algorithm. NADA is a novel, publicly available program that accurately distinguishes between referential and non-referential instances of the pronoun "it" in raw English text. Like recent state-of-the-art approaches, NADA uses very large-scale web N-gram features, but it makes these features practical by compressing the N-gram counts so that they fit into computer memory. NADA therefore operates as a fast, stand-alone system. NADA also improves over previous web-scale systems by considering the entire sentence, rather than narrow context windows, via long-distance lexical features. NADA substantially outperforms other state-of-the-art systems in non-referential detection accuracy.
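
The abstract does not say how the N-gram counts are compressed, so the following is only a generic sketch of one way such counts could be made small enough to hold in memory; it is not NADA's actual scheme. It assumes two common tricks: replacing each N-gram string with a 32-bit fingerprint, and storing each count as a one-byte log-scale bucket. All names here (`fingerprint`, `quantize`, `dequantize`, `CompactCounts`) are hypothetical.

```python
import bisect
import math
import zlib
from array import array


def fingerprint(ngram: str) -> int:
    """32-bit hash of the N-gram text (an assumption; collisions are possible)."""
    return zlib.crc32(ngram.encode("utf-8"))


def quantize(count: int) -> int:
    """Map a raw count to a log-scale bucket that fits in one byte."""
    return min(255, round(4 * math.log2(count + 1)))


def dequantize(bucket: int) -> float:
    """Approximate inverse of quantize(); precision is lost by design."""
    return 2 ** (bucket / 4) - 1


class CompactCounts:
    """Sorted fingerprints plus one-byte buckets, searched by bisection."""

    def __init__(self, counts: dict[str, int]):
        pairs = sorted((fingerprint(g), quantize(c)) for g, c in counts.items())
        self.keys = array("L", (k for k, _ in pairs))     # hashed N-grams
        self.buckets = array("B", (b for _, b in pairs))  # quantized counts

    def lookup(self, ngram: str) -> float:
        """Return an approximate count, or 0.0 if the N-gram was not stored."""
        key = fingerprint(ngram)
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return dequantize(self.buckets[i])
        return 0.0
```

The trade-off in this sketch is approximate counts and a small chance of hash collisions in exchange for a fixed, small cost per stored N-gram, which is usually acceptable when the counts feed a classifier as log-scaled features.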


non-referential pronoun identification; pleonastic pronoun; non-referential pronoun; non-anaphoric pronoun; dummy pronoun; expletive pronoun; pronoun resolution; anaphoricity; coreference resolution; anaphoric; referential; nominal pronoun; lexical disambiguation





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Shane Bergsma
    • 1
  • David Yarowsky
    • 1
  1. Dept. of Computer Science and Human Language Technology Center of Excellence, Johns Hopkins University, US
