Some of Our Best Friends Are Statisticians

  • Conference paper
Text, Speech and Dialogue (TSD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Abstract

In his invited talk at LREC 2004, given when he received the first-ever Antonio Zampolli Prize for his essential contributions to the use of spoken and written language resources, Frederick Jelinek used the title “Some of My Best Friends Are Linguists”. He had many reasons for doing so, one being that he wanted to dispel the perception that he disliked linguists and linguistics, a perception that arose because so many people had cited his famous line from a presentation at a Natural Language Processing Evaluation workshop in 1988: “Whenever I fire a linguist our system performance improves.”

Editor information

Václav Matoušek, Pavel Mautner

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Hajič, J., Hajičová, E. (2007). Some of Our Best Friends Are Statisticians. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_2

  • DOI: https://doi.org/10.1007/978-3-540-74628-7_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-74627-0

  • Online ISBN: 978-3-540-74628-7

  • eBook Packages: Computer Science, Computer Science (R0)
