Some of Our Best Friends Are Statisticians

Hajič, Jan; Hajičová, Eva

doi:10.1007/978-3-540-74628-7_2

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4629)

Included in the following conference series:

International Conference on Text, Speech and Dialogue

Abstract

In his invited talk at LREC 2004, given on being awarded the first-ever Antonio Zampolli Prize for his essential contributions to the use of spoken and written language resources, Frederick Jelinek used the title “Some of My Best Friends Are Linguists”. He did so for many reasons, one of them being that he wanted to dispel the perception that he disliked linguists and linguistics, a perception fed by the many people who had quoted his famous line from a 1988 presentation at a Natural Language Processing Evaluation workshop: “Whenever I fire a linguist our system performance improves.”

References

Bahl, L.R., Mercer, R.L.: Part-of-speech assignment by a statistical decision algorithm. In: Proceedings of the IEEE International Symposium on Information Theory, pp. 88–89. IEEE Computer Society Press, Los Alamitos (1976)
Banko, M., Brill, E.: Scaling to Very Very Large Corpora for Natural Language Disambiguation. In: Proceedings of the 39th Annual Meeting of the Association for Computational Linguistics. Toulouse, France (2001)
Banko, M., Brill, E.: Mitigating the Paucity-of-Data Problem: Exploring the Effect of Training Corpus Size on Classifier Performance for Natural Language Processing. In: Proceedings of the First International Conference on Human Language Technology. San Diego, California, pp. 1–5 (2001)
Berger, A.L., Brown, P.F., Della Pietra, S.A., Della Pietra, V.J., Gillett, J.R., Lafferty, J.D., Mercer, R.L., Printz, H., Ureš, L.: The Candide System for Machine Translation. In: Proceedings of the ARPA Conference on Human Language Technology. Plainsborough, New Jersey (1994)
Brants, S., Dipper, S., Hansen, S., Lezius, W., Smith, G.: TIGER treebank. In: Proceedings of the First Workshop on Treebanks and Linguistic Theories (TLT 2002), Sozopol, Bulgaria, pp. 24–42 (2002)
Brill, E.: Paucity Shmaucity–What Can We Do With A Trillion Words? In: Invited talk at EMNLP-NAACL 2001 Conference. Pittsburgh, PA, USA (2001)
Brown, P.F., Cocke, J., Della Pietra, S.A., Della Pietra, V.J., Jelinek, F., Lafferty, J.D., Mercer, R.L., Roossin, P.S.: A Statistical Approach to Machine Translation. Computational Linguistics 16(2), 79–85 (1990)
Charniak, E.: Statistical Language Learning. The MIT Press, Cambridge, MA (1996)
Chomsky, N.: Syntactic Structures. Mouton, The Hague (1957)
Church, K.W.: A Stochastic PARTS Program and Noun Phrase Parser for Unrestricted Text. In: Proceedings of the Second Conference on Applied Natural Language Processing. 26th Annual Meeting of the ACL. Austin, Texas, pp. 136–143 (1988)
Church, K.W.: Speech and Language Processing: Where Have We Been and Where Are We Going? In: Proceedings of the 8th European Conference on Speech Communication and Technology (EUROSPEECH/INTERSPEECH-2003). Geneva, Switzerland (2003)
Czech National Corpus, http://ucnk.ff.cuni.cz
Fillmore, C.J.: The case for case. In: Bach, E., Harms, R. (eds.) Universals in Linguistic Theory. New York, pp. 1–90 (1968)
Francis, W.N.: Standard Corpus of Edited Present-day American English. College English 26, 267–273 (1965). Reprinted in: Sampson, G., McCarthy, D. (eds.) Corpus Linguistics: Readings in a Widening Discipline, pp. 27–34. Continuum, London/New York (2004)
Hajič, J.: Building a syntactically annotated corpus: The Prague Dependency Treebank. In: Issues of Valency and Meaning. Studies in Honour of Jarmila Panevová, Karolinum, pp. 106–132. Charles University Press, Prague, Czech Republic (1998)
Hajič, J., et al.: Prague Dependency Treebank 2.0. CDROM. Cat. No. LDC2006T01. Linguistic Data Consortium, Philadelphia, PA (2006), http://ufal.mff.cuni.cz/pdt2.0 ISBN: 1-58563-370-4
Hajič, J., Panevová, J., Urešová, Z., Bémová, A., Kolářová, V., Pajas, P.: PDT-VALLEX: Creating a Large-Coverage Valency Lexicon for Treebank Annotation. In: Proceedings of the 2nd Treebanks and Linguistic Theories Workshop. Växjö, Sweden, November 14-15, pp. 57–68 (2003)
Hajičová, E.: Old linguists never die, they only get obligatorily deleted. Computational Linguistics 32(4), 457–469 (2006)
Jelinek, F.: Continuous Speech Recognition by Statistical Methods. Proceedings of the IEEE 64(4), 532–536 (1976)
Jelinek, F.: Statistical Methods For Speech Recognition. The MIT Press, Cambridge, MA (1998)
Jelinek, F.: Some of My Best Friends Are Linguists. In: Invited talk at the occasion of the Antonio Zampolli Award presented to Frederick Jelinek at the LREC 2004 conference, Lisbon, Portugal (2004)
Jelinek, F., Bahl, L.R., Mercer, R.L.: Design of a Linguistic Statistical Decoder for the Recognition of Continuous Speech. IEEE Transactions on Information Theory 21(3), 250–256 (1975)
Jurafsky, D., Martin, J.H.: Speech and Language Processing. Prentice-Hall, Englewood Cliffs (2000)
Manning, C.D., Schütze, H.: Foundations of Statistical Natural Language Processing. The MIT Press, Cambridge, MA (2000)
Marcus, M.P., Santorini, B., Marcinkiewicz, M.A.: Building a Large Annotated Corpus of English: The Penn Treebank. Computational Linguistics 19(2), 313–330 (1993)
Meyers, A., Reeves, R., Macleod, C., Szekely, R., Zielinska, V., Young, B., Grishman, R.: The NomBank Project: An Interim Report. In: HLT-NAACL Workshop: Frontiers in Corpus Annotation. Boston, Massachusetts, USA, pp. 24–31 (2004)
http://www.coli.uni-saarland.de/projects/sfb378/NEGRA-en.html
Och, F.J.: Large-scale Machine Translation: Challenges and Opportunities. In: Invited talk at NAACL/HLT 2007, Rochester, NY, USA (April 22-27, 2007)
Palmer, M.S., Gildea, D., Kingsbury, P.: The Proposition Bank: An Annotated Corpus of Semantic Roles. Computational Linguistics 31(1), 71–105 (2005)
Ribarov, K., Bémová, A., Vidová Hladká, B.: When a statistically oriented parser was more efficient than a linguist: a case of treebank conversion. The Prague Bulletin of Mathematical Linguistics 86, 21–38 (2006)
Robinson, J.J.: Case, category and configuration. Journal of Linguistics 6, 57–80 (1969)
Robinson, J.J.: Dependency structures and transformational rules. Language 46, 259–285 (1970)
Sgall, P.: Zur Frage der Ebenen im Sprachsystem. Travaux linguistiques de Prague 1, 95–106 (1964)
Sgall, P.: Generative Beschreibung und die Ebenen des Sprachsystems. In: presented at the Second International Symposium in Magdeburg, Germany. Zeichen und System der Sprache III 1966, Berlin, pp. 225–239 (1964)
Sgall, P., Hajičová, E., Panevová, J.: The Meaning of the Sentence in its Semantic and Pragmatic Aspects. Reidel - Academia, Dordrecht - Prague (1986)
http://www.ims.uni-stuttgart.de/projekte/TIGER/TIGERCorpus
Vidová Hladká, B.: The Czech Academic Corpus version 1.0 has been released. The Prague Bulletin of Mathematical Linguistics 86, 57–58 (2006)
Weaver, W.: Translation. Memorandum. Reprinted. In: Locke, W.N., Booth, A.D. (eds.) Machine Translation of Languages: Fourteen Essays, pp. 15–23. MIT Press, Cambridge (1949)
Žabokrtský, Z., Lopatková, M.: Valency Frames of Czech Verbs in VALLEX 1.0. In: HLT-NAACL Workshop: Frontiers in Corpus Annotation. Boston, Massachusetts, USA, pp. 70–77 (2004)

Author information

Authors and Affiliations

Institute of Formal and Applied Linguistics, Charles University in Prague, Malostranské nám. 25, CZ-11800 Prague, Czech Republic
Jan Hajič & Eva Hajičová

Editor information

Václav Matoušek, Pavel Mautner

About this paper

Cite this paper

Hajič, J., Hajičová, E. (2007). Some of Our Best Friends Are Statisticians. In: Matoušek, V., Mautner, P. (eds) Text, Speech and Dialogue. TSD 2007. Lecture Notes in Computer Science, vol 4629. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74628-7_2

DOI: https://doi.org/10.1007/978-3-540-74628-7_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-74627-0
Online ISBN: 978-3-540-74628-7
eBook Packages: Computer Science, Computer Science (R0)