
Mice from a Mountain: Reflections on Current Issues in Evaluation of Written Language Technology

Chapter in: Charting a New Course: Natural Language Processing and Information Retrieval

Part of the book series: The Kluwer International Series on Information Retrieval (INRE, volume 16)





Copyright information

© 2005 Springer

About this chapter

Cite this chapter

Gaizauskas, R., Barker, E.J. (2005). Mice from a Mountain: Reflections on Current Issues in Evaluation of Written Language Technology. In: Tait, J.I. (ed.) Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval, vol 16. Springer, Dordrecht. https://doi.org/10.1007/1-4020-3467-9_12


  • DOI: https://doi.org/10.1007/1-4020-3467-9_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-3343-8

  • Online ISBN: 978-1-4020-3467-1

  • eBook Packages: Computer Science, Computer Science (R0)
