
Mice from a Mountain: Reflections on Current Issues in Evaluation of Written Language Technology

Chapter in: Charting a New Course: Natural Language Processing and Information Retrieval

Part of the book series: The Kluwer International Series on Information Retrieval (INRE, volume 16)





Copyright information

© 2005 Springer

About this chapter

Cite this chapter

Gaizauskas, R., Barker, E.J. (2005). Mice from a Mountain: Reflections on Current Issues in Evaluation of Written Language Technology. In: Tait, J.I. (ed.) Charting a New Course: Natural Language Processing and Information Retrieval. The Kluwer International Series on Information Retrieval, vol 16. Springer, Dordrecht. https://doi.org/10.1007/1-4020-3467-9_12


  • DOI: https://doi.org/10.1007/1-4020-3467-9_12

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-3343-8

  • Online ISBN: 978-1-4020-3467-1

  • eBook Packages: Computer Science, Computer Science (R0)
