Abstract
In this chapter, we report our experiences in measuring the effectiveness of large electronic discovery (e-Discovery) result sets in the Text Retrieval Conference (TREC) Legal Track campaigns of 2006–2011. As effectiveness measures, we focus on recall, precision and F1. We state the estimators that we used for these measures, and we outline both the rank-based and the set-based approaches to sampling that we took. We share our experiences with the sampling error in the resulting estimates of absolute effectiveness on individual topics, relative effectiveness on individual topics, mean effectiveness across topics and relative effectiveness across topics. Finally, we discuss our experiences with assessor error, which we found often had a larger impact than sampling error.
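The chapter's own estimators are not reproduced on this page. As a rough illustration of what a set-based, sample-driven estimate of recall, precision and F1 can look like, the following is a minimal sketch. It assumes simple random sampling within each stratum and error-free assessments; the names `Stratum` and `estimate_effectiveness` are hypothetical and do not come from the chapter.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    size: int        # number of documents in the stratum
    sampled: int     # number of documents drawn for assessment
    relevant: int    # sampled documents judged relevant
    retrieved: bool  # True if the stratum lies inside the result set


def estimate_effectiveness(strata):
    """Return (precision, recall, F1) point estimates from stratified samples.

    Assumption: each stratum is sampled uniformly at random, so the number of
    relevant documents in it is estimated as size * relevant / sampled.
    """
    est_rel_retrieved = 0.0  # estimated relevant documents in the result set
    est_rel_total = 0.0      # estimated relevant documents in the collection
    retrieved_size = 0       # size of the result set

    for s in strata:
        est_rel = s.size * s.relevant / s.sampled if s.sampled else 0.0
        est_rel_total += est_rel
        if s.retrieved:
            est_rel_retrieved += est_rel
            retrieved_size += s.size

    precision = est_rel_retrieved / retrieved_size if retrieved_size else 0.0
    recall = est_rel_retrieved / est_rel_total if est_rel_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative numbers only: a 1,000-document result set and a
# 9,000-document remainder, each assessed on a sample of 100.
example = [
    Stratum(size=1000, sampled=100, relevant=60, retrieved=True),
    Stratum(size=9000, sampled=100, relevant=5, retrieved=False),
]
precision, recall, f1 = estimate_effectiveness(example)
```

With these illustrative numbers, the five relevant documents found in the unretrieved sample scale up to an estimated 450 relevant documents, so a small change in that count moves the recall estimate considerably. This is one way to see why sampling error and assessor error in the unretrieved stratum matter so much for recall.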
© 2017 Springer-Verlag GmbH Germany
About this chapter
Cite this chapter
Tomlinson, S., Hedin, B. (2017). Measuring Effectiveness in the TREC Legal Track. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_6
DOI: https://doi.org/10.1007/978-3-662-53817-3_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53816-6
Online ISBN: 978-3-662-53817-3
eBook Packages: Computer Science, Computer Science (R0)