Measuring Effectiveness in the TREC Legal Track

Tomlinson, Stephen; Hedin, Bruce

doi:10.1007/978-3-662-53817-3_6


Part of the book series: The Information Retrieval Series (INRE, volume 37)


Abstract

In this chapter, we report our experiences from attempting to measure the effectiveness of large electronic discovery (e-Discovery) result sets in the Text Retrieval Conference (TREC) Legal Track campaigns of 2006–2011. For effectiveness measures, we have focused on recall, precision and F₁. We state the estimators that we have used for these measures, and we outline both the rank-based and set-based approaches to sampling that we have taken. We share our experiences with the sampling error in the resulting estimates of absolute effectiveness on individual topics, relative effectiveness on individual topics, mean effectiveness across topics and relative effectiveness across topics. Finally, we discuss our experiences with assessor error, which we have found has often had a larger impact than sampling error.
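
To make the measures named in the abstract concrete, the following is a minimal sketch of how recall, precision and F₁ might be estimated for a single result set from a stratified random sample of assessed documents. It is an illustration only: the chapter's own estimators are not shown in this preview, and the `Stratum` class, its fields, the function names and the example numbers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """Hypothetical description of one sampling stratum."""
    size: int        # number of documents in the stratum
    sampled: int     # number of documents drawn at random and assessed
    relevant: int    # number of sampled documents judged relevant
    retrieved: bool  # True if the stratum lies inside the result set


def estimate_relevant(s: Stratum) -> float:
    """Scale the relevant fraction of the sample up to the whole stratum."""
    if s.sampled == 0:
        return 0.0  # no judgments: this sketch simply contributes nothing
    return s.size * s.relevant / s.sampled


def estimate_effectiveness(strata):
    """Return (precision, recall, F1) point estimates for the result set."""
    rel_retrieved = sum(estimate_relevant(s) for s in strata if s.retrieved)
    rel_total = sum(estimate_relevant(s) for s in strata)
    n_retrieved = sum(s.size for s in strata if s.retrieved)

    precision = rel_retrieved / n_retrieved if n_retrieved else 0.0
    recall = rel_retrieved / rel_total if rel_total else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up example: a 1,000-document result set inside a 10,000-document
# collection, with 100 documents assessed in each stratum.
strata = [
    Stratum(size=1000, sampled=100, relevant=60, retrieved=True),
    Stratum(size=9000, sampled=100, relevant=5, retrieved=False),
]
precision, recall, f1 = estimate_effectiveness(strata)
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

These are point estimates only. The sampling error and assessor error discussed in the abstract are not modelled here, and the sparsely sampled non-retrieved stratum dominates the uncertainty in the recall estimate.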



Author information

Authors and Affiliations

Open Text Corporation, 10 Rideau St, 6th Floor, Ottawa, ON, Canada, K1N 9J1
Stephen Tomlinson
H5, 595 Market St., San Francisco, CA, 94105, USA
Bruce Hedin


Corresponding author

Correspondence to Stephen Tomlinson.

Editor information

Editors and Affiliations

Institute for Software Engineering & Interactive Systems, Vienna University of Technology, Vienna, Austria
Mihai Lupu
Research Platform Responsible Research and Innovation in Academic Practice, University of Vienna, Vienna, Austria
Katja Mayer
Information & Society Research Division, National Institute of Informatics, Tokyo, Japan
Noriko Kando
Patinformatics, LLC, Dublin, Ohio, USA
Anthony J. Trippe


About this chapter

Cite this chapter

Tomlinson, S., Hedin, B. (2017). Measuring Effectiveness in the TREC Legal Track. In: Lupu, M., Mayer, K., Kando, N., Trippe, A. (eds) Current Challenges in Patent Information Retrieval. The Information Retrieval Series, vol 37. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-53817-3_6


DOI: https://doi.org/10.1007/978-3-662-53817-3_6
Published: 26 March 2017
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-53816-6
Online ISBN: 978-3-662-53817-3
eBook Packages: Computer Science, Computer Science (R0)
