Scalable and explainable legal prediction


Legal decision-support systems have the potential to improve access to justice, administrative efficiency, and judicial consistency, but broad adoption of such systems is contingent on development of technologies with low knowledge-engineering, validation, and maintenance costs. This paper describes two approaches to an important form of legal decision support—explainable outcome prediction—that obviate both annotation of an entire decision corpus and manual processing of new cases. The first approach, which uses an attention network for prediction and attention weights to highlight salient case text, was shown to be capable of predicting decisions, but attention-weight-based text highlighting did not demonstrably improve human decision speed or accuracy in an evaluation with 61 human subjects. The second approach, termed semi-supervised case annotation for legal explanations, exploits structural and semantic regularities in case corpora to identify textual patterns that have both predictable relationships to case decisions and explanatory value.




  1.

    We note that models for legal prediction, as with other inductive models in dynamic domains, can be subject to concept drift (Medvedeva et al. 2020).

  2.

    See The EXplainable AI in Law (XAILA) (2018) for a recent exception to this generalization.

  3.

  4.

    Cases in which decisions consist of numerical awards can be modeled as regression problems. For simplicity, we confine the discussion in this paper to categorical classification.

  5.


  6.


  7.

    At the time of writing, we have not yet completed annotation of each individual issue for every instance in our data set. The experiments described below therefore involve prediction only of the overall outcome of the case, without individual issue decisions.

  8.

    Decision sections were annotated as well. However, since Decision sections consisted only of brief conclusory text, they were useful only for obtaining the decision label (transferred or not transferred); they had no explanatory value and were therefore not used in tag projection.

  9.

    The annotated corpus is available to researchers at

  10.

    In tenfold cross-validation, we observed a mean f-measure of 0.971 and an MCC of 0.815 for transfer prediction using an SVM (Platt 1999) applied to the 1133 highest-information n-grams \((n=1{-}5)\) occurring in the stop-word-filtered text of the Findings sections of the full corpus.

  11.

    We used a 300-dimension word embedding based on 55,975,964 words and a skipgram model, which we found outperformed cbow for our task.

  12.

    For this and each of the tests below, we used XGBoost (Chen and Guestrin 2016), an efficient implementation of the gradient boosting algorithm, for prediction. However, we obtained very similar results using the Hall et al. (2009) implementation of Bayesian Network classification (Bouckaert 2005).

  13.

  14.

  15.

    See 15(e) of the Rules for Uniform Domain Name Dispute Resolution Policy for CIBF,
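
Note 10 reports both an f-measure and a Matthews correlation coefficient (MCC). As a minimal sketch of how the two are computed from a binary confusion matrix, and of why they can differ noticeably when one class dominates, consider the following. The function names and the confusion-matrix counts are illustrative assumptions, not the paper's code or data.

```python
import math


def f_measure(tp, fp, fn):
    """F1 score: harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 if any marginal is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical, imbalanced confusion matrix (NOT the paper's data):
# most cases belong to the positive class.
tp, tn, fp, fn = 900, 60, 25, 15
print("f-measure:", round(f_measure(tp, fp, fn), 3))
print("MCC:      ", round(mcc(tp, tn, fp, fn), 3))
```

Because MCC uses all four cells of the confusion matrix, errors on the minority class lower it more sharply than they lower the f-measure of the majority class.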


  1. Al-Abdulkarim L, Atkinson K, Bench-Capon TJM, Whittle S, Williams R, Wolfenden C (2017) Noise induced hearing loss: an application of the angelic methodology. In: Legal knowledge and information systems—JURIX 2017: the thirtieth annual conference, Luxembourg, 13–15 December 2017, pp 79–88

  2. Alarie B, Niblett A, Yoon A (2017) Using machine learning to predict outcomes in tax law. Available at SSRN or

  3. Aletras N, Tsarapatsanis D, Preotiuc-Pietro D, Lampos V (2016) Predicting judicial decisions of the European Court of Human Rights: a natural language processing perspective. PeerJ CompSci.

  4. Aleven VAWMM (1997) Teaching case-based argumentation through a model and examples. PhD thesis, University of Pittsburgh, Pittsburgh. AAI9821228

  5. Aleven V, Ashley K (1996) Doing things with factors. In: Proceedings of the 3rd European workshop on case-based reasoning (EWCR-96), Lausanne, pp 76–90

  6. Ashley KD (2017) Artificial intelligence and legal analytics: new tools for law practice in the digital age. Cambridge University Press, Cambridge


  7. Ashley KD, Aleven V (1997) Reasoning symbolically about partially matched cases. In: Proceedings of the 15th international joint conference on artificial intelligence. Morgan Kaufmann, San Francisco, pp 335–341

  8. Ashley KD, Brüninghaus S (2009) Automatically classifying case texts and predicting outcomes. Artif Intell Law 17(2):125–165


  9. Bahdanau D, Cho K, Bengio Y (2014) Neural machine translation by jointly learning to align and translate. Preprint arXiv:1409.0473

  10. Bench-Capon TJM, Dunne PE (2007) Argumentation in artificial intelligence. Artif Intell 171(10–15):619–641


  11. Berger AL, Pietra VJD, Pietra SAD (1996) A maximum entropy approach to natural language processing. Comput Linguist 22(1):39–71


  12. Bojanowski P, Grave E, Joulin A, Mikolov T (2016) Enriching word vectors with subword information. CoRR arXiv:1607.04606

  13. Boles DB, Adair LP (2001) The multiple resources questionnaire (MRQ). Proc Hum Fact Ergon Soci Ann Meet 45(25):1790–1794


  14. Bouckaert RR (2005) Bayesian network classifiers in Weka. Accessed 23 June 2020

  15. Boughorbel S, Jarray F, El-Anbari M (2017) Optimal classifier for imbalanced data using Matthews correlation coefficient metric. PLoS ONE 12(6):e0177678


  16. Branting LK (2000a) An advisory system for pro se protection order applicants. Int Rev Law Comput Technol 14(3):357–369


  17. Branting LK (2000b) Reasoning with rules and precedents: a computational model of legal analysis. Kluwer, Dordrecht


  18. Branting LK, Yeh A, Weiss B, Merkhofer EM, Brown B (2017) Inducing predictive models for decision support in administrative adjudication. In: AI approaches to the complexity of legal systems—AICOL international workshops 2015–2017, revised selected papers. Lecture notes in computer science, vol. 10791. Springer, Berlin, pp 465–477

  19. Brooke J (1996) SUS—a quick and dirty usability scale. Usab Eval Ind 189(194):4–7


  20. Brüninghaus S, Ashley KD (1999) Toward adding knowledge to learning algorithms for indexing legal cases. In: Proceedings of the 7th international conference on artificial intelligence and law, ICAIL’99. ACM, New York, pp 9–17

  21. Brüninghaus S, Ashley KD (2003) Predicting outcomes of case based legal arguments. In: Proceedings of the 9th international conference on artificial intelligence and law, ICAIL’03. ACM, New York, pp 233–242

  22. Chalkidis I, Androutsopoulos I, Aletras N (2019) Neural legal judgment prediction in English. CoRR arXiv:1906.02059

  23. Chen T, Guestrin C (2016) XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining, KDD’16. ACM, New York, pp 785–794

  24. Dunn PH (2003) How judges overrule: speech act theory and the doctrine of stare decisis. Yale Law J 113(2):493–532


  25. Ferro L, Aberdeen J, Branting K, Pfeifer C, Yeh A, Chakraborty A (2019) Scalable methods for annotating legal-decision corpora. In: Proceedings of the natural legal language processing workshop 2019. Association for Computational Linguistics, Minneapolis, pp 12–20

  26. Gunning D (2018) Defense advanced research projects agency (DARPA) program information: explainable artificial intelligence (XAI). Last visited Dec 26, 2018

  27. Hadfield G (2016) Rules for a flat world: why humans invented law and how to reinvent it for a complex global economy. Oxford University Press, Oxford


  28. Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor 11(1):10–18


  29. Henschen B (2018) Judging in a mismatch: the ethical challenges of pro se litigation. Public Integr 20(1):34–46


  30. Herrera F, Charte F, Rivera A, del Jesus M (2016) Multilabel classification: problem analysis, metrics and techniques. Springer, Berlin


  31. Hill F, Cho K, Korhonen A (2016) Learning distributed representations of sentences from unlabelled data. In: Proceedings of the 2016 conference of the North American chapter of the association for computational linguistics: human language technologies (NAACL HLT 2016). Association for Computational Linguistics, pp 1367–1377

  32. Katz DM, Bommarito MJ II, Blackman J (2017) A general approach for predicting the behavior of the Supreme Court of the United States. PLoS ONE 12(4):e0174698


  33. Lauritsen M, Steenhuis Q (2019) Substantive legal software quality: a gathering storm? In: Proceedings of the 17th international conference on artificial intelligence and law, ICAIL’19. ACM, New York, pp 52–62

  34. Lawrence J, Reed C (2020) Argument mining: a survey. Comput Linguist 45(4):765–818


  35. Lippi M, Torroni P (2016) Argumentation mining: state of the art and emerging trends. ACM Trans Int Technol 16(2):10:1–10:25


  36. Maxwell KT, Oberlander J, Lavrenko V (2009) Evaluation of semantic events for legal case retrieval. In: Proceedings of the WSDM’09 workshop on exploiting semantic annotations in information retrieval, ESAIR’09. ACM, New York, pp 39–41

  37. McCarty LT (2018) Finding the right balance in artificial intelligence and law. In: Research handbook on the law of artificial intelligence. Edward Elgar Publishing

  38. Medvedeva M, Vols M, Wieling M (2020) Using machine learning to predict decisions of the European Court of Human Rights. Artif Intell Law 28(2):237–266


  39. Mikolov T, Yih SWT, Zweig G (2013) Linguistic regularities in continuous space word representations. In: Proceedings of the 2013 conference of the North American chapter of the association for computational linguistics: human language technologies (NAACL-HLT-2013). Association for Computational Linguistics, New York

  40. Peterson M, Waterman D (1985) Rule-based models of legal expertise. In: Walters C (ed) Computing power and legal reasoning. West Publishing Company, Minneapolis, pp 627–659


  41. Platt JC (1999) Fast training of support vector machines using sequential minimal optimization. In: Schölkopf B, Burges CJC, Smola AJ (eds) Advances in kernel methods. MIT Press, Cambridge, pp 185–208


  42. Ren Y, Fei H, Peng Q (2018) Detecting the scope of negation and speculation in biomedical texts by using recursive neural network. In: 2018 IEEE international conference on bioinformatics and biomedicine (BIBM), pp 739–742

  43. Rissland EL, Skalak DB (1989) Combining case-based and rule-based reasoning: a heuristic approach. In: 11th international joint conference on artificial intelligence, Detroit, pp 524–530

  44. Rissland EL, Ashley KD, Branting LK (2005) Case-based reasoning and law. Knowl Eng Rev 20(3):293–298


  45. Rush AM, Chopra S, Weston J (2015) A neural attention model for abstractive sentence summarization. CoRR arXiv:1509.00685

  46. Sauro J, Dumas JS (2009) Comparison of three one-question, post-task usability questionnaires. In: Proceedings of the SIGCHI conference on human factors in computing systems, CHI’09, pp 1599–1608

  47. Sergot MJ, Sadri F, Kowalski RA, Kriwaczek F, Hammond P, Cory HT (1986) The British Nationality Act as a logic program. Commun ACM 29(5):370–386


  48. Shulayeva O, Siddharthan A, Wyner A (2017) Recognizing cited facts and principles in legal judgements. Artif Intell Law 25(1):107–126


  49. Sulea O, Zampieri M, Vela M, van Genabith J (2017) Predicting the law area and decisions of French supreme court cases. In: RANLP. INCOMA Ltd, pp 716–722

  50. Surdeanu M, Nallapati R, Gregory G, Walker J, Manning C (2011) Risk analysis for intellectual property litigation. In: Proceedings of the 13th international conference on artificial intelligence and law. ACM, Pittsburgh

  51. The EXplainable AI in Law (XAILA) (2018) 2018 workshop, Groningen.

  52. Westermann H, Walker VR, Ashley KD, Benyekhlef K (2019) Using factors to predict and analyze landlord-tenant decisions to increase access to justice. In: Proceedings of the 17th international conference on artificial intelligence and law, ICAIL’19. Association for Computing Machinery, New York, pp 133–142

  53. Wyner AZ, Peters W (2010) Lexical semantics and expert legal knowledge towards the identification of legal case factors. In: JURIX, frontiers in artificial intelligence and applications, vol 223. IOS Press, New York, pp 127–136

  54. Yang Z, Yang D, Dyer C, He X, Smola A, Hovy E (2016) Hierarchical attention networks for document classification. In: Proceedings of NAACL-HLT, pp 1480–1489

  55. Yu H, Hatzivassiloglou V (2003) Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences. In: Proceedings of the 2003 conference on empirical methods in natural language processing, EMNLP’03. Association for Computational Linguistics, Stroudsburg, pp 129–136



The MITRE Corporation is a not-for-profit company, chartered in the public interest. This document is approved for Public Release; Distribution Unlimited. Case No. 19-3739. © 2019 The MITRE Corporation. All rights reserved.

Author information



Corresponding author

Correspondence to L. Karl Branting.


Appendix: Annotation


A key goal of SCALE is a methodology that permits development of explainable legal prediction systems by agencies that lack the resources to engineer domain-specific feature sets, a process that requires both extensive expertise in the particular legal domain and experience in feature engineering. Instead, SCALE requires only the linguistic skills necessary to annotate the decision portion of a representative subset of cases, a much more limited process.

Our annotation schema for WIPO decisions consists of three layers: Argument Elements, Issues, and Factors (sub-issues).

Tags are applied to clauses and sentences, as opposed to shorter units such as noun phrases, in order to identify the complete linguistic proposition corresponding to the annotation label. The MITRE Annotation Toolkit (MAT; see footnote 13) was used to perform the annotation.

Argument elements

Although our approach to predictive-text identification is to leverage the Factual Findings and Legal Findings, the annotation schema is designed to capture the full range of argument elements present in cases. These argument elements are as follows:

  1.


  2.


  3.

    Factual Finding

  4.

    Legal Finding

  5.

    Case Rule

  6.


We have found that with these six argument elements, the majority of sentences within the “Findings” and “Decision” sections of WIPO cases can be assigned an argument element label. These argument elements are not specific to WIPO decisions and should be applicable in other domains.

Issues


Each Argument Element tag is assigned an Issue. The Issue tags include the three required elements that the complainant must establish in order to prevail in a WIPO case. These issues, which are documented in the Uniform Domain Name Dispute Resolution Policy, paragraph 4 (footnote 14), form the backbone of every decision:

  • (1) ICS: Domain name is Identical or Confusingly Similar to a trademark or service mark in which the complainant has rights

  • (2) NRLI: Respondent has No Rights or Legitimate Interests in respect of the domain name

  • (3) Bad Faith: Domain name has been registered and is being used in Bad Faith

For element (2), NRLI, although the dispute is typically approached from the point of view of the complainant demonstrating that the respondent has NRLI, the panel very often considers the rights or legitimate interests of the complainant and/or the respondent. For such passages, RLI is available as an Issue tag. In addition, the domain name resolution procedure allows for situations in which the complainant abuses the process by filing the complaint in bad faith (CIBF; see footnote 15).

The schema thus consists of five Issue tags, plus an Other category:

  • ICS

  • NRLI

  • RLI

  • BadFaith

  • CIBF

  • OTHER.

Factors


In our annotation scheme, factors are the elements that we hypothesize will prove most useful for explainable legal prediction. The factors and corresponding tags are specific to the WIPO issues. For ICS, the ICANN policy does not explicitly identify specific factors that will be considered by the panel, so our tag set for ICS is derived from factual findings commonly observed in the data, such as CownsTM (Complainant owns Trademark) and TMentire (Trademark is contained in its entirety within the Domain Name). For NRLI/RLI, the policy establishes three factors, and for Bad Faith, four factors. Each of these has a corresponding tag. For example, under NRLI there is PriorBizUse from 4(c)(i) of the policy (“Bona fide business use of Domain Name or demonstrable preparations to do so, prior to notice of the dispute”), and under BadFaith there is Confusion4CommGain from 4(b)(iv) of the policy (“For commercial gain from confusion with complainant’s mark”). The tag set also includes labels for other common factors observed in the data, such as PrimaFacieEst (Prima Facie Case Established). For CIBF, two factor tags are available: RDNH (Reverse Domain Name Hijacking) and Harass (complaint brought primarily to harass DN holder).

Each level of annotation also has an “Other” option to be used when none of the predefined tags is appropriate, and there is a free-form Comment field which the annotator can use to capture ad hoc labels and enter notes.
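
As a rough illustration of how the Issue and Factor layers fit together, the sketch below encodes the factor tags named in this appendix as a lookup table keyed by Issue tag, with "Other" accepted under every issue. The dictionary layout and the `is_valid_pair` helper are assumptions made for illustration; the full tag set is larger than what is listed here, and this is not the MAT configuration.

```python
OTHER = "Other"

# Factor tags named in the appendix, keyed by Issue tag. Only tags that the
# text attributes to a specific issue are listed; the real tag set is larger.
FACTORS_BY_ISSUE = {
    "ICS": {"CownsTM", "TMentire"},
    "NRLI": {"PriorBizUse"},
    "RLI": set(),        # factor tags for RLI are not enumerated in the text
    "BadFaith": {"Confusion4CommGain"},
    "CIBF": {"RDNH", "Harass"},
    "OTHER": set(),
}


def is_valid_pair(issue, factor):
    """Return True if `factor` may be used under `issue`.

    Assumption: a factor tag is valid only under its own issue, while the
    catch-all "Other" option is valid under every known issue.
    """
    if issue not in FACTORS_BY_ISSUE:
        return False
    return factor == OTHER or factor in FACTORS_BY_ISSUE[issue]


print(is_valid_pair("CIBF", "RDNH"))   # factor listed under its own issue
print(is_valid_pair("ICS", "RDNH"))    # factor from a different issue
```

A check of this kind could flag annotations whose factor does not belong to the assigned issue before they are used for tag projection.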

Fig. 5

Four text spans annotated with factual and legal-findings features


A Citation attribute is used to capture the paragraph citation of Policy and Case Rule argument elements. A polarity attribute is used to capture positive/negative values for issues and factors. Figure 5 shows four typical annotations.
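
One way to picture a single annotated span, with its three tag layers and the Citation and polarity attributes, is as a small record type. The sketch below is a hypothetical representation, not MAT's data model: the class name, field names, the string values for polarity, and the rule that citations appear only on Policy and Case Rule spans are all assumptions drawn from the description above, and the example sentence is invented.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: only these argument elements carry a paragraph citation.
CITABLE_ELEMENTS = {"Policy", "Case Rule"}
POLARITIES = {"positive", "negative"}


@dataclass(frozen=True)
class Annotation:
    text: str                       # annotated clause or sentence
    element: str                    # argument element, e.g. "Factual Finding"
    issue: str                      # e.g. "ICS", "NRLI", "BadFaith"
    factor: str = "Other"           # e.g. "TMentire"
    polarity: Optional[str] = None  # "positive" / "negative", if applicable
    citation: Optional[str] = None  # e.g. "4(b)(iv)"
    comment: str = ""               # free-form annotator notes

    def __post_init__(self):
        if self.polarity is not None and self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity!r}")
        if self.citation is not None and self.element not in CITABLE_ELEMENTS:
            raise ValueError("citation is expected only on Policy or Case Rule spans")


# Invented example span (not taken from the corpus):
span = Annotation(
    text="The disputed domain name incorporates the mark in its entirety.",
    element="Factual Finding",
    issue="ICS",
    factor="TMentire",
    polarity="positive",
)
print(span.issue, span.factor, span.polarity)
```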


Cite this article

Branting, L.K., Pfeifer, C., Brown, B. et al. Scalable and explainable legal prediction. Artif Intell Law (2020).



  • Artificial intelligence and law
  • Machine learning
  • Human language technology
  • Explainable prediction