1 Introduction

Judicial decision-making is characterized by the application by courts of authoritative rules to the stylized presentation of disputed claims between competing litigants. These authoritative rules are set forth in legal source materials such as constitutions, statutes, and written opinions supporting prior decisions. For a legal source to have bearing on a current dispute, it must be retrievable by the relevant legal actors. The problem of organizing legal texts into a comprehensible whole has been recognized since Justinian I’s Corpus Juris Civilis, issued in 529–534. The acute problems of identifying relevant legal sources (i.e., legal precedent) presented by the common law tradition have spurred codification and classification efforts that have ranged from Blackstone’s “Commentaries on the Laws of England” (1765–1769) to the codification movement in the late nineteenth century (Garoupa and Morriss 2012), to the development and spread of the West American Digest System in the twentieth century (West 1909). Most recently, the effect of digitization on the evolution of the law, primarily in its impact on legal research, has become a subject of inquiry (see e.g., Berring 1986, 1987; Fronk 2010; Hanson and Allan 2002; Hellyer 2005; Katsh 1993; McGinnis and Wasick 2015; Schauer and Wise 2000).

In this paper we consider the textual corpus of legal sources as an evolving landscape that carries a natural geometry and comprises regions of the law whose development and shifting boundaries are influenced by the dynamics and feedback of law search. Everything devolves from a model of the process of legal research carried out in the corpus in which “actors” start from a case or opinion and then build out an understanding of the relevant issues by (1) following citations, (2) searching for cases that cite the initial case of interest, and (3) identifying textually similar cases. These actions have a natural network—more precisely, a multinetwork—formulation, in which legal sources are connected to each other based on citation information and textual similarity as described by a topic model representation of their textual content. Topic models represent texts (embodied as word-frequency distributions or “bag-of-words” representations) as mixtures of topics. “Topic” as used in this sense has a technical meaning and is defined as a probability distribution over the vocabulary in the corpus. Topics are uncovered and discovered according to a well-known and by now widely deployed methodology (see e.g., Blei 2012) that we briefly describe below. Our use of three kinds of connectivity (as opposed to one) in the text corpus structures the corpus in a multinetwork representation, a combinatorial structure that has proved useful in a number of different contexts, such as biology and economics (e.g., Barigozzi et al. 2011; Blinov et al. 2012; Kivelä et al. 2014). In this work we introduce for the first time the multinetwork concept to the novel contexts of text-mining and text search, with a specific application to judicial texts.

We use the multinetwork framework to define a notion of search generalizing the Markov model (discrete time random walk) that encodes Google’s famous “websurfer” webpage search model (Brin and Page 1998). The webpage ranking system PageRank is simply the stationary vector of this model (Bryan and Leise 2006). Rankings are of course useful (and of course profitable), but the random walk also gives rise to a natural notion of distance on the underlying state space, roughly defined in terms of the expected time (number of steps) needed to go from one state to another, and it is this metric point of view that we explore herein. In our setting, distance reflects the ease with which a human user of the legal corpus could navigate from one legal source to another, based on a weighted combination of searches along the underlying citation and topical similarity networks. The latter is usually reduced to a keyword search in standard resources (e.g., through a commercial database such as Lexis-Nexis). The derived inter-opinion distances support the discovery of well-defined regions (in this case, groups of legal sources) that are relatively close to each other, but relatively distant from other regions. Distance is also a proxy for relevance. When new judicial decisions are issued and the supporting opinions are incorporated into the legal corpus, they interact with search technology to change the legal sources that will be discovered during the next search. For example, some new opinions can link together previously distant opinions, making them more easily discoverable. In turn, these new connections can foster new arguments. This is a new kind of legal effect that, as far as we know, has never been identified as a theoretical possibility, much less formalized and subjected to an empirical test.

The random walk setting also enables the creation/definition of a notion of curvature for the underlying state space (think of a state space as the cities and towns in a landscape of rolling hills and valleys). As per the usual interpretation of this geometric notion, the more negative the curvature of a regionFootnote 1 of the legal landscape, the easier it is to navigate to legal sources outside that region from legal sources that are inside of the region. Curvature may change over time as new legal sources are added to the corpus. An increase in curvature in a given regionFootnote 2 indicates increasing difficulty in navigating from the interior of the region to legal sources outside it. This has the interpretation that the region has become more isolated from the rest of the legal corpus and thus is less relevant to new opinions outside of the region. We refer to this effect as puddling. The opposite effect wherein curvature decreases is referred to as drainage. Drainage is characterized by ease of navigation from points (legal sources) inside the region to those that are outside. Notions of network curvature have only just begun to make their way into applied literature. Some early work has adapted the idea of Ricci curvature to the network setting, mainly for its relation to various isoperimetric inequalities (see e.g., Chung and Yau 1996; Lin and Yau 2010). More recent work approaches the idea from the point of view of optimal transport (Ollivier 2009). This in turn makes strong connections to discrete Markov chains—as does ours—but this other work is quite different from the approach taken herein.

Use of the citation network to measure the influence of judicial opinions is now well-studied (see e.g., Bommarito et al. 2009; Fowler and Jeon 2008; Fowler et al. 2007), although interesting potential avenues of this kind of investigation in the judicial context remain underexplored (see e.g., Uzzi et al. 2013 for a citation network analysis in the context of scientific articles). Topic models, however, have only very recently entered legal studies and have already shown great promise as a foundation for new quantitative avenues of analysis (George et al. 2014; Livermore et al. 2017; Nardi and Moe 2014; Rice 2012).

Citation networks and topic modeling are examples of computational methods useful to legal studies. Early conversations concerning law and digitization focused on distinctions in “context” between digital and physical forms, for example, whether digitization enhanced or reduced reading comprehension or facilitated or undermined serendipity in conducting searches. In particular, the legal significance of the effects of various search modalities (citation-based, keyword, unstructured text) is only just becoming apparent (see e.g., McGinnis and Wasick 2015). Our work may suggest ways to begin to quantify some of these effects, and empirical studies comparing our search model with actual human search results are in preparation. In this paper we focus on the collection of all U.S. Supreme Court cases from 1951 to 2002. A project to extend our work to include the Circuit courts is already underway.

In the next section we explain in a bit more detail the mathematical background and framework. Section 3 presents our results, showing that the precise notions of puddling and drainage correspond to a measurable waning and waxing respectively of relevance over time. We also briefly introduce the publicly accessible database and user interface (www.bendingthelaw.org) that we have constructed for the engagement with and visualization of the multinetwork of opinions. We then conclude with some thoughts about next steps and extensions of this work. Two technical appendices provide a more detailed mathematical justification (based on Riemannian geometry) for our definition of multinetwork curvature as well as motivation for a certain parameter choice in the analysis. The paper can be read without these sections, but we include them for the sake of completeness.

2 The mathematical framework

2.1 A random walk model for legal research

The geometry we construct for the legal corpus is based on an encoding of the corpus as a multinetwork that supports the legal search process. We frame legal search in this setting as a probabilistic process of “local” exploration of the opinion corpus, modeling the way in which a user of the legal corpus might navigate from opinion to opinion while researching an issue. This navigation is naturally viewed as a Markov chain (see e.g., Grinstead and Snell 1997), formulated as a matrix T of transition probabilities where the states are indexed by the opinions: given opinions a and b, the value of the entry \(T(a,b)\) is the probability of “moving to” opinion b “from” opinion a in an exploration of the legal corpus.Footnote 3 More precisely, framing this as a “random walk” in “opinion space”, \(T(a,b)\) is the probability of moving at the next step to case b, given that you are currently at case a, i.e., the conditional probability

$$\begin{aligned} T(a, b) = P(b | a), \end{aligned}$$

in standard notation.

The transition probabilities are constructed as a combination of several terms, reflecting our stylized model of navigation of the space of legal opinions.Footnote 4 We assume the possibility of three basic types of local exploration from an initial opinion a: (1) consideration of opinions cited by a; (2) consideration of opinions that cite to a; and (3) consideration of opinions that are textually similar to a. Our Markov chain (transition matrix) is thus represented as a linear combination of the individual chains, \(T_{\text{ cited-by }}, T_{\text{ cited }},\) and \(T_{\text{ sim }}\).

We allow for the possibility that an exploratory mode (i.e., the weights given to the three forms of connection in the network) can vary for any given search. It may depend on the searcher, where he/she is in the overall research process, and the current opinion. The last of these is the easiest to embody and in this case the overall chain can be written as

$$\begin{aligned} \begin{array}{l} T(a,b) = p_{\text{ cited }}(a)T_{\text{ cited }}(a,b) + p_{\text{ cited-by }}(a)T_{\text{ cited-by }} (a,b) \\ \qquad \qquad + \;\; p_{\text{ sim }}(a)T_{\text{ sim }}(a,b) \end{array} \end{aligned}$$
(1)

with the proviso that \(p_{\text{ cited }}(a) \ge 0\), \(p_{\text{ cited-by }}(a) \ge 0\), \(p_{\text{ sim }}(a) \ge 0\), and

$$\begin{aligned} p_{\text{ cited }}(a) + p_{\text{ cited-by }}(a) + p_{\text{ sim }}(a)=1 \end{aligned}$$

at each state a. As per the notation, the weights may vary by initial state (a), though in what follows we will typically have them globally constant. In fact, for the sake of analysis we will assume these weights are uniform (each equal to \(\frac{1}{3}\)). Our implementation allows the weights to vary (cf. Sect. 3.2). In general, throughout this paper, we typically choose our parameters to be simple natural choices, reflecting the initiatory nature of this paper and the early stages of this project. Any particular parameter or group of parameters could be optimized with more data and an appropriate training paradigm. Ideally, the weights would be determined by training them with respect to an appropriate objective function, and the ideal objective function would be related to the effectiveness of the exploration. This would require feedback from users, and in Sect. 3.2 we discuss an implementation which could eventually allow for such a training paradigm to be implemented.
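The weighted combination in (1) can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: the function name `mix_chains` and the `(n, 3)` layout of the per-state weights are hypothetical, and each component matrix is assumed to be row-stochastic already.

```python
import numpy as np

def mix_chains(T_cited, T_cited_by, T_sim, weights=None):
    """Combine three component chains as in Eq. (1).

    weights: array of shape (n, 3) holding (p_cited, p_cited_by, p_sim)
    for each state a, or None for the uniform choice of 1/3 each.
    (Hypothetical helper; the array layout is an assumption.)
    """
    n = T_cited.shape[0]
    if weights is None:
        weights = np.full((n, 3), 1.0 / 3.0)
    weights = np.asarray(weights, dtype=float)
    # Enforce the proviso: non-negative weights summing to one at each state.
    if np.any(weights < 0) or not np.allclose(weights.sum(axis=1), 1.0):
        raise ValueError("weights must be non-negative and sum to 1 per state")
    # Column slices keep shape (n, 1), so each row a is scaled by its own p(a).
    return (weights[:, [0]] * T_cited
            + weights[:, [1]] * T_cited_by
            + weights[:, [2]] * T_sim)
```

Because the weights at each state sum to one, the mixture of row-stochastic matrices is again row-stochastic.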

2.2 Construction of the components \(T_{\text{cited}}\), \(T_{\text{cited-by}}\), and \(T_{\text{sim}}\)

The transition matrices \(T_{\text{ cited }}\) and \(T_{\text{ cited-by }}\), based on the citation network are straightforward to construct. A natural and standard choice is to weight equally all opinions cited by a given opinion, and similarly for all opinions that cite the given opinion. Thus, if opinion a cites opinions \(b_1,\dots ,b_k\) then \(T_{\text{ cited }}(a,b_i) = {1\over k}\). Similarly, if a is cited by opinions \(b_1,\dots ,b_k\), then \(T_{\text{ cited-by }}(a,b_i) = {1\over k}.\) While we choose to work with equal weights, this weighting could be modified in some way, perhaps accounting for some notion of the importance of an opinion. To find the citation network we make use of the excellent “Supreme Court Citation Network Data” database created by Fowler and Jeon (cf. Supreme Court 2015).
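As a minimal sketch of the equal-weight construction just described, the two citation chains can be built from a list of citation pairs. The function name `citation_chains` and the integer indexing of opinions are assumptions for illustration. The text does not say how opinions with no outgoing (or incoming) citations are treated, so this sketch simply leaves those rows at zero.

```python
import numpy as np

def citation_chains(n, citations):
    """Build T_cited and T_cited_by for n opinions.

    citations: iterable of (a, b) pairs meaning "opinion a cites opinion b".
    Rows for opinions with no citations of the relevant kind are left as
    zeros (an assumption; the treatment is not specified in the text).
    """
    A = np.zeros((n, n))
    for a, b in citations:
        A[a, b] = 1.0

    def row_normalize(M):
        sums = M.sum(axis=1, keepdims=True)
        # Divide only where the row sum is positive; other rows stay zero.
        return np.divide(M, sums, out=np.zeros_like(M), where=sums > 0)

    # A describes "cites"; its transpose describes "is cited by".
    return row_normalize(A), row_normalize(A.T)
```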

Navigation via textual similarity using something deeper than keywords is a novel contribution of this work and for this we make use of a topic model. A detailed description of topic modeling is beyond the scope of this paper, but a short description will suffice for the purposes of exposition. Very briefly, a topic—in the technical sense—is a probability distribution over a vocabulary. Topic modeling is the unsupervised derivation of a set of such distributions that represents a text corpus of documents (technically defined as a roughly contiguous set of words in the corpus, that is usually itself composed of larger portions of text—e.g., full opinions as opposed to the word blocks it comprises). Topics are defined according to a simple generative bag-of-words modelFootnote 5 for the documents in the corpus: given a document, first a topic is chosen at random and then a word is chosen at random within the topic. The topics are then the best fit solution to the actual bag-of-words representation of the documents. Recalling that bag-of-words is essentially a representation of each document as a word distribution, the topic model derives the “atomic” probability distributions that express each document in the corpus as a mixture of such atoms. The wide applicability of topic models in many disciplines has made for a broad community of topic modelers and the topic modeling technology has quickly become an “off-the-shelf” technology ready for deployment (see e.g., MALLET 2015) with a minimum of start-up cost. See Blei (2012) for one of the many friendly explanations of topic modeling.

The only supervision in the basic topic modeling algorithm is the choice of number of topics to be computed. We choose to use 100 topics, which for our corpus of 21,893 opinions (documents) is adequate. The most widely discussed method for choosing the number of topics involves treating the number of topics as a model parameter and inferring it from the data (Griffiths and Steyvers 2004). This method requires, however, more computational resources than are typically available, as the resources needed increase rapidly with the number of topics allowed. With such a large corpus of (long) documents, for example, fitting a corpus with 1000 topics is not possible in a reasonable amount of time. The approach we adopt (and we think it reflects the current best practice) is to choose a maximum number of topics based on the time and computational resources available. Picking a larger number of topics than the data supports is not a risk because the widely used specifications of the topic model [those used by MALLET (2015) and by the software we use (Buntine and Mishra 2014)] will simply leave the extra topics empty. For example, if the data suggest that 50 topic distributions are sufficient to account for the data, fitting a model with a maximum of 100 topics will recover the same model as fitting the model with a maximum of 50 topics.

When the topic modeling is completed we therefore have a set of topics \(\text{ Topic}_{1}, \dots , \text{ Topic}_{100}\), where each word w in the vocabulary has a weight in each topic \(\text{ Topic}_{k}(w) \ge 0\) and any given opinion a is represented as a distribution over topics, \(\sum _k \alpha _k(a) {\text{ Topic}}_k \; \left(\sum _k\alpha _k(a) = 1; \;\; \alpha _k(a) \ge 0 \right)\). Table 1 shows the most highly weighted words in five of the topics. The indexing of the topics in the table is not relevant. The labels (in parentheses) are assigned by the user (in this case the authors of this paper). The full set of topics for our SCOTUS dataset is available online.Footnote 6

While there are a number of different kinds of topic models (see e.g., Blei 2007; Blei and Lafferty 2006; Roberts et al. 2013), the “latent Dirichlet allocation” (LDA) model (the “Dirichlet” refers to an underlying assumption of a Dirichlet distribution in the model) is perhaps the best known and most widely used (Blei et al. 2003). This is the topic model that we use here.

Table 1 Some representative topics derived from the SCOTUS corpus

With the topic modeling accomplished, we are now in a position to construct \(T_{\text{ sim }}\). For this, we only consider as relevant to a given opinion the “top” topics and similarly, for a given topic, only consider as relevant to our exploration those opinions that express it most strongly. More precisely, we fix integer parameters \(N_\mathcal{{T}}\) and \(N_\mathcal{{O}}\) and for a given opinion a identify the \(N_\mathcal{{T}}\) most heavily weighted topics expressed in opinion a (using the \(\alpha _k(a)\) to define the weight) and for a given topic \({\text{ Topic}}_k\) identify the \(N_\mathcal{{O}}\) opinions in which \({\text{ Topic}}_k\) was most strongly expressed (using the \(\alpha _k\) here as well).Footnote 7 Intuitively we view this as the process of a search returning the top \(N_\mathcal{{T}}\) topics related to the initial opinion a followed by a search of the top \(N_\mathcal{{O}}\) opinions associated to each of these top topics. To weight the final results of the search, for the given opinion a we create an \(N_\mathcal{{T}} \times N_\mathcal{{O}}\) matrix in which the ij entry is the index of the jth most significant opinion in the corpus for the ith most significant topic in opinion a. If we define \(W_{a,b}\) to be the number of times opinion b occurs in this matrix, then \(T_{\text{ sim }}\) is the random walk produced by normalizing according to these weights. More precisely, for any b with \(W_{a,b} > 0\),

$$\begin{aligned} T_{\text{ sim }}(a,b) = {{W_{a,b}}\over {\sum _c W_{a,c}}}. \end{aligned}$$

With this we have now defined each component random walk for our cumulative walk T.
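The counting construction of \(T_{\text{ sim }}\) can be sketched as follows, assuming the topic model output is available as a matrix `alpha` whose entry `alpha[a, k]` is the weight \(\alpha _k(a)\). The function name, the array layout, and the tie-breaking rule (stable sort order) are assumptions for illustration. The sketch also does not exclude opinion a from its own result list, since the text does not say whether it should be.

```python
import numpy as np

def similarity_chain(alpha, n_topics=10, n_opinions=10):
    """Build T_sim from opinion-by-topic weights alpha (shape: n x K).

    For each opinion a, take its n_topics heaviest topics; for each such
    topic take the n_opinions opinions expressing it most strongly; count
    how often each opinion b appears (W[a, b]) and normalize the rows.
    """
    n = alpha.shape[0]
    # top_ops[:, k] lists the opinions in which topic k is strongest.
    top_ops = np.argsort(-alpha, axis=0, kind="stable")[:n_opinions, :]
    W = np.zeros((n, n))
    for a in range(n):
        top_topics = np.argsort(-alpha[a], kind="stable")[:n_topics]
        for k in top_topics:
            for b in top_ops[:, k]:
                W[a, b] += 1.0
    # Every row has n_topics * n_opinions counts, so the sums are positive.
    return W / W.sum(axis=1, keepdims=True)
```

With \(N_\mathcal{{T}} = N_\mathcal{{O}} = 10\), each row of the count matrix sums to 100 before normalization.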

2.3 The exploration geometry

The cumulative Markov chain

$$\begin{aligned} T = p_{\text{ cited }} T_{\text{ cited }} + p_{\text{ cited-by }} T_{\text{ cited-by }} + p_{\text{ sim }} T_{\text{ sim }} \end{aligned}$$

is a natural generalization of the random walk (the “random surfer”) whose equilibrium distribution is the source of the original PageRank algorithm underlying the early implementation of the Google search engine (Brin and Page 1998). Of interest to us is the geometry that this search model (or any random walk) produces. In particular, this kind of Markov-based search produces a metric on the network space that we call PageDist.Footnote 8 We call the induced geometry an exploration geometry.Footnote 9

To define PageDist we attach one last parameter r to the random walk of (1): at each step assume a probability \(r > 0\) of continuing the exploration. Then given r and starting at an opinion a, the expected number of visits to opinion b is

$$\begin{aligned} R(a,b) = \sum _{k=0}^{\infty } r^k T^k(a,b) \end{aligned}$$
(2)

where, as per usual, \(T^k(a,b)\) is the probability of transition from a to b in k steps. Intuitively, \(R(a,\cdot )\) forms an exploration neighborhood of opinion a in the sense that the higher the value of \(R(a,b)\), the more opinion b is considered to be in a neighborhood of a. Notice that r governs the size of this neighborhood as a sort of radius. If \(r=0\) then the neighborhood consists of only the opinion a, while if \(r=1\) (and the chain is irreducible) then the series diverges everywhere and the whole space is a’s exploration neighborhood. So we need a value between 0 and 1, and in what follows we choose \(r=\frac{1}{2}\) to keep it simple. As discussed above, with a fixed objective function and enough training data one could optimize this choice of r (perhaps even locally).
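For \(0 \le r < 1\) the series in (2) is a matrix geometric series, so it has the closed form \(R = (I - rT)^{-1}\). A minimal sketch of that computation, assuming T is held as a dense NumPy array (the function name `expected_visits` is hypothetical, and a corpus-scale computation would likely need sparse or iterative methods):

```python
import numpy as np

def expected_visits(T, r=0.5):
    """Return R = sum_k r^k T^k = (I - r T)^{-1} for 0 <= r < 1.

    The series converges because every row of T sums to at most one,
    so the spectral radius of r*T is below one.
    """
    n = T.shape[0]
    # Solving (I - rT) R = I avoids forming the inverse explicitly.
    return np.linalg.solve(np.eye(n) - r * T, np.eye(n))
```

For a two-state chain that always switches state, with \(r=\frac{1}{2}\), this gives \(R(a,a) = \frac{4}{3}\) and \(R(a,b) = \frac{2}{3}\) for \(a \ne b\).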

By comparing the overlap of the neighborhoods defined by different opinions we can form a metric PageDist, given by

$$\begin{aligned} \text{ PageDist }(a,b) = \left| \left| R(a,\cdot ) - R(b,\cdot ) \right| \right| _{p} \end{aligned}$$
(3)

where p denotes the p-norm.Footnote 10 Notice that if the neighborhood descriptions of a and b nearly agree then this will be near zero, and if they are very distant \(R(a,x)\) will be nearly zero when \(R(b,x)\) is large and vice versa, resulting in a large value of \(\text{ PageDist }(a,b)\) (in other words, a large distance between the opinions). So the PageDist metric will capture a notion of distance within the landscape. Figure 1 shows the distribution of distances among our corpus of Supreme Court opinions. In what follows, we choose the Euclidean norm (\(p = 2\)) to keep it simple. Again, with a fixed objective function and enough training data the choice of p could also be optimized.
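Given the matrix R of (2), the distance in (3) is a p-norm of the difference of two rows. The sketch below assumes R is a dense NumPy array; the function names are hypothetical. The all-pairs version builds an intermediate array of size \(n^3\), so it is only meant for small examples.

```python
import numpy as np

def page_dist(R, a, b, p=2):
    """PageDist(a, b) = || R(a, .) - R(b, .) ||_p, as in Eq. (3)."""
    return np.linalg.norm(R[a] - R[b], ord=p)

def page_dist_matrix(R, p=2):
    """All pairwise distances at once (small corpora only)."""
    diffs = R[:, None, :] - R[None, :, :]   # shape (n, n, n)
    return np.linalg.norm(diffs, ord=p, axis=2)
```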

Fig. 1

Here we see a histogram of the PageDist values when computed on the legal corpus. Each sample is a pair of points (opinions) in the corpus and this histogram includes all distinct pairs of points. The modes in the histogram correspond to pairs in distinct regions. We choose \(p=2\), \(r=\frac{1}{2}\), and \(N_\mathcal{{T}}= N_\mathcal{{O}}=10\)

The random walk setting also makes possible a definition of curvature that encodes a level of difficulty for escape from a given point in the execution of a random walk. If the degree of difficulty is large, a walk will have a tendency to get “stuck” in the neighborhood of the state. This can be interpreted as an opinion that doesn’t connect usefully with its surrounding or nearby opinions. Conversely, a more “fluid” area around an opinion suggests that it engages usefully with the broader opinion landscape. This kind of idea will be key to understanding the relevance of an opinion.

This ability to “escape” from a region while random walking is a problem studied widely in a variety of mathematical contexts. We take inspiration from the study of Brownian motion on manifolds and from this define for the random walk on the network a local notion of curvature as

$$\begin{aligned} \kappa (a) = \log (R(a,a)-1). \end{aligned}$$
(4)

In "Appendix A" we explain from a technical point of view why this corresponds to scalar curvature from Riemannian geometry and thus provides a natural definition.
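Since \(R(a,a)\) counts the initial visit (the \(k=0\) term), \(R(a,a)-1\) is the expected number of discounted returns to a, and (4) takes its logarithm. A minimal sketch, assuming a dense transition matrix and the closed form \(R=(I-rT)^{-1}\) for \(0 \le r < 1\); the function name is hypothetical. An opinion to which the walk never returns has \(R(a,a)=1\), and the sketch then reports \(-\infty\).

```python
import numpy as np

def curvature(T, r=0.5):
    """kappa(a) = log(R(a, a) - 1) for every state a, as in Eq. (4)."""
    n = T.shape[0]
    R = np.linalg.inv(np.eye(n) - r * T)
    # Expected discounted returns; clip guards against tiny negative
    # values caused by floating-point round-off.
    returns = np.clip(np.diag(R) - 1.0, 0.0, None)
    with np.errstate(divide="ignore"):      # log(0) -> -inf, no warning
        return np.log(returns)
```

Larger values mean more expected returns to the starting opinion, that is, a walk that is harder to escape from.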

For us the key is that as the network evolves a measure of change in the local connectivity of the opinions can be expressed in terms of changing \(\kappa\). We think of this change as a measure of how the network is bending. Let us make this precise. Given the node set N of a network with a transition matrix T reflecting a Markov process on the nodes, let \(S \subset N\), be some subset of nodes. A Markov chain on N induces a chain on the subset S by using the weights

$$\begin{aligned} W_S(a,b) = P(a,b) + \sum _{k \in N\backslash S, \;\; a \ne b} P(a,k) P(k,b), \end{aligned}$$

for \(a, b\in S\). Note that we are simply lumping together into one term all transitions a to b that go outside of S. We form a new transition matrix \(P(a,b; S,N)\) by normalizing \(W_S(a,b)\) so that the weights sum to one at each vertex. We call this the induced local exploration. This induces a corresponding exploration geometry and a curvature \(\kappa\) (defined as in (3,4)) for S relative to N which we denote as \(\kappa (a; S,N)\). Bending will encode the change in curvature as S grows.
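The induced chain can be sketched as below. This follows one reading of the formula for \(W_S\): the condition \(a \ne b\) is taken to mean that the two-step detour through nodes outside S is added only to off-diagonal entries. The function name and the handling of rows with zero total weight (left as zeros) are assumptions.

```python
import numpy as np

def induced_chain(P, S):
    """Induce a chain on the node subset S from a chain P on all nodes.

    W_S(a, b) = P(a, b) + sum over k outside S of P(a, k) P(k, b),
    with the detour term added only for a != b, then row-normalized.
    """
    S = np.asarray(S)
    outside = np.setdiff1d(np.arange(P.shape[0]), S)
    W = P[np.ix_(S, S)].copy()
    # Two-step paths a -> k -> b with the middle node k outside S.
    detour = P[np.ix_(S, outside)] @ P[np.ix_(outside, S)]
    np.fill_diagonal(detour, 0.0)           # the a != b condition
    W += detour
    sums = W.sum(axis=1, keepdims=True)
    return np.divide(W, sums, out=np.zeros_like(W), where=sums > 0)
```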

Consider the network at two different time points \(t_0 < t_1\) with corresponding node sets \(N_0\) and \(N_1\). Since the opinion corpus only grows in time, \(N_0 \subseteq N_1\). Then we can quantify a change in the induced exploration geometry as

$$\begin{aligned} \text{ Bending }(N_1,N_0)(a) = \kappa (a; N_0,N_1) - \kappa (a; N_0,N_0) \end{aligned}$$
(5)

where \(\kappa (a; N_0,N_0) = \kappa (a)\) in the network at time \(t_0\). Identifying the network with the timestamp we might also write

$$\begin{aligned} \text{ Bending }(a; t_1> t_0) = \kappa (a; t_1 > t_0) - \kappa (a; t_0). \end{aligned}$$
(6)

Bending is easy to interpret: it indicates whether the induced geometry at a point evolves in such a way that it becomes easier or more difficult to escape from the point. Regions where it becomes more difficult to make such transitions we call puddling regions, and regions where it becomes easier we call drainage regions. A precise definition works with the distribution of bending values: we call the subset corresponding to the bottom quartile of \(\text{ Bending }(*; t_1, t_0)\) the Drainage region (relative to the defining era), or Drainage\((t_1, t_0)\). Similarly, we call the subset corresponding to the top quartile of \(\text{ Bending }(*; t_1, t_0)\) the Puddling region (relative to the defining era), or Puddling\((t_1, t_0)\). Figure 2 shows the distribution of \(\kappa (*; 1990)\) as well as the bending of 1995 relative to 1990 in the Supreme Court opinion corpus (\(\text{ Bending }(*; 1995 > 1990)\)).
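Given the two curvature vectors entering (5), bending and the quartile-based regions can be sketched as follows. The function name, the inclusive treatment of values exactly at a quartile boundary, and NumPy's default (linear interpolation) quantile rule are assumptions. Curvatures are assumed finite, since infinite values would make the difference undefined.

```python
import numpy as np

def bending_regions(kappa_t0, kappa_t1_induced):
    """Compute bending and the Drainage / Puddling index sets.

    kappa_t0:         kappa(a; N_0, N_0) for each opinion a in N_0
    kappa_t1_induced: kappa(a; N_0, N_1) for the same opinions
    """
    bending = np.asarray(kappa_t1_induced) - np.asarray(kappa_t0)
    q25, q75 = np.quantile(bending, [0.25, 0.75])
    drainage = np.flatnonzero(bending <= q25)   # bottom quartile
    puddling = np.flatnonzero(bending >= q75)   # top quartile
    return bending, drainage, puddling
```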

Fig. 2

On the left we see a histogram of the curvature \(\kappa (*; 1990)\) computed on the corpus at 1990, and on the right we see the bending \(\text{ Bending }(*; 1995 > 1990)\). This gives a sense of the variation of the curvature over time. Notice, the curvature histogram on the left is far from uniform. The bending histogram on the right is very telling. The right tail is the Puddling region and the left tail the Drainage region

3 Results

The metrics we have developed enable us to determine the “relevance” of an opinion, as defined by its proximity to new opinions that are added to the corpus.

3.1 Metrics for relevance

To exhibit the utility of our various definitions we first quantify what it means for a case to be “relevant”. Our proxy is that it is nearby (in terms of PageDist). Thus, let \(N_t\) denote the set of nodes (opinions) in the network (corpus) at time t. Given \(t_2 \ge t_1 \ge t_0\), define the set of relevant cases (at some threshold d) as

$$\begin{aligned} \begin{array}{l} \text{ Rel }_{t_2, t_1,t_0; d} \!=\! \{ a \in N_{t_0} \!\mid \! \text{ PageDist }(a,b) < d, \text{ for } \text{ some } \text{ b } \in N_{t_2} - N_{t_1} \}. \end{array} \end{aligned}$$
(7)

This set (with these parameter values) comprises the “early” opinions a at time \(t_0\) (i.e., those that could serve as precedent) that find themselves close to newly arrived (later) opinions (those issued in the period between \(t_1\) and \(t_2\)). This means that the opinions in \(\text{ Rel }_{t_2, t_1,t_0; d}\) are those opinions published no later than \(t_0\) that are close to the new opinions published between times \(t_1\) and \(t_2\).
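The set in (7) can be sketched from a pairwise distance matrix and the year of each opinion. Several details here are assumptions rather than statements from the text: \(N_t\) is taken to be the opinions with year at most t, the new opinions are those with year in \((t_1, t_2]\), and the distance matrix `D` is assumed to have been computed on a network containing all opinions up to \(t_2\). The function name is hypothetical.

```python
import numpy as np

def relevant_set(D, years, t0, t1, t2, d):
    """Indices of opinions a in N_{t0} with PageDist(a, b) < d for some
    new opinion b in N_{t2} - N_{t1}, as in Eq. (7).

    D:     (n, n) array of pairwise PageDist values
    years: length-n array giving the year of each opinion
    """
    years = np.asarray(years)
    early = years <= t0                       # a in N_{t0}
    new = (years > t1) & (years <= t2)        # b in N_{t2} - N_{t1}
    # True where some new opinion lies within distance d of opinion a.
    close = (D[:, new] < d).any(axis=1)
    return np.flatnonzero(early & close)
```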

The threshold d can be set based on various criteria. A natural way to set d is by taking into account the PageDist distribution. A guiding principle is to set d according to the percentage of cases that we want to declare as “relevant” over a given initial or baseline period. For fixed time periods \(t_0< t_1<t_2\), as the threshold d increases, so does the fraction of opinions in the corpus at time \(t_0\) that are considered relevant. Conversely, as the fraction of cases that will be viewed as relevant grows, this implicitly corresponds to an increased threshold d.

We further define the Initial Relevance Probability (IRP) (for \(t_1 > t_0\) and a given threshold d) as the fraction of opinions present at time \(t_0\) that are in \(\text{ Rel }_{t_1, t_0,t_0; d}\)—i.e., the fraction of opinions that remain relevant at time \(t_1\) according to a threshold d. Our goal is to understand how to predict which cases remain relevant as time goes on. Figure 3 shows how IRP varies with relevance to future cases \(P(\text{ Rel }_{t_2,t_1,t_0; d} \mid \text{ Rel }_{t_1,t_0,t_0; d})\).Footnote 11 Therein we plot (using \(t_0=1990\), \(t_1=1995\), and \(t_2 = 2000\))

$$\begin{aligned} \text{ Momentum } = P( \text{ Rel }_{t_2,t_1,t_0; d} \mid \text{ Rel }_{t_1,t_0,t_0; d}) - \text{ IRP } \end{aligned}$$

against IRP (recall that since d increases monotonically with IRP, we can view both axes as functions of d). Thus, “Momentum” measures the fraction of opinions that continue to be relevant. This behaves as might be expected, with an increasing percentage of opinions remaining relevant, until such a time as too many initial cases are tossed in, some of which will be opinions that have become vestigial.
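Once the two relevance sets are known, IRP and Momentum reduce to set counting. A minimal sketch; the function name and argument names are hypothetical.

```python
def irp_and_momentum(n_early, rel_first, rel_second):
    """Return (IRP, Momentum) for one threshold d.

    n_early:    number of opinions present at time t0
    rel_first:  indices in Rel_{t1, t0, t0; d}
    rel_second: indices in Rel_{t2, t1, t0; d}
    """
    rel_first, rel_second = set(rel_first), set(rel_second)
    irp = len(rel_first) / n_early
    if not rel_first:
        # The conditional probability is undefined on an empty condition.
        return irp, float("nan")
    # P(relevant in the second period | relevant in the first period)
    conditional = len(rel_first & rel_second) / len(rel_first)
    return irp, conditional - irp
```

For example, with 10 early opinions, 4 of them relevant in the first period and 2 of those 4 relevant again in the second, IRP is 0.4 and Momentum is 0.5 − 0.4 = 0.1.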

Our goal is to identify the region R which contains the recent legal action. If we imagine that we have constructed a random region with each of our independent samples, then \(P(\text{ Rel }_{t_2,t_1,t_0; d} \mid \text{ Rel }_{t_1,t_0,t_0; d}) \approx IRP\). So the Momentum measures how far beyond random our construction is, and we define the optimally “relevant” region as the one that’s furthest beyond random. Let us now fix \(d = d_{max}\) so as to correspond to the \(IRP=0.2\) in Fig. 3. With the choice of d set, we now have fixed the parameter by which we identify opinions as relevant. A mathematical justification for this choice can be found in "Appendix B".

Fig. 3

Here the x-axis is Initial Relevance Probability relative to \(t_0=1990\), \(t_1=1995\), so the fraction of cases before 1990 that are within a distance d of cases that come in after 1990 and before 1995. As d increases so does IRP, so that the x-axis reflects a steady increase in d. Similarly, for the y-axis, Momentum (expressed as a percent) with \(t_0=1990\), \(t_1=1995\), and \(t_2 = 2000\), which is also a function of d. So, the curve we see here is effectively a parametrized plot of Momentum against IRP as d increases. Recall that Momentum (with these parameters) is the difference between the proportion of early (pre-1990) opinions that continue to be relevant in the 1995–2000 period, given that they were relevant in the 1990–1995 period, and the fraction of opinions that initially were relevant to opinions written between 1990 and 1995. Thus, we are subtracting out some baseline guess of how many of these early cases you would expect to be relevant in this time based on earlier information. This measures how much larger than random the future relevance is given recent relevance. This is all a function of d or equivalently, IRP. We see that IRP \(=0.2\) is roughly an optimal value

Having fixed d we can now examine the interaction between curvature and relevance, and in particular, the effect of being in either the drainage or puddling groups as respects the relevance of future cases. Let us start by defining our Future Relevance Probability relative to a condition A as

$$\begin{aligned} \begin{array}{l} FRP(A) \!=\! P(\text{ Rel }_{t_2,t_1,t_0; d_{max}} \mid \text{ Rel }_{t_1,t_0,t_0; d_{max}} \& A). \end{array} \end{aligned}$$
(8)

This measures how much knowing a condition A helps to predict future relevance. Our goal is to see whether knowing something about the dynamic geometry, namely whether we are in a drainage or puddling region, helps us predict whether that region is more or less likely to be relevant in the near future. This entails the comparison of \(FRP(\text{ Drainage })\), \(FRP(\text{ Puddling })\), and \(FRP(\text{ All })\).
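The conditional probability in (8) can be sketched with plain sets. The function name is hypothetical, and the condition A is represented as the set of opinion indices satisfying it (for example, the Drainage or Puddling region). Passing no condition gives \(FRP(\text{ All })\).

```python
def future_relevance_probability(rel_first, rel_second, condition=None):
    """FRP(A) = P(Rel_second | Rel_first & A), as in Eq. (8).

    rel_first:  indices in Rel_{t1, t0, t0; d_max}
    rel_second: indices in Rel_{t2, t1, t0; d_max}
    condition:  indices satisfying A, or None for FRP(All)
    """
    base = set(rel_first)
    if condition is not None:
        base &= set(condition)
    if not base:
        return float("nan")     # undefined when nothing meets the condition
    return len(base & set(rel_second)) / len(base)
```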

This comparison is shown in Fig. 4. We see the relevance of future cases (the blue line in the online version, the solid line in the print copy) compared to the relevance of future cases in the drainage and puddling regions. Indeed, the chance that a drainage region (low bending) is relevant for future cases is higher, by roughly \(10\%\) or more, than the corresponding chance for a puddling region (high bending). That is, the drainage regions that are connecting up the space are more strongly associated with future relevance.

To confirm that this relationship is not driven by random chance, let the null hypothesis be that there is nothing but a random difference between the drainage and puddling regions. So for a fixed measurement, under the null hypothesis there would be a fifty-fifty chance that we confirm our suspicion (technically, bounded by \(50\%\) when allowing for ties). Furthermore, for events that differ by at least 5 years, the \(N_{t_2} \backslash N_{t_1}\) populations are distinct, so that the measurements are suitably independent. Thus, we have 6 independent measurements with a perfect track record which would be expected by chance with a likelihood of \(\frac{1}{2^6}\). The null hypothesis that there is nothing but a random difference between drainage and puddling regions is thus highly unlikely.
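The arithmetic behind this sign-test style argument can be checked directly. The sketch below computes the one-sided probability of observing at least a given number of "successes" in a series of independent fair coin flips; the function name `sign_test_p_value` is ours and is used for illustration only.

```python
from math import comb


def sign_test_p_value(successes: int, trials: int) -> float:
    """Probability of at least `successes` heads in `trials` independent fair coin flips."""
    favorable = sum(comb(trials, k) for k in range(successes, trials + 1))
    return favorable / 2 ** trials


# Six independent measurements, all in the predicted direction.
p_value = sign_test_p_value(6, 6)
print(p_value)  # 0.015625, i.e. 1/2^6
```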

Fig. 4

Here the x-axis is the year the case was decided, and the y-axis is a probability expressed as a percent. The blue/solid curve is \(FRP(\text{ All })\) with \(t_0=\text{date}\), \(t_1=\text{date}+5\), and \(t_2 = \text{date}+10\). In black/dashed we see \(FRP(\text{ Drainage })\) and in red/dot-dashed we see \(FRP(\text{ Puddling })\) with the same timing parameter values. Notice that the bending is indeed (negatively) correlated with long-term relevance as predicted, and that after around 1978 we see a fairly stable 10% difference. (Color figure online)

3.2 Implementation

The ideas presented in this paper form the foundation of a new web-based search tool for exploring a space of legal opinions using the exploration geometry introduced in the body of this paper. Specifically, we have built a prototype website and user interface (UI) that enables exploration of an opinion database according to PageDist; ultimately this database will encompass all Federal Court and Supreme Court cases. At present it is running on a small subset (Supreme Court cases, 1950–2001). This prototype can be found at www.bendingthelaw.org.

Currently, our UI introduces users to cases in the “vicinity” (in the sense of our exploration geometry) of a pre-identified case specified by the user. The anticipation is that these cases will be strong candidates for precedent-based reasoning. As per (1) the search returns the “neighborhood” of the case that depends on the database of cases as well as the individual weights assigned to the three-component random walk process encoding the exploration geometry—that is, a choice of weights \(p_{\text{ cited }}, p_{\text{ cited-by }},\) and \(p_{\text{ sim }}\). As a first step we allow a choice of weights from \(\{0,1,2\}\) with at least one positive weight, so that \(W = w_{\text{ cited }} + w_{\text{ cited-by }} +w_{\text{ sim }}\), \(p_{\text{ cited }} = w_{\text{ cited }}/{W}\), \(p_{\text{ cited-by }} = w_{\text{ cited-by }}/{W}\), and \(p_{\text{ sim }} = w_{\text{ sim }}/{W}\).
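A minimal sketch of this normalization, together with the linear combination of the three component chains, is given below. The function names `walk_probabilities` and `mix_chains` and the toy \(2 \times 2\) transition matrices are hypothetical and serve only to illustrate the arithmetic; they do not reproduce the code running on the prototype site.

```python
ALLOWED_WEIGHTS = {0, 1, 2}


def walk_probabilities(w_cited: int, w_cited_by: int, w_sim: int) -> tuple:
    """Normalize integer weights from {0, 1, 2} into (p_cited, p_cited_by, p_sim)."""
    weights = (w_cited, w_cited_by, w_sim)
    if any(w not in ALLOWED_WEIGHTS for w in weights):
        raise ValueError("each weight must be 0, 1, or 2")
    total = sum(weights)  # this is W in the text
    if total == 0:
        raise ValueError("at least one weight must be positive")
    return tuple(w / total for w in weights)


def mix_chains(probabilities, t_cited, t_cited_by, t_sim):
    """Entrywise convex combination of three row-stochastic matrices (lists of rows)."""
    p_cited, p_cited_by, p_sim = probabilities
    return [
        [p_cited * a + p_cited_by * b + p_sim * c for a, b, c in zip(row_a, row_b, row_c)]
        for row_a, row_b, row_c in zip(t_cited, t_cited_by, t_sim)
    ]


# Weights 2 ("cited"), 1 ("cited by"), 2 ("topic similarity").
probs = walk_probabilities(2, 1, 2)

# Toy two-opinion chains (illustrative values only).
t_cited = [[0.0, 1.0], [1.0, 0.0]]
t_cited_by = [[0.0, 1.0], [1.0, 0.0]]
t_sim = [[0.5, 0.5], [0.5, 0.5]]
mixed = mix_chains(probs, t_cited, t_cited_by, t_sim)
```

Because the three probabilities sum to one, each row of the mixed matrix again sums to one whenever the component matrices are row-stochastic.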

Recall that the similarity piece of the random walk, \(T_{\text{ sim }}\) requires that we construct the “topic by opinion” matrix of a given size. We choose that to be \(10 \times 10\)—i.e., that for any given topic we consider the 10 opinions that make the most use of it and conversely, for any opinion, we consider the 10 topics that make the strongest contribution to it.
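The two truncations described here can be sketched as follows. The helper names and the tiny matrix are hypothetical, and the sketch only selects the top-ranked entries; how the selected entries are then assembled into \(T_{\text{ sim }}\) follows the construction given earlier and is not reproduced.

```python
def top_k_indices(values, k=10):
    """Indices of the k largest entries of `values`, largest first."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]


def top_opinions_per_topic(topic_by_opinion, k=10):
    """For each topic (row), the k opinions that make the most use of it."""
    return [top_k_indices(row, k) for row in topic_by_opinion]


def top_topics_per_opinion(topic_by_opinion, k=10):
    """For each opinion (column), the k topics that contribute most strongly to it."""
    return [top_k_indices(list(column), k) for column in zip(*topic_by_opinion)]


# Toy matrix with 2 topics (rows) and 3 opinions (columns); values are illustrative.
toy = [
    [0.7, 0.1, 0.4],
    [0.3, 0.9, 0.6],
]
opinions_for_topics = top_opinions_per_topic(toy, k=2)
topics_for_opinions = top_topics_per_opinion(toy, k=1)
```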

Given an initial query, the UI provides two complementary representations: (1) a ranked list of geometrically closest (in terms of PageDist) cases and (2) a map of the space, centered on a case of origin (the original input). As a “map”, this representation shows not only the relation of cases to the initial query, but also the relations of the closest cases to each other. The associated visual integrates a network representation wherein cases are linked if the overall weight between them exceeds a threshold. The map is generated by clicking on “View Case Network” (after executing the query). The opinion map produced from the query “329 US 187: Ballard v. United States” is shown in Fig. 5.
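The two representations can be sketched in a few lines, assuming that the PageDist values from the query and the pairwise link weights have already been computed. The case labels, distances, weights, and threshold below are invented for illustration and do not come from our database.

```python
def nearest_cases(distances, k):
    """Ranked list of the k cases closest to the query.

    `distances` maps a case identifier to its distance from the query.
    """
    return sorted(distances, key=distances.get)[:k]


def network_edges(weights, cases, threshold):
    """Links among `cases` whose overall weight exceeds `threshold`.

    `weights` maps a pair (case_a, case_b) to the overall weight between them.
    """
    keep = set(cases)
    return [
        pair for pair, weight in weights.items()
        if pair[0] in keep and pair[1] in keep and weight > threshold
    ]


# Hypothetical distances from a query case to three other cases.
distances = {"B": 0.3, "C": 0.1, "D": 0.7}
ranked = nearest_cases(distances, k=2)

# Hypothetical overall weights between pairs of cases.
weights = {("query", "C"): 0.6, ("C", "B"): 0.4, ("B", "D"): 0.9, ("query", "B"): 0.1}
edges = network_edges(weights, ["query"] + ranked, threshold=0.3)
```

The ranked list corresponds to representation (1); the edge list, restricted to the query and its nearest cases, supplies the links drawn in the map of representation (2).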

Fig. 5

Here is a snapshot from our alpha version UI for exploring the space of legal opinions. The current UI is built on the database of Supreme Court opinions over the time period 1950–2001. What we see here is the 2-d MDS visualization of the PageDist neighborhood of the 30 cases closest to “329 US 187: Ballard v. United States”. Cases are linked if the overall weight between them exceeds a threshold. The exploration weights have been set to 2 (“cited”), 1 (“cited by”), and 2 (“topic similarity”).

4 Closing thoughts

In this paper we introduce a new multinetwork framework integrating citation and textual information for encoding relationships among a large set of Supreme Court opinions. The citation component derives from the underlying citation network of opinions. The textual piece derives from an LDA topic model computed from the text corpus. A metric on the opinion space is the reification of a basic model of legal search as would be executed by a prototypical legal researcher (“homo legalus”) looking for cases relevant to some initial case through textual similarity and citation. The model of search is articulated as a Markov chain on the network, built as a linear combination of the individual chains on the citation and topic networks. The Markov process produces a notion of distance between opinions, which can also be thought of as a proxy for relevance. Along with distance, the Markov chain gives rise to a notion of curvature, and with this an implicit framing of the opinion corpus as a “landscape” which we call “the legal landscape”. We have implemented a first-generation website that allows users to explore a small subset of Supreme Court opinions using this search tool (www.bendingthelaw.org).

The text corpus evolves in the sense that cases enter the corpus regularly and in so doing continually transform the associated text landscape, changing interpoint distances and local curvatures. Of particular interest are those cases that remain relevant over long periods of time. Some regions of the legal landscape have the property that they serve as nexuses of connection for other regions of the landscape. We show that those regions which over time become significantly more negatively curved are such connective areas. With the analogy of flow in mind, we call such areas regions of “drainage”. Areas which experience a significant increase in curvature we call “puddling regions”. We show that drainage areas are more likely to contain continually relevant cases than the puddling regions. We further show that opinions that start off relevant, in the sense of entering the landscape highly relevant to many cases over a short period of time, tend to remain relevant, thereby suggesting a property of (legal) momentum.

There are natural next steps to take with this idea. In one direction we will expand the text corpus to include all Supreme Court and Appellate Court opinions. We also plan to validate our model by asking users to compare the results of our search algorithm (under a range of parameter choices) with those of their own usual research approaches. Our newly introduced opinion distance function provides a new variable for exploring the relations of opinions to all kinds of social and economic variables. It is also natural to export this model to other court systems that produce English language opinions. In this regard it would be interesting to see the ways in which the “bending” of court systems varies, and to try to understand what might account for such (possible) variation. Ultimately, it would also be of interest to effect the integration of distinct corpora via this model. In a related, but different, direction, we will deploy this new navigation and search model on other corpora. To this end, the Bending the Law website includes navigable access to the United States Code (USC), Code of Federal Regulations (CFR), and Internal Revenue Code (IRC). In these corpora, sections and subsections are linked and referenced, and the topic modeling takes place on the level of sections. Future work will describe our findings in analyzing these newly multinetworked corpora, but for now, they exist as domains for new explorations for the public.