Jurimetrics is the application of quantitative methods, especially probability and statistics, to law. [1] In the United States, the journal Jurimetrics is published by the American Bar Association and Arizona State University. [2] The Journal of Empirical Legal Studies is another publication that emphasizes the statistical analysis of law.
The term was coined in 1949 by Lee Loevinger in his article "Jurimetrics: The Next Step Forward". [1] [3] Showing the influence of Oliver Wendell Holmes Jr., Loevinger quoted [4] Holmes' celebrated phrase that:
“For the rational study of the law the blackletter man may be the man of the present, but the man of the future is the man of statistics and the master of economics.” [5]
The first work on this topic is attributed to Nicolaus I Bernoulli in his doctoral dissertation De Usu Artis Conjectandi in Jure, written in 1709.
In 2018, California's legislature passed Senate Bill 826, which requires all publicly held corporations based in the state to have a minimum number of women on their board of directors. [34] [35] Boards with five or fewer members must have at least two women, while boards with six or more members must have at least three women.
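Using the binomial distribution, we may compute the probability of violating the rule laid out in Senate Bill 826 as a function of the number of board members. The probability mass function for the binomial distribution is:
$$P(X = k) = \binom{n}{k} p^{k} (1 - p)^{n - k}$$
where $P(X = k)$ is the probability of getting $k$ successes in $n$ trials, and $\binom{n}{k}$ is the binomial coefficient. For this computation, $p$ is the probability that a person qualified for board service is female, $k$ is the number of female board members, and $n$ is the number of board seats. We will assume that $p = 0.5$. Depending on the number of board members, we are trying to compute the cumulative distribution function:
$$P(X \le 1) = \sum_{k=0}^{1} \binom{n}{k} p^{k} (1 - p)^{n - k} \quad \text{for } n \le 5, \qquad P(X \le 2) = \sum_{k=0}^{2} \binom{n}{k} p^{k} (1 - p)^{n - k} \quad \text{for } n \ge 6.$$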
With these formulas, we are able to compute the probability of violating Senate Bill 826 by chance:
Board seats | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|
Probability of violation | 0.50 | 0.31 | 0.19 | 0.34 | 0.23 | 0.14 | 0.09 | 0.05 | 0.03 | 0.02 |
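The computation behind this table can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `required_women` and `violation_probability` are hypothetical, and the seat thresholds follow the rule exactly as stated in this article (at least two women for five or fewer seats, at least three for six or more).

```python
from math import comb


def required_women(n_seats: int) -> int:
    """Minimum number of women under the rule as stated in the text (assumed thresholds)."""
    return 2 if n_seats <= 5 else 3


def violation_probability(n_seats: int, p_female: float = 0.5) -> float:
    """P(X < required) for X ~ Binomial(n_seats, p_female), i.e. the binomial CDF
    evaluated at one less than the required number of women."""
    k_max = required_women(n_seats) - 1
    return sum(
        comb(n_seats, k) * p_female**k * (1 - p_female) ** (n_seats - k)
        for k in range(k_max + 1)
    )


# Print one row per board size, from 3 to 12 seats.
for n in range(3, 13):
    print(n, f"{violation_probability(n):.2f}")
```

Passing a different value for `p_female` (for example `violation_probability(6, 0.4)`) gives the corresponding probability when the pool of qualified candidates is not evenly split.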
As Ilya Somin points out, [34] a significant percentage of firms, without any history of sex discrimination, could be in violation of the law.
In more male-dominated industries, such as technology, there could be an even greater imbalance. Suppose that instead of parity in general, the probability that a person who is qualified for board service is female is 40%; this is likely to be a high estimate, given the predominance of males in the technology industry. Then the probability of violating Senate Bill 826 by chance may be recomputed as:
Board seats | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
---|---|---|---|---|---|---|---|---|---|---|
Probability of violation | 0.65 | 0.48 | 0.34 | 0.54 | 0.42 | 0.32 | 0.23 | 0.17 | 0.12 | 0.08 |
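As a check on the first entry of this table, the arithmetic for a three-seat board can be written out by hand. With a 40% probability that a qualified candidate is female, the board violates the rule when it has zero or one woman:

$$P(X \le 1) = \binom{3}{0}(0.6)^{3} + \binom{3}{1}(0.4)(0.6)^{2} = 0.216 + 0.432 = 0.648 \approx 0.65.$$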
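Bayes' theorem states that, for events $A$ and $B$, the conditional probability of $A$ occurring, given that $B$ has occurred, is:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Using the law of total probability, we may expand the denominator as:
$$P(B) = P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)$$
Then Bayes' theorem may be rewritten as:
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)}$$
This may be simplified further by defining the prior odds $\omega$ of event $A$ occurring and the likelihood ratio $L$ as:
$$\omega = \frac{P(A)}{P(\neg A)}, \qquad L = \frac{P(B \mid A)}{P(B \mid \neg A)}$$
Then the compact form of Bayes' theorem is:
$$P(A \mid B) = \frac{\omega L}{1 + \omega L}$$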
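The step from the expanded form to the compact form is to divide the numerator and denominator by $P(B \mid \neg A)\,P(\neg A)$. Writing $\omega$ for the prior odds and $L$ for the likelihood ratio (notation assumed here for illustration), the derivation can be sketched as:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid \neg A)\,P(\neg A)} = \frac{\dfrac{P(A)}{P(\neg A)} \cdot \dfrac{P(B \mid A)}{P(B \mid \neg A)}}{\dfrac{P(A)}{P(\neg A)} \cdot \dfrac{P(B \mid A)}{P(B \mid \neg A)} + 1} = \frac{\omega L}{1 + \omega L}.$$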
Different values of the posterior probability, based on the prior odds and likelihood ratio, are computed in the following table:
Prior odds \ Likelihood ratio | 1 | 2 | 3 | 4 | 5 | 10 | 15 | 20 | 25 | 50 |
---|---|---|---|---|---|---|---|---|---|---|
0.01 | 0.01 | 0.02 | 0.03 | 0.04 | 0.05 | 0.09 | 0.13 | 0.17 | 0.20 | 0.33 |
0.02 | 0.02 | 0.04 | 0.06 | 0.07 | 0.09 | 0.17 | 0.23 | 0.29 | 0.33 | 0.50 |
0.03 | 0.03 | 0.06 | 0.08 | 0.11 | 0.13 | 0.23 | 0.31 | 0.38 | 0.43 | 0.60 |
0.04 | 0.04 | 0.07 | 0.11 | 0.14 | 0.17 | 0.29 | 0.38 | 0.44 | 0.50 | 0.67 |
0.05 | 0.05 | 0.09 | 0.13 | 0.17 | 0.20 | 0.33 | 0.43 | 0.50 | 0.56 | 0.71 |
0.10 | 0.09 | 0.17 | 0.23 | 0.29 | 0.33 | 0.50 | 0.60 | 0.67 | 0.71 | 0.83 |
0.15 | 0.13 | 0.23 | 0.31 | 0.38 | 0.43 | 0.60 | 0.69 | 0.75 | 0.79 | 0.88 |
0.20 | 0.17 | 0.29 | 0.38 | 0.44 | 0.50 | 0.67 | 0.75 | 0.80 | 0.83 | 0.91 |
0.25 | 0.20 | 0.33 | 0.43 | 0.50 | 0.56 | 0.71 | 0.79 | 0.83 | 0.86 | 0.93 |
0.30 | 0.23 | 0.38 | 0.47 | 0.55 | 0.60 | 0.75 | 0.82 | 0.86 | 0.88 | 0.94 |
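The entries of this table follow from multiplying the prior odds by the likelihood ratio to get the posterior odds, then converting odds to a probability. A minimal Python sketch, assuming the hypothetical function name `posterior_probability`, might look like this:

```python
def posterior_probability(prior_odds: float, likelihood_ratio: float) -> float:
    """Posterior probability from prior odds and likelihood ratio:
    posterior odds = prior odds * likelihood ratio, probability = odds / (1 + odds)."""
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# Row and column labels taken from the table in the text.
prior_odds_values = [0.01, 0.02, 0.03, 0.04, 0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
likelihood_ratios = [1, 2, 3, 4, 5, 10, 15, 20, 25, 50]

for odds in prior_odds_values:
    row = [f"{posterior_probability(odds, lr):.2f}" for lr in likelihood_ratios]
    print(f"{odds:.2f} | " + " | ".join(row))
```

For example, prior odds of 0.10 combined with a likelihood ratio of 10 give posterior odds of 1, which is a posterior probability of 0.50.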
If we take $A$ to be some criminal behavior and $B$ a criminal complaint or accusation, Bayes' theorem allows us to determine the conditional probability of a crime being committed. More sophisticated analyses of evidence can be undertaken with the use of Bayesian networks.
In recent years, there has been a growing interest in the use of screening tests to identify drug users on welfare, potential mass shooters, [36] and terrorists. [37] The efficacy of screening tests can be analyzed using Bayes' theorem.
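Suppose that there is some binary screening procedure for an action $A$ that identifies a person as testing positive or negative for the action. Bayes' theorem tells us that the conditional probability of taking action $A$, given a positive test result, is:
$$P(A \mid \text{Positive}) = \frac{P(\text{Positive} \mid A)\,P(A)}{P(\text{Positive} \mid A)\,P(A) + P(\text{Positive} \mid \neg A)\,P(\neg A)}$$
For any screening test, we must be cognizant of its sensitivity and specificity. The screening test has sensitivity $P(\text{Positive} \mid A)$ and specificity $P(\text{Negative} \mid \neg A) = 1 - P(\text{Positive} \mid \neg A)$. The sensitivity and specificity can be analyzed using concepts from the standard theory of statistical hypothesis testing:
$$\text{sensitivity} = 1 - \beta, \qquad \text{specificity} = 1 - \alpha$$
where $\beta$ is the type II error rate (so that $1 - \beta$ is the statistical power) and $\alpha$ is the type I error rate.
Therefore, the form of Bayes' theorem that is pertinent to us is:
$$P(A \mid \text{Positive}) = \frac{(1 - \beta)\,P(A)}{(1 - \beta)\,P(A) + \alpha\,P(\neg A)}$$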
Suppose that we have developed a test with sensitivity and specificity of 99%, which is likely to be higher than most real-world tests. We can examine several scenarios to see how well this hypothetical test works:
With these base rates and the hypothetical values of sensitivity and specificity, we may calculate the posterior probability that a positive result indicates the individual will actually engage in each of the actions:
Drug Use | Mass Shooting |
---|---|
0.6012 | 0.0098 |
Even with very high sensitivity and specificity, the screening tests return posterior probabilities of only 60.1% and 0.98%, respectively. Under more realistic circumstances, screening would likely prove even less useful than under these hypothetical conditions. The problem with any screening procedure for rare events is that it is very likely to be too imprecise: it will flag too many people as being at risk of engaging in some undesirable action.
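The calculation can be sketched in Python as follows. The function name `positive_predictive_value` is hypothetical, and the base rates used here (1.5% for drug use and 0.01% for mass shootings) are assumptions back-solved from the posterior probabilities reported above rather than figures stated in this text.

```python
def positive_predictive_value(
    base_rate: float, sensitivity: float, specificity: float
) -> float:
    """P(action | positive test) via Bayes' theorem."""
    true_positives = sensitivity * base_rate
    false_positives = (1.0 - specificity) * (1.0 - base_rate)
    return true_positives / (true_positives + false_positives)


# ASSUMPTION: base rates chosen so that the results match the table above.
ASSUMED_BASE_RATES = {"Drug Use": 0.015, "Mass Shooting": 0.0001}

for action, base_rate in ASSUMED_BASE_RATES.items():
    ppv = positive_predictive_value(base_rate, sensitivity=0.99, specificity=0.99)
    print(f"{action}: {ppv:.4f}")
```

Under the assumed 0.01% base rate, screening one million people would correctly flag 99 of the 100 true cases but would also flag about 9,999 of the 999,900 others, which is why fewer than 1% of positive results are true positives.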
The difference between jurimetrics and law and economics is that jurimetrics investigates legal questions from a probabilistic and statistical point of view, while law and economics addresses legal questions using standard microeconomic analysis. A synthesis of these fields is possible through the use of econometrics (statistics for economic analysis) and other quantitative methods to answer relevant legal questions. As an example, the Columbia University scholar Edgardo Buscaglia published several peer-reviewed articles using a joint jurimetrics and law and economics approach. [39] [40]