Equalized odds

Equalized odds,[1] also referred to as conditional procedure accuracy equality and disparate mistreatment, is a measure of fairness in machine learning. A classifier satisfies this definition if subjects in the protected and unprotected groups have equal true positive rates and equal false positive rates,[2] that is, if the prediction $R$, the true outcome $Y$ and the protected attribute $A$ satisfy:

$$P(R = + \mid Y = y, A = a) = P(R = + \mid Y = y, A = b), \quad y \in \{+, -\}, \quad \text{for all groups } a \text{ and } b$$

For example, $A$ could be gender, race, or any other characteristic that we want to be free of bias, while $Y$ would be whether the person is qualified for the degree, and the output $R$ would be the school's decision whether to offer the person a place to study for the degree. In this context, higher university enrollment rates of African Americans compared to whites with similar test scores might be necessary to fulfill the condition of equalized odds, if the "base rate" of $Y$ differs between the groups.
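As a rough illustration of the definition, the sketch below computes the true positive rate and false positive rate separately for each group and reports the largest between-group gap in each. The function names (`group_rates`, `equalized_odds_gaps`) and the toy data are hypothetical and not taken from the cited sources; labels and predictions are assumed to be encoded as 0 and 1.

```python
from collections import defaultdict


def group_rates(y_true, y_pred, group):
    """Return {group: (TPR, FPR)} for binary labels and predictions in {0, 1}."""
    counts = defaultdict(lambda: {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
    for y, r, a in zip(y_true, y_pred, group):
        if y == 1:
            counts[a]["tp" if r == 1 else "fn"] += 1
        else:
            counts[a]["fp" if r == 1 else "tn"] += 1

    rates = {}
    for a, c in counts.items():
        positives = c["tp"] + c["fn"]
        negatives = c["fp"] + c["tn"]
        # A rate is undefined (NaN) when a group has no examples of that class.
        tpr = c["tp"] / positives if positives else float("nan")
        fpr = c["fp"] / negatives if negatives else float("nan")
        rates[a] = (tpr, fpr)
    return rates


def equalized_odds_gaps(y_true, y_pred, group):
    """Return (largest TPR difference, largest FPR difference) across groups."""
    rates = group_rates(y_true, y_pred, group)
    tprs = [tpr for tpr, _ in rates.values()]
    fprs = [fpr for _, fpr in rates.values()]
    return max(tprs) - min(tprs), max(fprs) - min(fprs)


# Toy data, made up purely for illustration.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
group = ["a", "a", "a", "a", "b", "b", "b", "b"]

print(group_rates(y_true, y_pred, group))
print(equalized_odds_gaps(y_true, y_pred, group))
```

On this toy data both groups have the same true positive rate, but their false positive rates differ, so the classifier does not satisfy equalized odds. With finite samples the two gaps are rarely exactly zero, so in practice one would likely compare them against a chosen tolerance rather than test for strict equality.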

The concept was originally defined for a binary-valued outcome $Y$. In 2017, Woodworth et al. generalized it to multiple classes. [3]
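One formulation that covers both the binary and the multi-class case is conditional independence of the prediction and the protected attribute given the true outcome, which is how Hardt et al. [1] phrase the definition. The symbol names below ($R$ for the prediction, $Y$ for the true outcome, $A$ for the protected attribute) are assumed for this sketch:

```latex
R \perp A \mid Y
\quad\Longleftrightarrow\quad
P(R = r \mid Y = y, A = a) = P(R = r \mid Y = y, A = b)
\quad \text{for all classes } r, y \text{ and all groups } a, b .
```

When $R$ and $Y$ are binary, the cases $y = +$ and $y = -$ with $r = +$ reduce to equal true positive rates and equal false positive rates, respectively.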

References

  1. Hardt, Moritz; Price, Eric; Srebro, Nathan (2016). "Equality of Opportunity in Supervised Learning". Neural Information Processing Systems. 29. arXiv:1610.02413.
  2. "Fairness in ML 2: Equal opportunity and odds" (PDF). www2.cs.duke.edu/. Duke Computer Science.
  3. Woodworth, Blake; Gunasekar, Suriya; Ohannessian, Mesrob I.; Srebro, Nathan (2017). "Learning Non-Discriminatory Predictors". Proceedings of the 2017 Conference on Learning Theory: 1920–1953. arXiv:1702.06081.