Shapley value

Figure: Lloyd Shapley in 2012

The Shapley value is a solution concept in cooperative game theory. It is named in honor of Lloyd Shapley, who introduced it in 1951 and was awarded the Nobel Memorial Prize in Economic Sciences in 2012. [1] [2] To each cooperative game it assigns a unique distribution (among the players) of the total surplus generated by the coalition of all players. The Shapley value is characterized by a collection of desirable properties. Hart (1989) provides a survey of the subject. [3] [4]

Formal definition

Formally, a coalitional game is defined as follows: there is a set $N$ (of $n$ players) and a function $v$ that maps subsets of players to the real numbers: $v \colon 2^N \to \mathbb{R}$, with $v(\emptyset) = 0$, where $\emptyset$ denotes the empty set. The function $v$ is called a characteristic function.

The function $v$ has the following meaning: if $S$ is a coalition of players, then $v(S)$, called the worth of coalition $S$, describes the total expected sum of payoffs the members of $S$ can obtain by cooperation.

The Shapley value is one way to distribute the total gains to the players, assuming that they all collaborate. It is a "fair" distribution in the sense that it is the only distribution with certain desirable properties listed below. According to the Shapley value, [5] the amount that player $i$ is given in a coalitional game $(v, N)$ is

$$\varphi_i(v) = \sum_{S \subseteq N \setminus \{i\}} \frac{|S|!\,(n - |S| - 1)!}{n!} \bigl( v(S \cup \{i\}) - v(S) \bigr) = \frac{1}{n} \sum_{S \subseteq N \setminus \{i\}} \binom{n-1}{|S|}^{-1} \bigl( v(S \cup \{i\}) - v(S) \bigr)$$

where $n$ is the total number of players and the sum extends over all subsets $S$ of $N$ not containing player $i$, including the empty set. Also note that $\binom{n}{k}$ is the binomial coefficient. The formula can be interpreted as follows: imagine the coalition being formed one actor at a time, with each actor demanding their contribution $v(S \cup \{i\}) - v(S)$ as a fair compensation, and then for each actor take the average of this contribution over the possible different permutations in which the coalition can be formed.

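An alternative equivalent formula for the Shapley value is:

$$\varphi_i(v) = \frac{1}{n!} \sum_{R} \bigl[ v(P_i^R \cup \{i\}) - v(P_i^R) \bigr]$$

where the sum ranges over all $n!$ orders $R$ of the players and $P_i^R$ is the set of players in $N$ which precede $i$ in the order $R$.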
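As a minimal sketch, the weighted sum over coalitions and the average over orderings described above can both be computed by brute force for a small game. The representation used here (a characteristic function `v` given as a callable on `frozenset`s, and the function names `shapley_subsets` and `shapley_orders`) is an assumption made for illustration. Both routines take exponential or factorial time, so they are only practical for a handful of players.

```python
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial


def shapley_subsets(players, v):
    """Shapley value via the weighted sum over coalitions S not containing i."""
    n = len(players)
    phi = {}
    for i in players:
        others = [p for p in players if p != i]
        total = Fraction(0)
        for size in range(n):
            # The weight |S|! (n - |S| - 1)! / n! depends only on the size of S.
            weight = Fraction(factorial(size) * factorial(n - size - 1), factorial(n))
            for combo in combinations(others, size):
                S = frozenset(combo)
                total += weight * (v(S | {i}) - v(S))
        phi[i] = total
    return phi


def shapley_orders(players, v):
    """Shapley value as the average marginal contribution over all orderings."""
    phi = {i: Fraction(0) for i in players}
    for order in permutations(players):
        before = frozenset()
        for i in order:
            phi[i] += v(before | {i}) - v(before)
            before = before | {i}
    n_orders = factorial(len(players))
    return {i: total / n_orders for i, total in phi.items()}


# Hypothetical 3-player game: a coalition is worth the square of its size,
# plus a bonus of 1 whenever player "a" is a member.
players = ["a", "b", "c"]
v = lambda S: len(S) ** 2 + (1 if "a" in S else 0)

by_subsets = shapley_subsets(players, v)
by_orders = shapley_orders(players, v)
print(by_subsets)
print(by_orders)
```

Exact rational arithmetic (`Fraction`) is used so that the two formulas can be compared for equality rather than up to rounding.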

In terms of synergy

From the characteristic function $v$ one can compute the synergy that each group of players provides. The synergy is the unique function $w \colon 2^N \to \mathbb{R}$, such that

$$v(S) = \sum_{R \subseteq S} w(R)$$

for any subset $S \subseteq N$ of players. In other words, the 'total value' of the coalition $S$ comes from summing up the synergies of each possible subset of $S$.

Given a characteristic function $v$, the synergy function $w$ is calculated via

$$w(S) = \sum_{R \subseteq S} (-1)^{|S| - |R|}\, v(R)$$

using the inclusion-exclusion principle. In other words, the synergy of coalition $S$ is the part of the value $v(S)$ that is not already accounted for by its subsets.

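The Shapley values are given in terms of the synergy function $w$ by [6] [7]

$$\varphi_i(v) = \sum_{i \in S \subseteq N} \frac{w(S)}{|S|}$$

where the sum is over all subsets $S$ of $N$ that include player $i$.

This can be interpreted as

$$\varphi_i(v) = \sum_{\text{coalitions including } i} \frac{\text{synergy of the coalition}}{\text{number of members in the coalition}}$$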

In other words, the synergy of each coalition is divided equally between all members.
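The synergy view can be sketched in a few lines. The code below assumes the inclusion-exclusion form $w(S) = \sum_{R \subseteq S} (-1)^{|S|-|R|} v(R)$ and the equal split $w(S)/|S|$ among members; the helper names (`subsets`, `synergy`, `shapley_from_synergy`) and the example game are hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def synergy(players, v):
    """Synergy w(S) of every coalition S, computed by inclusion-exclusion."""
    return {
        S: sum((-1) ** (len(S) - len(R)) * v(R) for R in subsets(S))
        for S in subsets(players)
    }


def shapley_from_synergy(players, w):
    """Split the synergy of each coalition equally among its members."""
    return {
        i: sum((Fraction(w[S], len(S)) for S in w if i in S), Fraction(0))
        for i in players
    }


# Hypothetical integer-valued game on three players.
players = [1, 2, 3]
v = lambda S: len(S) ** 2 + (1 if 1 in S else 0)

w = synergy(players, v)
# Summing the synergies of all subsets of S recovers v(S).
recovers_v = all(v(S) == sum(w[R] for R in subsets(S)) for S in subsets(players))
phi = shapley_from_synergy(players, w)
print(recovers_v, phi)
```

In this example the grand coalition has zero synergy: its worth is already fully explained by singletons and pairs.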

Examples

Business example

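Consider a simplified description of a business. An owner, $o$, provides crucial capital in the sense that, without the owner, no gains can be obtained. There are $m$ workers $w_1, \ldots, w_m$, each of whom contributes an amount $p$ to the total profit. Let

$$N = \{o, w_1, \ldots, w_m\}.$$

The value function for this coalitional game is

$$v(S) = \begin{cases} (|S| - 1)\,p & \text{if } o \in S, \\ 0 & \text{otherwise.} \end{cases}$$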

Computing the Shapley value for this coalition game leads to a value of mp/2 for the owner and p/2 for each one of the m workers.

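This can be understood from the perspective of synergy. The synergy function $w$ is

$$w(S) = \begin{cases} p & \text{if } S = \{o, w_i\} \text{ for some worker } w_i, \\ 0 & \text{otherwise,} \end{cases}$$

so the only coalitions that generate synergy are one-to-one between the owner and any individual worker.

Using the above formula for the Shapley value in terms of $w$ we compute

$$\varphi_{w_i} = \frac{w(\{o, w_i\})}{2} = \frac{p}{2}$$

and

$$\varphi_o = \sum_{i=1}^{m} \frac{w(\{o, w_i\})}{2} = \frac{mp}{2}.$$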

The result can also be understood from the perspective of averaging over all orders. A given worker joins the coalition after the owner (and therefore contributes $p$) in half of the orders and thus makes an average contribution of $p/2$ upon joining. When the owner joins, on average half the workers have already joined, so the owner's average contribution upon joining is $mp/2$.
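The stated values can be checked numerically. This sketch assumes the value function $v(S) = (|S| - 1)\,p$ when the owner belongs to $S$ and $0$ otherwise, and picks hypothetical numbers $m = 4$ and $p = 10$; the `shapley` helper simply averages marginal contributions over all join orders.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

m, p = 4, 10  # hypothetical: 4 workers, each adding a profit of 10
players = ["o"] + [f"w{k}" for k in range(1, m + 1)]


def v(S):
    """Worth of coalition S: nothing without the owner, p per worker with the owner."""
    return (len(S) - 1) * p if "o" in S else 0


def shapley(players, v):
    """Average marginal contribution of each player over all join orders."""
    phi = {i: Fraction(0) for i in players}
    for order in permutations(players):
        before = frozenset()
        for i in order:
            phi[i] += v(before | {i}) - v(before)
            before = before | {i}
    return {i: total / factorial(len(players)) for i, total in phi.items()}


phi = shapley(players, v)
print("owner:", phi["o"], "expected m*p/2 =", Fraction(m * p, 2))
print("worker:", phi["w1"], "expected p/2 =", Fraction(p, 2))
```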

Glove game

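The glove game is a coalitional game where the players have left- and right-hand gloves and the goal is to form pairs. Let

$$N = \{1, 2, 3\},$$

where players 1 and 2 have right-hand gloves and player 3 has a left-hand glove.

The value function for this coalitional game is

$$v(S) = \begin{cases} 1 & \text{if } S \in \bigl\{ \{1, 3\}, \{2, 3\}, \{1, 2, 3\} \bigr\}, \\ 0 & \text{otherwise.} \end{cases}$$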

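The formula for calculating the Shapley value is

$$\varphi_i(v) = \frac{1}{|N|!} \sum_{R} \bigl[ v(P_i^R \cup \{i\}) - v(P_i^R) \bigr],$$

where $R$ is an ordering of the players and $P_i^R$ is the set of players in $N$ which precede $i$ in the order $R$.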

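The following table displays the marginal contributions of Player 1.

Order R     Marginal contribution of Player 1
1, 2, 3     $v(\{1\}) - v(\emptyset) = 0 - 0 = 0$
1, 3, 2     $v(\{1\}) - v(\emptyset) = 0 - 0 = 0$
2, 1, 3     $v(\{1, 2\}) - v(\{2\}) = 0 - 0 = 0$
2, 3, 1     $v(\{1, 2, 3\}) - v(\{2, 3\}) = 1 - 1 = 0$
3, 1, 2     $v(\{1, 3\}) - v(\{3\}) = 1 - 0 = 1$
3, 2, 1     $v(\{1, 2, 3\}) - v(\{2, 3\}) = 1 - 1 = 0$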

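Observe that

$$\varphi_1(v) = \frac{1}{6} \cdot 1 = \frac{1}{6}.$$

By a symmetry argument it can be shown that

$$\varphi_2(v) = \varphi_1(v) = \frac{1}{6}.$$

Due to the efficiency axiom, the sum of all the Shapley values is equal to 1, which means that

$$\varphi_3(v) = \frac{4}{6} = \frac{2}{3}.$$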
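The glove game is small enough to enumerate completely. The sketch below assumes that a coalition is worth 1 exactly when it contains player 3 together with player 1 or player 2; it lists the marginal contribution of Player 1 in each of the six orders and then averages. Variable names such as `winning` and `marginal` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

players = [1, 2, 3]                    # 1 and 2 hold right gloves, 3 holds the left glove
winning = [{1, 3}, {2, 3}, {1, 2, 3}]  # coalitions that contain a matching pair


def v(S):
    """A coalition is worth 1 if it can form a pair of gloves, otherwise 0."""
    return 1 if set(S) in winning else 0


orders = list(permutations(players))
marginal = {i: [] for i in players}
for order in orders:
    for pos, i in enumerate(order):
        before = order[:pos]  # players that precede i in this order
        marginal[i].append(v(before + (i,)) - v(before))

for order, mc in zip(orders, marginal[1]):
    print(order, "marginal contribution of player 1:", mc)

phi = {i: Fraction(sum(mc), len(orders)) for i, mc in marginal.items()}
print(phi)
```

Player 3 holds the scarce left-hand glove and therefore receives the largest share.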

Properties

The Shapley value has many desirable properties. Notably, it is the only payment rule satisfying the four properties of Efficiency, Symmetry, Linearity and Null player. [8] See [9], pp. 147–156, for more characterizations of the Shapley value.

Efficiency

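The sum of the Shapley values of all agents equals the value of the grand coalition, so that all the gain is distributed among the agents:

$$\sum_{i \in N} \varphi_i(v) = v(N)$$

Proof:

$$\sum_{i \in N} \varphi_i(v) = \frac{1}{|N|!} \sum_{R} \sum_{i \in N} \bigl[ v(P_i^R \cup \{i\}) - v(P_i^R) \bigr] = \frac{1}{|N|!} \sum_{R} v(N) = \frac{1}{|N|!}\, |N|!\, v(N) = v(N),$$

since $\sum_{i \in N} \bigl[ v(P_i^R \cup \{i\}) - v(P_i^R) \bigr]$ is a telescoping sum and there are $|N|!$ different orderings $R$.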

Symmetry

If $i$ and $j$ are two actors who are equivalent in the sense that

$$v(S \cup \{i\}) = v(S \cup \{j\})$$

for every subset $S$ of $N$ which contains neither $i$ nor $j$, then $\varphi_i(v) = \varphi_j(v)$.

This property is also called equal treatment of equals.

Linearity

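If two coalition games described by gain functions $v$ and $w$ are combined, then the distributed gains should correspond to the gains derived from $v$ and the gains derived from $w$:

$$\varphi_i(v + w) = \varphi_i(v) + \varphi_i(w)$$

for every $i$ in $N$. Also, for any real number $a$,

$$\varphi_i(a v) = a\, \varphi_i(v)$$

for every $i$ in $N$.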

Null player

The Shapley value $\varphi_i(v)$ of a null player $i$ in a game $v$ is zero. A player $i$ is null in $v$ if $v(S \cup \{i\}) = v(S)$ for all coalitions $S$ that do not contain $i$.
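The four characterizing properties can be spot-checked on randomly generated games. This is only a sanity check on hypothetical examples, not a proof; the seed, the helper names (`random_game`, `swap01`) and the way symmetric and null players are constructed are assumptions made for this sketch.

```python
import random
from fractions import Fraction
from itertools import combinations, permutations
from math import factorial

players = [0, 1, 2, 3]
N = frozenset(players)
coalitions = [frozenset(c) for k in range(len(players) + 1) for c in combinations(players, k)]
rng = random.Random(0)


def shapley(players, v):
    """Average marginal contribution of each player over all orderings."""
    phi = {i: Fraction(0) for i in players}
    for order in permutations(players):
        before = frozenset()
        for i in order:
            phi[i] += v(before | {i}) - v(before)
            before = before | {i}
    return {i: t / factorial(len(players)) for i, t in phi.items()}


def random_game():
    """Random integer worth for every coalition, with v(empty set) = 0."""
    table = {S: rng.randint(0, 20) for S in coalitions}
    table[frozenset()] = 0
    return table.__getitem__


def swap01(S):
    """Exchange the roles of players 0 and 1 in coalition S."""
    return frozenset({0: 1, 1: 0}.get(q, q) for q in S)


v, w = random_game(), random_game()
phi_v, phi_w = shapley(players, v), shapley(players, w)

efficiency = sum(phi_v.values()) == v(N)
phi_sum = shapley(players, lambda S: v(S) + w(S))
phi_scaled = shapley(players, lambda S: 3 * v(S))
linearity = all(
    phi_sum[i] == phi_v[i] + phi_w[i] and phi_scaled[i] == 3 * phi_v[i] for i in players
)
# Symmetrizing v makes players 0 and 1 interchangeable.
phi_sym = shapley(players, lambda S: v(S) + v(swap01(S)))
symmetry = phi_sym[0] == phi_sym[1]
# Ignoring player 3 when evaluating a coalition makes player 3 a null player.
phi_null = shapley(players, lambda S: v(S - {3}))
null_player = phi_null[3] == 0

print(efficiency, linearity, symmetry, null_player)
```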

Stand-alone test

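If $v$ is a subadditive set function, i.e., $v(S \sqcup T) \le v(S) + v(T)$ (where $\sqcup$ denotes the union of disjoint coalitions), then for each agent $i$: $\varphi_i(v) \le v(\{i\})$.

Similarly, if $v$ is a superadditive set function, i.e., $v(S \sqcup T) \ge v(S) + v(T)$, then for each agent $i$: $\varphi_i(v) \ge v(\{i\})$.

So, if the cooperation has positive externalities, all agents (weakly) gain, and if it has negative externalities, all agents (weakly) lose. [9], pp. 147–156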

Anonymity

If $i$ and $j$ are two agents, and $w$ is a gain function that is identical to $v$ except that the roles of $i$ and $j$ have been exchanged, then $\varphi_i(v) = \varphi_j(w)$. This means that the labeling of the agents doesn't play a role in the assignment of their gains.

Marginalism

The Shapley value can be defined as a function which uses only the marginal contributions of player $i$ as the arguments.

Aumann–Shapley value

In their 1974 book, Lloyd Shapley and Robert Aumann extended the concept of the Shapley value to infinite games (defined with respect to a non-atomic measure), creating the diagonal formula. [10] This was later extended by Jean-François Mertens and Abraham Neyman.

As seen above, the value of an n-person game associates with each player the expectation of their contribution to the worth of the coalition of players before them in a random ordering of all the players. When there are many players and each individual plays only a minor role, the set of all players preceding a given one is heuristically thought of as a good sample of all players. The value of a given infinitesimal player ds is then defined as "their" contribution to the worth of a "perfect" sample of all the players.

Symbolically, if $v$ is the coalitional worth function that associates each coalition $c$ with its value, and each coalition $c$ is a measurable subset of the measurable set $I$ of all players, which we assume to be $I = [0, 1]$ without loss of generality, the value $(Sv)(ds)$ of an infinitesimal player $ds$ in the game is

$$(Sv)(ds) = \int_0^1 \bigl( v(tI + ds) - v(tI) \bigr)\, dt.$$

Here $tI$ is a perfect sample of the all-player set $I$ containing a proportion $t$ of all the players, and $tI + ds$ is the coalition obtained after $ds$ joins $tI$. This is the heuristic form of the diagonal formula. [10]

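Assume some regularity of the worth function: for example, assume that $v$ can be represented as a differentiable function of a non-atomic measure $\mu$ on $I$, $v(c) = f(\mu(c))$, where $\mu$ has density function $\varphi$, with $\mu(c) = \int 1_c(u)\, \varphi(u)\, du$, where $1_c$ is the characteristic function of $c$. Under such conditions

$$\mu(tI) = t\, \mu(I),$$

as can be shown by approximating the density by a step function and keeping the proportion $t$ for each level of the density function, and

$$v(tI + ds) = f(t\mu(I)) + f'(t\mu(I))\, \mu(ds).$$

The diagonal formula then has the form developed by Aumann and Shapley (1974)

$$(Sv)(ds) = \int_0^1 f'_{t\mu(I)}(\mu(ds))\, dt,$$

where $f'_{x}(h)$ denotes the derivative of $f$ at $x$ applied to $h$.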

Above, $\mu$ can be vector valued (as long as the function $f$ is defined and differentiable on the range of $\mu$, the above formula makes sense).

In the argument above, if the measure contains atoms then $\mu(tI) = t\, \mu(I)$ is no longer true, which is why the diagonal formula mostly applies to non-atomic games.
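The heuristic can be illustrated on a finite game. Assuming a scalar measure with $\mu(I) = 1$ and $v(c) = f(\mu(c))$, the diagonal formula suggests that a player of small weight $\mu_i$ receives approximately $\mu_i \int_0^1 f'(t)\, dt$. The sketch below compares that prediction with exact Shapley values in a 12-player weighted game; the choice $f(x) = x^3$, the weights, and all function names are illustrative assumptions, and the agreement is only approximate because the players are not infinitesimal.

```python
from itertools import combinations
from math import factorial


def shapley_weighted(weights, f):
    """Exact Shapley values of the finite game v(S) = f(total weight of S)."""
    n = len(weights)
    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            coeff = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                mass = sum(weights[j] for j in S)
                total += coeff * (f(mass + weights[i]) - f(mass))
        phi.append(total)
    return phi


def f(x):
    return x ** 3


def f_prime(x):
    return 3 * x ** 2


raw = list(range(1, 13))
weights = [r / sum(raw) for r in raw]  # 12 players with total mass 1

# Midpoint rule for the integral of f'(t) over [0, 1].
steps = 1000
integral = sum(f_prime((k + 0.5) / steps) for k in range(steps)) / steps

exact = shapley_weighted(weights, f)
diagonal = [w * integral for w in weights]
for w, e, d in zip(weights, exact, diagonal):
    print(f"weight {w:.4f}  exact {e:.4f}  diagonal {d:.4f}")
```

With more players of smaller weight the two columns would be expected to move closer together, but exact enumeration quickly becomes infeasible.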

Two approaches were deployed to extend this diagonal formula when the function $f$ is no longer differentiable. Mertens goes back to the original formula and takes the derivative after the integral, thereby benefiting from the smoothing effect. Neyman took a different approach. Going back to an elementary application of Mertens's approach from Mertens (1980): [11]

$$(Sv)(ds) = \lim_{\varepsilon \to 0,\ \varepsilon > 0} \frac{1}{\varepsilon} \int_0^{1 - \varepsilon} \bigl( f(t + \varepsilon\, \mu(ds)) - f(t) \bigr)\, dt$$

This works, for example, for majority games, where the original diagonal formula cannot be used directly. Mertens further extends this by identifying symmetries under which the Shapley value should be invariant, and averaging over such symmetries to create a further smoothing effect, commuting averages with the derivative operation as above. [12] A survey of non-atomic values is found in Neyman (2002). [13]

Generalization to coalitions

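The Shapley value only assigns values to the individual agents. It has been generalized [14] to apply to a group of agents $C$ as

$$\varphi_C(v) = \sum_{T \subseteq N \setminus C} \frac{(n - |T| - |C|)!\, |T|!}{(n - |C| + 1)!} \sum_{S \subseteq C} (-1)^{|C| - |S|}\, v(S \cup T).$$

In terms of the synergy function $w$ above, this reads [6] [7]

$$\varphi_C(v) = \sum_{C \subseteq T \subseteq N} \frac{w(T)}{|T| - |C| + 1}$$

where the sum goes over all subsets $T$ of $N$ that contain $C$.

This formula suggests the interpretation that the Shapley value of a coalition is to be thought of as the standard Shapley value of a single player, if the coalition $C$ is treated like a single player.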
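As a sketch, the coalition value can be computed in two ways and compared. The code assumes the direct form with weights $(n - |T| - |C|)!\,|T|!/(n - |C| + 1)!$ over $T \subseteq N \setminus C$, and the synergy form $w(T)/(|T| - |C| + 1)$ over supersets $T$ of $C$; the function names and the example game are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def coalition_value_direct(players, v, C):
    """phi_C as a weighted sum, over T disjoint from C, of alternating sums of v."""
    n, c = len(players), len(C)
    rest = [q for q in players if q not in C]
    total = Fraction(0)
    for T in subsets(rest):
        weight = Fraction(factorial(n - len(T) - c) * factorial(len(T)), factorial(n - c + 1))
        delta = sum((-1) ** (c - len(S)) * v(S | T) for S in subsets(C))
        total += weight * delta
    return total


def coalition_value_synergy(players, v, C):
    """phi_C from synergies: each superset T of C contributes w(T) / (|T| - |C| + 1)."""
    w = {
        S: sum((-1) ** (len(S) - len(R)) * v(R) for R in subsets(S))
        for S in subsets(players)
    }
    return sum((Fraction(w[T], len(T) - len(C) + 1) for T in w if C <= T), Fraction(0))


# Hypothetical game: squared coalition size, plus a bonus of 3 when 1 and 2 cooperate.
players = [1, 2, 3, 4]
v = lambda S: len(S) ** 2 + (3 if {1, 2} <= S else 0)

pair = frozenset({1, 2})
single = frozenset({1})
print(coalition_value_direct(players, v, pair), coalition_value_synergy(players, v, pair))
print(coalition_value_direct(players, v, single), coalition_value_synergy(players, v, single))
```

For a singleton coalition both routines reduce to the ordinary Shapley value of that player.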

Value of a player to another player

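The Shapley value $\varphi_i(v)$ was decomposed in [15] into a matrix of values

$$\varphi_{ij}(v) = \sum_{\{i, j\} \subseteq S \subseteq N} \frac{(|S| - 1)!\,(n - |S|)!}{n!} \bigl( v(S) - v(S \setminus \{i\}) - v(S \setminus \{j\}) + v(S \setminus \{i, j\}) \bigr) \sum_{t = |S|}^{n} \frac{1}{t}$$

Each value $\varphi_{ij}(v)$ represents the value of player $i$ to player $j$. This matrix satisfies

$$\varphi_i(v) = \sum_{j \in N} \varphi_{ij}(v)$$

i.e. the value of player $i$ to the whole game is the sum of their value to all individual players.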

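In terms of the synergy $w$ defined above, this reads

$$\varphi_{ij}(v) = \sum_{\{i, j\} \subseteq S \subseteq N} \frac{w(S)}{|S|^2}$$

where the sum goes over all subsets $S$ of $N$ that contain $i$ and $j$.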

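This can be interpreted as a sum over all subsets that contain players $i$ and $j$, where for each subset $S$ you

- take the synergy $w(S)$ of that subset,
- divide it by the number of players in the subset, $|S|$, and interpret the result as the surplus value player $i$ gains from this coalition,
- divide this again by $|S|$ to get the part of player $i$'s value that is attributed to player $j$.

In other words, the synergy value of each coalition is evenly divided among all $|S|^2$ pairs $(i, j)$ of players in that coalition, where $i$ generates surplus for $j$.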
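A minimal sketch of this decomposition, assuming the synergy form in which each coalition $S$ containing both players contributes $w(S)/|S|^2$. It uses the business example with hypothetical numbers ($m = 2$ workers, $p = 10$) and checks that each row of the matrix sums to the ordinary Shapley value; the function names are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def subsets(items):
    """Yield every subset of `items` as a frozenset, including the empty set."""
    items = list(items)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def synergy(players, v):
    """Synergy w(S) of every coalition S, computed by inclusion-exclusion."""
    return {
        S: sum((-1) ** (len(S) - len(R)) * v(R) for R in subsets(S))
        for S in subsets(players)
    }


def value_matrix(players, v):
    """phi[i][j]: value of player i to player j, summing w(S) / |S|^2 over S containing both."""
    w = synergy(players, v)
    return {
        i: {
            j: sum((Fraction(w[S], len(S) ** 2) for S in w if i in S and j in S), Fraction(0))
            for j in players
        }
        for i in players
    }


# Business example with m = 2 workers and p = 10 (hypothetical numbers).
players = ["o", "w1", "w2"]
p = 10
v = lambda S: (len(S) - 1) * p if "o" in S else 0

phi = value_matrix(players, v)
row_sums = {i: sum(phi[i].values()) for i in players}
print(row_sums)          # row sums reproduce the Shapley values
print(phi["w1"]["w2"])   # the two workers generate no value for each other
```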

Shapley value regression

Shapley value regression is a statistical method used to measure the contribution of individual predictors in a regression model. In this context, the "players" are the individual predictors or variables in the model, and the "gain" is the total explained variance or predictive power of the model. This method ensures a fair distribution of the total gain among the predictors, attributing each predictor a value representing its contribution to the model's performance. Lipovetsky (2006) discussed the use of Shapley value in regression analysis, providing a comprehensive overview of its theoretical underpinnings and practical applications. [16]

Shapley value contributions are recognized for their balance of stability and discriminating power, which make them suitable for accurately measuring the importance of service attributes in market research. [17] Several studies have applied Shapley value regression to key drivers analysis in marketing research. Pokryshevskaya and Antipov (2012) utilized this method to analyze online customers' repeat purchase intentions, demonstrating its effectiveness in understanding consumer behavior. [18] Similarly, Antipov and Pokryshevskaya (2014) applied Shapley value regression to explain differences in recommendation rates for hotels in South Cyprus, highlighting its utility in the hospitality industry. [19] Further validation of the benefits of Shapley value in key-driver analysis is provided by Vriens, Vidden, and Bosch (2021), who underscored its advantages in applied marketing analytics. [20]
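One common way to set this up, sketched here under stated assumptions, treats each predictor as a player and the $R^2$ of an ordinary least squares fit on a subset of predictors as the worth of that subset. The code below is an illustration of that idea in pure Python on synthetic data; it is not a reproduction of the procedure in any of the cited studies, and the helper names (`solve`, `r_squared`, `shapley_r2`) and the data-generating process are hypothetical.

```python
import random
from itertools import combinations
from math import factorial


def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[k]] for k, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def centered(values):
    """Subtract the mean, which accounts for the intercept of the regression."""
    mean = sum(values) / len(values)
    return [val - mean for val in values]


def r_squared(cols, y, subset):
    """R^2 of an OLS fit of centered y on the centered predictors indexed by `subset`."""
    if not subset:
        return 0.0
    gram = [[sum(a * b for a, b in zip(cols[j], cols[k])) for k in subset] for j in subset]
    xty = [sum(a * b for a, b in zip(cols[j], y)) for j in subset]
    beta = solve(gram, xty)
    return sum(b * t for b, t in zip(beta, xty)) / sum(val * val for val in y)


def shapley_r2(cols, y):
    """Shapley share of R^2 for each predictor, enumerating all predictor subsets."""
    k = len(cols)
    shares = []
    for i in range(k):
        others = [j for j in range(k) if j != i]
        total = 0.0
        for size in range(k):
            weight = factorial(size) * factorial(k - size - 1) / factorial(k)
            for S in combinations(others, size):
                total += weight * (r_squared(cols, y, S + (i,)) - r_squared(cols, y, S))
        shares.append(total)
    return shares


# Synthetic data (hypothetical): x2 is correlated with x1, x3 is pure noise.
rng = random.Random(0)
x1 = [rng.gauss(0, 1) for _ in range(40)]
x2 = [0.6 * a + rng.gauss(0, 1) for a in x1]
x3 = [rng.gauss(0, 1) for _ in range(40)]
y = [2.0 * a + 1.0 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]

cols = [centered(c) for c in (x1, x2, x3)]
yc = centered(y)
shares = shapley_r2(cols, yc)
full_r2 = r_squared(cols, yc, (0, 1, 2))
print(shares, sum(shares), full_r2)
```

By the efficiency property the shares add up to the $R^2$ of the full model, which is what makes them usable as a decomposition of explained variance.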

In machine learning

The Shapley value provides a principled way to explain the predictions of nonlinear models common in the field of machine learning. By interpreting a model trained on a set of features as a value function on a coalition of players, Shapley values provide a natural way to compute which features contribute to a prediction [21] or contribute to the uncertainty of a prediction. [22] This framework unifies several other methods including Local Interpretable Model-Agnostic Explanations (LIME), [23] DeepLIFT, [24] and Layer-Wise Relevance Propagation. [25] [26]
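A toy illustration of the idea, under simplifying assumptions: features are the players, and the worth of a feature subset is the model output when those features take their actual values while the remaining features are held at a fixed baseline. This baseline-replacement value function is only one of several possible choices and is not the algorithm of any particular library; `model`, `shapley_attribution` and the input values are hypothetical.

```python
from itertools import combinations
from math import factorial


def model(x):
    """Hypothetical nonlinear model standing in for a trained predictor."""
    return 2.0 * x[0] + x[1] * x[2]


def shapley_attribution(model, x, baseline):
    """Attribute model(x) - model(baseline) to the individual features.

    Features in coalition S take their value from x; all others stay at the baseline.
    """
    n = len(x)

    def v(S):
        mixed = [x[k] if k in S else baseline[k] for k in range(n)]
        return model(mixed) - model(baseline)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi


x = [1.0, 3.0, 4.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_attribution(model, x, baseline)
print(phi)
```

The additive term is credited entirely to feature 0, while the interaction term `x[1] * x[2]` is split equally between features 1 and 2, and the attributions sum to the difference between the prediction and the baseline output.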

References

  1. Shapley, Lloyd S. (August 21, 1951). "Notes on the n-Person Game -- II: The Value of an n-Person Game" (PDF). Santa Monica, Calif.: RAND Corporation.
  2. Roth, Alvin E., ed. (1988). The Shapley Value: Essays in Honor of Lloyd S. Shapley. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511528446. ISBN 0-521-36177-X.
  3. Hart, Sergiu (1989). "Shapley Value". In Eatwell, J.; Milgate, M.; Newman, P. (eds.). The New Palgrave: Game Theory. Norton. pp. 210–216. doi:10.1007/978-1-349-20181-5_25. ISBN 978-0-333-49537-7.
  4. Hart, Sergiu (May 12, 2016). "A Bibliography of Cooperative Games: Value Theory".
  5. For a proof of unique existence, see Ichiishi, Tatsuro (1983). Game Theory for Economic Analysis. New York: Academic Press. pp. 118–120. ISBN 0-12-370180-5.
  6. Grabisch, Michel (October 1997). "Alternative Representations of Discrete Fuzzy Measures for Decision Making". International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems. 5 (5): 587–607. doi:10.1142/S0218488597000440. ISSN 0218-4885.
  7. Grabisch, Michel (1 December 1997). "k-order additive discrete fuzzy measures and their representation". Fuzzy Sets and Systems. 92 (2): 167–189. doi:10.1016/S0165-0114(97)00168-1. ISSN 0165-0114.
  8. Shapley, Lloyd S. (1953). "A Value for n-person Games". In Kuhn, H. W.; Tucker, A. W. (eds.). Contributions to the Theory of Games. Annals of Mathematical Studies. Vol. 28. Princeton University Press. pp. 307–317. doi:10.1515/9781400881970-018. ISBN 9781400881970.
  9. Moulin, Herve (2004). Fair Division and Collective Welfare. Cambridge, Massachusetts: MIT Press. ISBN 9780262134231.
  10. Aumann, Robert J.; Shapley, Lloyd S. (1974). Values of Non-Atomic Games. Princeton: Princeton Univ. Press. ISBN 0-691-08103-4.
  11. Mertens, Jean-François (1980). "Values and Derivatives". Mathematics of Operations Research. 5 (4): 523–552. doi:10.1287/moor.5.4.523. JSTOR 3689325.
  12. Mertens, Jean-François (1988). "The Shapley Value in the Non Differentiable Case". International Journal of Game Theory. 17 (1): 1–65. doi:10.1007/BF01240834. S2CID 118017018.
  13. Neyman, A. (2002). "Value of Games with Infinitely Many Players". In Aumann, R. J.; Hart, S. (eds.). Handbook of Game Theory with Economic Applications. Vol. 3. Elsevier.
  14. Grabisch, Michel; Roubens, Marc (1999). "An axiomatic approach to the concept of interaction among players in cooperative games". International Journal of Game Theory. 28 (4): 547–565. doi:10.1007/s001820050125. S2CID 18033890.
  15. Hausken, Kjell; Mohr, Matthias (2001). "The Value of a Player in n-Person Games". Social Choice and Welfare. 18 (3): 465–83. doi:10.1007/s003550000070. JSTOR 41060209. S2CID 27089088.
  16. Lipovetsky, S. (2006). "Shapley value regression: A method for explaining the contributions of individual predictors to a regression model". Linear Algebra and Its Applications. 417: 48–54. doi:10.1016/j.laa.2006.04.027.
  17. Pokryshevskaya, E.; Antipov, E. (2014). "A comparison of methods used to measure the importance of service attributes". International Journal of Market Research. 56 (3): 283–296. doi:10.2501/IJMR-2014-023.
  18. Pokryshevskaya, E. B.; Antipov, E. A. (2012). "The strategic analysis of online customers' repeat purchase intentions". Journal of Targeting, Measurement and Analysis for Marketing. 20: 203–211. doi:10.1057/jt.2012.13.
  19. Antipov, E. A.; Pokryshevskaya, E. B. (2014). "Explaining differences in recommendation rates: the case of South Cyprus hotels". Economics Bulletin. 34 (4): 2368–2376.
  20. Vriens, M.; Vidden, C.; Bosch, N. (2021). "The benefits of Shapley-value in key-driver analysis". Applied Marketing Analytics. 6 (3): 269–278.
  21. Lundberg, Scott M.; Lee, Su-In (2017). "A Unified Approach to Interpreting Model Predictions". Advances in Neural Information Processing Systems. 30: 4765–4774. arXiv:1705.07874. Retrieved 2021-01-30.
  22. Watson, David; O'Hara, Joshua; Tax, Niek; Mudd, Richard; Guy, Ido (2023). "Explaining Predictive Uncertainty with Information Theoretic Shapley". Advances in Neural Information Processing Systems. 37. arXiv:2306.05724.
  23. Ribeiro, Marco Tulio; Singh, Sameer; Guestrin, Carlos (2016-08-13). "'Why Should I Trust You?'". Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York, NY, USA: ACM. pp. 1135–1144. doi:10.1145/2939672.2939778. ISBN 978-1-4503-4232-2.
  24. Shrikumar, Avanti; Greenside, Peyton; Kundaje, Anshul (2017-07-17). "Learning Important Features Through Propagating Activation Differences". PMLR: 3145–3153. ISSN 2640-3498. Retrieved 2021-01-30.
  25. Bach, Sebastian; Binder, Alexander; Montavon, Grégoire; Klauschen, Frederick; Müller, Klaus-Robert; Samek, Wojciech (2015-07-10). Suarez, Oscar Deniz (ed.). "On Pixel-Wise Explanations for Non-Linear Classifier Decisions by Layer-Wise Relevance Propagation". PLOS ONE. 10 (7). Public Library of Science (PLoS): e0130140. Bibcode:2015PLoSO..1030140B. doi:10.1371/journal.pone.0130140. ISSN 1932-6203. PMC 4498753. PMID 26161953.
  26. Antipov, E. A.; Pokryshevskaya, E. B. (2020). "Interpretable machine learning for demand modeling with high-dimensional data using Gradient Boosting Machines and Shapley values". Journal of Revenue and Pricing Management. 19 (5): 355–364. doi:10.1057/s41272-020-00236-4.

Further reading