In probability theory, the Kelly criterion (or Kelly strategy or Kelly bet) is a formula for sizing a bet. The Kelly bet size is found by maximizing the expected value of the logarithm of wealth, which is equivalent to maximizing the expected geometric growth rate. Assuming that the expected returns are known, the Kelly criterion leads to higher wealth than any other strategy in the long run (i.e., the theoretical maximum return as the number of bets goes to infinity). J. L. Kelly Jr, a researcher at Bell Labs, described the criterion in 1956. [1]
The practical use of the formula has been demonstrated for gambling, [2] [3] and the same idea was used to explain diversification in investment management. [4] In the 2000s, Kelly-style analysis became a part of mainstream investment theory [5] and the claim has been made that well-known successful investors including Warren Buffett [6] and Bill Gross [7] use Kelly methods. [8] Also see Intertemporal portfolio choice. The Kelly criterion is also the standard replacement for statistical power in anytime-valid statistical tests and confidence intervals, based on e-values and e-processes.
In a system where the return on an investment or a bet is binary, so an interested party either wins or loses a fixed percentage of their bet, the expected growth rate coefficient yields a very specific solution for an optimal betting percentage.
Where losing the bet involves losing the entire wager, the Kelly bet is:
$$f^* = p - \frac{q}{b} = p - \frac{1 - p}{b}$$
where $f^*$ is the fraction of the current bankroll to wager, $p$ is the probability of a win, $q = 1 - p$ is the probability of a loss, and $b$ is the proportion of the bet gained with a win (the net odds, so $b = 1$ for an even-money bet).
As an example, if a gamble has a 60% chance of winning ($p = 0.6$, $q = 0.4$), and the gambler receives 1-to-1 odds on a winning bet ($b = 1$), then to maximize the long-run growth rate of the bankroll, the gambler should bet 20% of the bankroll at each opportunity ($f^* = 0.6 - 0.4/1 = 0.2$).
If the gambler has zero edge (i.e., if $b = q/p$), then the criterion recommends the gambler bet nothing.
If the edge is negative ($b < q/p$), the formula gives a negative result, indicating that the gambler should take the other side of the bet. For example, in American roulette, the bettor is offered an even money payoff ($b = 1$) on red, when there are 18 red numbers and 20 non-red numbers on the wheel ($p = 18/38$). The Kelly bet is $-1/19$, meaning the gambler should bet one-nineteenth of their bankroll that red will not come up. There is no explicit anti-red bet offered with comparable odds in roulette, so the best a Kelly gambler can do is bet nothing.
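The simple formula is easy to check numerically. The following is a minimal sketch, assuming the bet loses the whole stake on a loss; the function name `kelly_fraction` is illustrative, and exact fractions are used so the roulette result comes out as one-nineteenth rather than a rounded decimal.

```python
from fractions import Fraction

def kelly_fraction(p, b):
    """Kelly fraction f* = p - q/b for a bet that loses the whole stake.

    p: probability of winning; b: net odds received on a win.
    """
    q = 1 - p
    return p - q / b

# Even-money bet with a 60% chance of winning.
print(kelly_fraction(Fraction(3, 5), 1))    # 1/5
# Red in American roulette: 18 red numbers out of 38, even money.
print(kelly_fraction(Fraction(18, 38), 1))  # -1/19
```

A negative result signals a negative edge; when the opposite side of the bet is not on offer, the practical recommendation is to bet nothing.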
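A more general form of the Kelly formula allows for partial losses, which is relevant for investments: [9] : 7
$$f^* = \frac{p}{a} - \frac{q}{b}$$
where $f^*$ is the fraction of the assets to apply to the security, $p$ is the probability that the investment increases in value, $q = 1 - p$ is the probability that it decreases in value, $b$ is the fraction gained in a positive outcome (if the security price rises 10%, then $b = 0.1$), and $a$ is the fraction lost in a negative outcome (if the security price falls 10%, then $a = 0.1$).
Note that the Kelly criterion is valid only for known outcome probabilities, which is not the case with investments. In addition, risk-averse investors should not invest the full Kelly fraction.
The general form can be rewritten as follows:
$$f^* = \frac{p}{a}\left(1 - \frac{1}{WLP}\cdot\frac{1}{WLR}\right)$$
where $WLP = p/(1 - p)$ is the win-loss probability ratio (the ratio of winning to losing bets) and $WLR = b/a$ is the win-loss ratio of bet outcomes (the winning skew).
It is clear that at least one of the factors $WLP$ or $WLR$ needs to be larger than 1 for having an edge (so $f^* > 0$). It is even possible that the win-loss probability ratio is unfavorable ($WLP < 1$), but one has an edge as long as $WLP \cdot WLR > 1$.
The Kelly formula can easily result in a fraction higher than 1, such as with a small losing size $a \ll 1$ (see the above expression with factors of $WLR$ and $WLP$). This happens somewhat counterintuitively, because the Kelly fraction formula compensates for a small losing size with a larger bet. However, in most real situations, there is high uncertainty about all parameters entering the Kelly formula. In the case of a Kelly fraction higher than 1, it is theoretically advantageous to use leverage to purchase additional securities on margin.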
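As a rough illustration of the partial-loss form, the sketch below writes b for the fraction gained on a win and a for the fraction lost on a loss (symbol and function names are chosen here for illustration). It evaluates both the direct form and the form using the win-loss ratios, for made-up numbers in which the win probability is unfavorable but the payoff skew still gives an edge.

```python
def general_kelly(p, b, a):
    """f* = p/a - q/b, with b the fraction gained and a the fraction lost."""
    q = 1 - p
    return p / a - q / b

def general_kelly_ratio_form(p, b, a):
    """Same quantity written with the win-loss ratios."""
    wlp = p / (1 - p)   # win-loss probability ratio
    wlr = b / a         # win-loss ratio of outcome sizes
    return (p / a) * (1 - 1 / (wlp * wlr))

# Illustrative numbers only: 40% chance to gain 30%, 60% chance to lose 10%.
p, b, a = 0.4, 0.3, 0.1
print(general_kelly(p, b, a))             # about 2, i.e. a leveraged position
print(general_kelly_ratio_form(p, b, a))  # same value up to rounding
```

Here the win-loss probability ratio is below 1 while the product of the two ratios is about 2, so the edge is positive, and the small loss size pushes the fraction above 1.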
In a study, each participant was given $25 and asked to place even-money bets on a coin that would land heads 60% of the time. Participants had 30 minutes to play, so could place about 300 bets, and the prizes were capped at $250. But the behavior of the test subjects was far from optimal:
Remarkably, 28% of the participants went bust, and the average payout was just $91. Only 21% of the participants reached the maximum. 18 of the 61 participants bet everything on one toss, while two-thirds gambled on tails at some stage in the experiment. [10] [11]
Using the Kelly criterion and based on the odds in the experiment (ignoring the cap of $250 and the finite duration of the test), the right approach would be to bet 20% of one's bankroll on each toss of the coin, which works out to a 2.034% average gain each round. This is a geometric mean, not the arithmetic rate of 4% (r = 0.2 × (0.6 − 0.4) = 0.04). Compounding at the geometric rate, the theoretical wealth after 300 rounds works out to $10,505 (25 × 1.02034^300) if it were not capped.
In this particular game, because of the cap, a strategy of betting only 12% of the pot on each toss would have even better results (a 95% probability of reaching the cap and an average payout of $242.03).
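The figures quoted for the uncapped game can be reproduced with a few lines of arithmetic. This is a sketch under the stated assumptions (60% heads, even money, a 20% bet, 300 rounds, a starting stake of 25 dollars); the variable names are illustrative.

```python
p, b, f = 0.6, 1.0, 0.2        # win probability, net odds, fraction bet
start, rounds = 25.0, 300

# Geometric growth factor per round: (1 + f*b)^p * (1 - f)^(1 - p).
growth = (1 + f * b) ** p * (1 - f) ** (1 - p)
# Arithmetic expected return per round, for comparison.
arithmetic = f * (p * b - (1 - p))
# Wealth after compounding the geometric rate for every round.
wealth = start * growth ** rounds

print(f"{growth - 1:.3%}")   # 2.034%
print(f"{arithmetic:.1%}")   # 4.0%
print(round(wealth))         # roughly 10,500
```

The compounded figure differs from the quoted one by about a dollar, because the quoted value uses the growth rate rounded to 2.034%.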
Heuristic proofs of the Kelly criterion are straightforward. [12] The Kelly criterion maximizes the expected value of the logarithm of wealth (the expectation value of a function is given by the sum, over all possible outcomes, of the probability of each particular outcome multiplied by the value of the function in the event of that outcome). We start with 1 unit of wealth and bet a fraction $f$ of that wealth on an outcome that occurs with probability $p$ and offers odds of $b$. The probability of winning is $p$, and in that case the resulting wealth is equal to $1 + fb$. The probability of losing is $q = 1 - p$ and the odds of a negative outcome are $a$. In that case the resulting wealth is equal to $1 - fa$. Therefore, the expected geometric growth rate $r$ is:
$$r = (1 + fb)^p \cdot (1 - fa)^q$$
We want to find the maximum $r$ of this curve (as a function of $f$), which involves finding the derivative of the equation. This is more easily accomplished by taking the logarithm of each side first. The resulting equation is:
$$E = \log(r) = p \log(1 + fb) + q \log(1 - fa)$$
with $E$ denoting logarithmic wealth growth. To find the value of $f$ for which the growth rate is maximized, denoted as $f^*$, we differentiate the above expression and set this equal to zero. This gives:
$$\left.\frac{dE}{df}\right|_{f = f^*} = \frac{pb}{1 + f^* b} - \frac{qa}{1 - f^* a} = 0$$
Rearranging this equation to solve for the value of $f^*$ gives the Kelly criterion:
$$f^* = \frac{p}{a} - \frac{q}{b}$$
Notice that this expression reduces to the simple gambling formula when $a = 1$, that is, when a loss results in full loss of the wager.
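The result of the derivation can be sanity-checked numerically. The sketch below searches a grid for the fraction that maximizes the expected log growth and compares it with the closed form. It writes b for the fraction gained and a for the fraction lost; the function names and the grid resolution are arbitrary choices made for illustration.

```python
import math

def log_growth(f, p, b, a):
    """Expected log growth p*log(1 + f*b) + q*log(1 - f*a)."""
    q = 1 - p
    return p * math.log(1 + f * b) + q * math.log(1 - f * a)

def grid_argmax(p, b, a, steps=10_000):
    """Best f on a uniform grid over [0, 1/a), where 1 - f*a stays positive."""
    upper = 1 / a
    candidates = [upper * i / steps for i in range(steps)]
    return max(candidates, key=lambda f: log_growth(f, p, b, a))

p, b, a = 0.6, 1.0, 1.0
closed_form = p / a - (1 - p) / b
print(closed_form, grid_argmax(p, b, a))   # both close to 0.2
```

Because the expected log growth is concave in the fraction bet, the grid maximum lands within one grid step of the closed-form value.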
If the return rates on an investment or a bet are continuous in nature the optimal growth rate coefficient must take all possible events into account.
In mathematical finance, if security weights maximize the expected geometric growth rate (which is equivalent to maximizing log wealth), then a portfolio is growth optimal.
The Kelly criterion shows that for a given volatile security this is satisfied when
$$f^* = \frac{\mu - r}{\sigma^2}$$
where $f^*$ is the fraction of available capital invested that maximizes the expected geometric growth rate, $\mu$ is the expected growth rate coefficient, $\sigma^2$ is the variance of the growth rate coefficient and $r$ is the rate of return on the remaining capital. Note that a symmetric probability density function was assumed here.
Computations of growth-optimal portfolios can suffer from severe garbage in, garbage out problems. For example, the cases below take as given the expected return and covariance structure of assets, but these parameters are at best estimates or models that have significant uncertainty. If portfolio weights are largely a function of estimation errors, then the ex-post performance of a growth-optimal portfolio may differ greatly from the ex-ante prediction. Parameter uncertainty and estimation errors are a large topic in portfolio theory. One approach to counteract the unknown risk is to invest less than the Kelly criterion recommends.
Rough estimates are still useful. If we take an excess return of 4% and a volatility of 16%, then the yearly Sharpe ratio and Kelly ratio are calculated to be 25% and about 150% (more precisely, 0.04/0.16² ≈ 156%). The daily Sharpe ratio and Kelly ratio are 1.7% and 150%. The Sharpe ratio implies a daily win probability of $p = 50\% + 1.7\%/4$, where we assumed that the probability bandwidth is $4\sigma = 4\%$. Now we can apply the discrete Kelly formula for $f^*$ above with $p = 50.425\%$ and $a = b = 1\%$, and we get another rough estimate for the Kelly fraction, $f^* = 85\%$. Both of these estimates of the Kelly fraction appear quite reasonable, yet a prudent approach suggests a further multiplication of the Kelly ratio by 50% (i.e. half-Kelly).
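The two rough estimates can be recomputed as follows. This sketch takes the 4% excess return, 16% volatility, 1.7% daily Sharpe ratio and 1% daily move size from the text as given; the variable names are illustrative, and the half-Kelly line simply applies the 50% scaling suggested above.

```python
excess_return, volatility = 0.04, 0.16

sharpe = excess_return / volatility                # yearly Sharpe ratio
continuous_kelly = excess_return / volatility**2   # excess return over variance

# Discrete estimate: daily win probability implied by the daily Sharpe ratio,
# with symmetric daily gains and losses of 1%.
daily_sharpe, move = 0.017, 0.01
p = 0.5 + daily_sharpe / 4
discrete_kelly = p / move - (1 - p) / move

print(round(sharpe, 4))            # 0.25
print(round(continuous_kelly, 4))  # 1.5625
print(round(discrete_kelly, 4))    # 0.85
print(round(discrete_kelly / 2, 4))  # half-Kelly: 0.425
```

The unrounded continuous estimate is slightly above the 150% quoted in the text, which is consistent with the text describing these as rough figures.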
A detailed paper by Edward O. Thorp and a co-author estimates the Kelly fraction to be 117% for the American stock market S&P 500 index. [13] Significant downside tail risk for equity markets is another reason [14] to reduce the Kelly fraction from the naive estimate (for instance, to half-Kelly).
A rigorous and general proof can be found in Kelly's original paper [1] or in some of the other references listed below. Some corrections have been published. [15] We give the following non-rigorous argument for the case with $b = 1$ (a 50:50 "even money" bet) to show the general idea and provide some insights. [1] When $b = 1$, a Kelly bettor bets $2p - 1$ times their initial wealth $W$, as shown above. If they win, they have $2pW$ after one bet. If they lose, they have $2(1 - p)W$. Suppose they make $N$ bets like this, and win $K$ times out of this series of $N$ bets. The resulting wealth will be:
$$2^N p^K (1 - p)^{N - K} W$$
The ordering of the wins and losses does not affect the resulting wealth. Suppose another bettor bets a different amount, $(2p - 1 + \Delta)W$ for some value of $\Delta$ (where $\Delta$ may be positive or negative). They will have $(2p + \Delta)W$ after a win and $[2(1 - p) - \Delta]W$ after a loss. After the same series of wins and losses as the Kelly bettor, they will have:
$$(2p + \Delta)^K [2(1 - p) - \Delta]^{N - K} W$$
Take the derivative of this with respect to $\Delta$ and get:
$$K (2p + \Delta)^{K - 1} [2(1 - p) - \Delta]^{N - K} W - (N - K)(2p + \Delta)^K [2(1 - p) - \Delta]^{N - K - 1} W$$
The function is maximized when this derivative is equal to zero, which occurs at:
$$K [2(1 - p) - \Delta] = (N - K)(2p + \Delta)$$
which implies that
$$\Delta = 2\left(\frac{K}{N} - p\right)$$
but the proportion of winning bets will eventually converge to:
$$\lim_{N \to \infty} \frac{K}{N} = p$$
according to the weak law of large numbers. So in the long run, final wealth is maximized by setting $\Delta$ to zero, which means following the Kelly strategy. This illustrates that Kelly has both a deterministic and a stochastic component. If one knows $K$ and $N$ and wishes to pick a constant fraction of wealth to bet each time (otherwise one could cheat and, for example, bet zero after the $K$-th win knowing that the rest of the bets will lose), one will end up with the most money if one bets:
$$\left(2\frac{K}{N} - 1\right) W$$
each time. This is true whether $N$ is small or large. The "long run" part of Kelly is necessary because $K$ is not known in advance, just that as $N$ gets large, $K$ will approach $pN$. Someone who bets more than Kelly can do better if $K > pN$ for a stretch; someone who bets less than Kelly can do better if $K < pN$ for a stretch, but in the long run, Kelly always wins.
The heuristic proof for the general case proceeds as follows. In a single trial, if one invests the fraction $f$ of their capital and the strategy succeeds, the capital at the end of the trial increases by the factor $1 + fb$; likewise, if the strategy fails, the capital is decreased by the factor $1 - fa$. Thus at the end of $N$ trials (with $pN$ successes and $qN$ failures), the starting capital of one dollar yields
$$C_N = (1 + fb)^{pN} (1 - fa)^{qN}$$
Maximizing $\log(C_N)/N$, and consequently $C_N$, with respect to $f$ leads to the desired result
$$f^* = \frac{p}{a} - \frac{q}{b}$$
Edward O. Thorp provided a more detailed discussion of this formula for the general case. [9] There, it can be seen that the substitution of $p$ for the ratio of the number of "successes" to the number of trials implies that the number of trials must be very large, since $p$ is defined as the limit of this ratio as the number of trials goes to infinity. In brief, betting $f^*$ each time will likely maximize the wealth growth rate only in the case where the number of trials is very large, and $p$ and $b$ are the same for each trial. In practice, this is a matter of playing the same game over and over, where the probability of winning and the payoff odds are always the same. In the heuristic proof above, $pN$ successes and $qN$ failures are highly likely only for very large $N$.
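The deterministic part of the even-money argument can be checked directly: for a fixed number of wins in a fixed number of bets, the best constant fraction is twice the win ratio minus one, whatever the true win probability is. The sketch below uses a grid search over constant fractions; the function names and the sample counts (63 wins in 100 bets) are illustrative.

```python
import math

def log_final_wealth(f, wins, n):
    """Log of final wealth per unit start after `wins` wins in n even-money bets."""
    return wins * math.log(1 + f) + (n - wins) * math.log(1 - f)

def best_constant_fraction(wins, n, steps=10_000):
    """Constant fraction in [0, 1) that maximizes final wealth for this record."""
    candidates = [i / steps for i in range(steps)]
    return max(candidates, key=lambda f: log_final_wealth(f, wins, n))

n, wins = 100, 63
print(best_constant_fraction(wins, n))   # close to 2 * 63 / 100 - 1 = 0.26
```

Since the win count is unknown in advance, the Kelly bettor uses the long-run win ratio in its place, which is where the law of large numbers enters.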
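Kelly's criterion may be generalized [16] to gambling on many mutually exclusive outcomes, such as in horse races. Suppose there are several mutually exclusive outcomes. The probability that the $k$-th horse wins the race is $p_k$, the total amount of bets placed on the $k$-th horse is $B_k$, and
$$\beta_k = \frac{B_k}{\sum_i B_i} = \frac{1}{1 + Q_k}$$
where $Q_k$ are the pay-off odds. $D = 1 - tt$ is the dividend rate, where $tt$ is the track take or tax, and $D/\beta_k$ is the revenue rate after deduction of the track take when the $k$-th horse wins. The fraction of the bettor's funds to bet on the $k$-th horse is $f_k$. Kelly's criterion for gambling with multiple mutually exclusive outcomes gives an algorithm for finding the optimal set $S^o$ of outcomes on which it is reasonable to bet, and it gives an explicit formula for finding the optimal fractions $f_k^o$ of the bettor's wealth to be bet on the outcomes included in the optimal set $S^o$. The algorithm for the optimal set of outcomes consists of four steps: [16]
1. Calculate the expected revenue rate for all possible (or only for several of the most promising) outcomes: $er_k = \frac{D}{\beta_k} p_k = D(1 + Q_k) p_k$.
2. Reorder the outcomes so that the new sequence $er_k$ is non-increasing. Thus $er_1$ will be the best bet.
3. Set $S = \varnothing$ (the empty set), $k = 1$, $R(S) = 1$. Thus the best bet $er_k = er_1$ will be considered first.
4. Repeat: if $er_k = \frac{D}{\beta_k} p_k > R(S)$, then insert the $k$-th outcome into the set, $S = S \cup \{k\}$, recalculate $R(S)$ according to the formula $R(S) = \frac{1 - \sum_{i \in S} p_i}{1 - \sum_{i \in S} \beta_i / D}$, and then set $k = k + 1$. Otherwise, set $S^o = S$ and stop the repetition.
If the optimal set $S^o$ is empty then do not bet at all. If the set $S^o$ of optimal outcomes is not empty, then the optimal fraction $f_k^o$ to bet on the $k$-th outcome may be calculated from this formula:
$$f_k^o = \frac{er_k - R(S^o)}{D / \beta_k} = p_k - \frac{R(S^o)}{D / \beta_k}$$
One may prove [16] that
$$R(S^o) = 1 - \sum_{i \in S^o} f_i^o$$
where the right-hand side is the reserve rate, the fraction of wealth that is not bet. Therefore, the requirement $er_k = \frac{D}{\beta_k} p_k > R(S)$ may be interpreted [16] as follows: the $k$-th outcome is included in the set $S^o$ of optimal outcomes if and only if its expected revenue rate is greater than the reserve rate. The formula for the optimal fraction $f_k^o$ may be interpreted as the excess of the expected revenue rate of the $k$-th horse over the reserve rate divided by the revenue after deduction of the track take when the $k$-th horse wins, or as the excess of the probability of the $k$-th horse winning over the reserve rate divided by the revenue after deduction of the track take when the $k$-th horse wins. The binary growth exponent is
$$G^o = \sum_{i \in S^o} p_i \log_2(er_i) + \left(1 - \sum_{i \in S^o} p_i\right) \log_2(R(S^o))$$
and the doubling time is
$$T_d = \frac{1}{G^o}$$
This method of selection of optimal bets may be applied also when the probabilities $p_k$ are known only for several most promising outcomes, while the remaining outcomes have no chance to win. In this case it must be that
$$\sum_i p_i < 1, \qquad \sum_i \beta_i < 1$$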
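The selection procedure for mutually exclusive outcomes can be sketched in code. The version below is an illustrative reading of the cited algorithm, not a reference implementation: outcomes are ranked by expected revenue rate and added to the betting set while that rate exceeds the reserve rate, which is updated as one minus the chosen probabilities, divided by one minus the chosen pool shares over the dividend rate. The function name, argument names and the three-horse numbers are assumptions made for the example.

```python
def kelly_mutually_exclusive(probs, betas, dividend_rate):
    """Return (fractions by outcome index, reserve rate).

    probs[k]      : win probability of outcome k
    betas[k]      : share of the pool bet on outcome k, 1 / (1 + payoff odds)
    dividend_rate : 1 minus the track take
    """
    revenue = [dividend_rate / beta for beta in betas]     # payout per unit bet
    expected = [r * p for r, p in zip(revenue, probs)]     # expected revenue rate
    order = sorted(range(len(probs)), key=lambda k: expected[k], reverse=True)

    chosen, reserve = [], 1.0
    for k in order:
        if expected[k] <= reserve:
            break                                          # no further outcome qualifies
        chosen.append(k)
        reserve = (1 - sum(probs[i] for i in chosen)) / (
            1 - sum(betas[i] / dividend_rate for i in chosen))
    fractions = {k: probs[k] - reserve / revenue[k] for k in chosen}
    return fractions, reserve

# Illustrative three-horse race with a 15% track take.
fractions, reserve = kelly_mutually_exclusive(
    probs=[0.5, 0.3, 0.2], betas=[0.4, 0.35, 0.25], dividend_rate=0.85)
print(fractions, reserve)   # only the first horse is bet, at about 5.6% of wealth
```

In this example only the first horse has an expected revenue rate above 1, and the resulting fraction agrees with the simple single-bet formula applied to that horse's net odds.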
The second-order Taylor polynomial can be used as a good approximation of the main criterion. Primarily, it is useful for stock investment, where the fraction devoted to investment is based on simple characteristics that can be easily estimated from existing historical data – expected value and variance. This approximation leads to results that are robust and offer similar results as the original criterion. [17]
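For single assets (stock, index fund, etc.) and a risk-free rate, it is easy to obtain the optimal fraction to invest through geometric Brownian motion. The stochastic differential equation governing the evolution of a lognormally distributed asset $S$ at time $t$ ($S_t$) is
$$dS_t / S_t = \mu \, dt + \sigma \, dW_t$$
whose solution is
$$S_t = S_0 \exp\left(\left(\mu - \frac{\sigma^2}{2}\right) t + \sigma W_t\right)$$
where $W_t$ is a Wiener process, and $\mu$ (the percentage drift) and $\sigma$ (the percentage volatility) are constants. Taking expectations of the logarithm:
$$\mathbb{E} \log(S_t) = \log(S_0) + \left(\mu - \frac{\sigma^2}{2}\right) t$$
Then the expected log return $R_s$ is
$$R_s = \left(\mu - \frac{\sigma^2}{2}\right) t$$
Consider a portfolio made of an asset $S$ and a bond paying the risk-free rate $r$, with fraction $f$ invested in $S$ and $(1 - f)$ in the bond. The aforementioned equation for $dS_t$ must be modified by this fraction, i.e. $dS'_t / S'_t = f \, dS_t / S_t$, with associated solution
$$S'_t = S'_0 \exp\left(\left(f\mu - \frac{f^2 \sigma^2}{2}\right) t + f \sigma W_t\right)$$
The expected one-period return is given by
$$\mathbb{E}\left(\left[\frac{S'_1}{S'_0} - 1\right] + (1 - f) r\right) = \mathbb{E}\left(\left[\exp\left(\left(f\mu - \frac{f^2 \sigma^2}{2}\right) + f \sigma W_1\right) - 1\right]\right) + (1 - f) r$$
For small $\mu$, $\sigma$, and $W_t$, the solution can be expanded to first order to yield an approximate increase in wealth
$$G(f) = f\mu - \frac{f^2 \sigma^2}{2} + (1 - f) r$$
Solving $\max(G(f))$ we obtain
$$f^* = \frac{\mu - r}{\sigma^2}$$
$f^*$ is the fraction that maximizes the expected logarithmic return, and so is the Kelly fraction. Thorp [9] arrived at the same result but through a different derivation. Remember that $\mu$ is different from the asset log return $R_s$. Confusing the two is a common mistake made by websites and articles discussing the Kelly criterion.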
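The warning about confusing the drift with the log return can be made concrete. The sketch below assumes the drift-minus-risk-free-rate over variance formula derived above and illustrative annual figures (8% drift, 20% volatility, 3% risk-free rate); the function and variable names are hypothetical. It also checks that the closed-form fraction beats nearby fractions under the first-order growth approximation.

```python
def kelly_fraction_gbm(mu, sigma, r):
    """Kelly fraction (mu - r) / sigma^2, with mu the percentage drift."""
    return (mu - r) / sigma ** 2

def approx_growth(f, mu, sigma, r):
    """First-order growth f*mu - f^2*sigma^2/2 + (1 - f)*r."""
    return f * mu - 0.5 * f ** 2 * sigma ** 2 + (1 - f) * r

mu, sigma, r = 0.08, 0.20, 0.03           # illustrative annual figures
f_star = kelly_fraction_gbm(mu, sigma, r)

# Plugging in the expected log return instead of the drift understates the bet.
log_return = mu - 0.5 * sigma ** 2
f_wrong = kelly_fraction_gbm(log_return, sigma, r)

print(round(f_star, 4))    # 1.25
print(round(f_wrong, 4))   # 0.75
```

With these numbers the mistaken input removes exactly half a unit of leverage, since the log return is lower than the drift by half the variance.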
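For multiple assets, consider a market with $n$ correlated stocks $S_k$ with stochastic returns $r_k$, $k = 1, \dots, n$, and a riskless bond with return $r$. An investor puts a fraction $u_k$ of their capital in $S_k$ and the rest is invested in the bond. Without loss of generality, assume that the investor's starting capital is equal to 1. According to the Kelly criterion one should maximize
$$\mathbb{E}\left[\ln\left((1 + r) + \sum_{k=1}^{n} u_k (r_k - r)\right)\right]$$
Expanding this with a Taylor series around $\vec{u}_0 = (0, \ldots, 0)$ we obtain
$$\mathbb{E}\left[\ln(1 + r) + \sum_{k=1}^{n} \frac{u_k (r_k - r)}{1 + r} - \frac{1}{2} \sum_{k=1}^{n} \sum_{j=1}^{n} u_k u_j \frac{(r_k - r)(r_j - r)}{(1 + r)^2}\right]$$
Thus we reduce the optimization problem to quadratic programming and the unconstrained solution is
$$\vec{u}^{\,*} = (1 + r) \, \widehat{\Sigma}^{-1} (\widehat{\vec{r}} - r)$$
where $\widehat{\vec{r}}$ and $\widehat{\Sigma}$ are the vector of means and the matrix of second mixed noncentral moments of the excess returns. There is also a numerical algorithm for the fractional Kelly strategies and for the optimal solution under no leverage and no short selling constraints. [18]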
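For a small number of assets the unconstrained solution can be computed with a plain linear solve. The sketch below assumes the solution takes the form of one plus the riskless return, times the inverse second-moment matrix, times the vector of mean excess returns. It uses a hand-written Gaussian elimination to stay dependency-free; the function names and the two-asset inputs are made up for illustration.

```python
def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(aug[row][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for j in range(col, n + 1):
                aug[row][j] -= factor * aug[col][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def kelly_weights(mean_excess, second_moments, riskless):
    """Unconstrained weights (1 + r) * inverse(second moments) * mean excess returns."""
    return [(1 + riskless) * w for w in solve_linear(second_moments, mean_excess)]

# Illustrative inputs: mean excess returns and second moments of excess returns.
mean_excess = [0.04, 0.02]
second_moments = [[0.04, 0.006], [0.006, 0.01]]
weights = kelly_weights(mean_excess, second_moments, riskless=0.03)
print([round(w, 4) for w in weights])   # about [0.7923, 1.5846]
```

The weights here sum to more than 1, so the unconstrained solution borrows at the riskless rate; the constrained variants mentioned above require a quadratic programming solver instead.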
In a 1738 article, Daniel Bernoulli suggested that, when one has a choice of bets or investments, one should choose that with the highest geometric mean of outcomes. This is mathematically equivalent to the Kelly criterion, although the motivation is different (Bernoulli wanted to resolve the St. Petersburg paradox).
An English translation of the Bernoulli article was not published until 1954, [19] but the work was well known among mathematicians and economists.
Although the Kelly strategy's promise of doing better than any other strategy in the long run seems compelling, some economists have argued strenuously against it, mainly because an individual's specific investing constraints may override the desire for optimal growth rate. [8] The conventional alternative is expected utility theory which says bets should be sized to maximize the expected utility of the outcome (to an individual with logarithmic utility, the Kelly bet maximizes expected utility, so there is no conflict; moreover, Kelly's original paper clearly states the need for a utility function in the case of gambling games which are played finitely many times [1] ). Even Kelly supporters usually argue for fractional Kelly (betting a fixed fraction of the amount recommended by Kelly) for a variety of practical reasons, such as wishing to reduce volatility, or protecting against non-deterministic errors in their advantage (edge) calculations. [20] In colloquial terms, the Kelly criterion requires accurate probability values, which isn't always possible for real-world event outcomes. When a gambler overestimates their true probability of winning, the criterion value calculated will diverge from the optimal, increasing the risk of ruin.
The Kelly formula can be thought of as "time diversification", which is taking equal risk during different sequential time periods (as opposed to taking equal risk in different assets for asset diversification). There is clearly a difference between time diversification and asset diversification, which was raised [21] by Paul A. Samuelson. There is also a difference between ensemble-averaging (utility calculation) and time-averaging (Kelly multi-period betting over a single time path in real life). The debate was renewed by invoking ergodicity breaking. [22] Yet the difference between ergodicity breaking and Knightian uncertainty should be recognized. [23]