Will Rogers phenomenon

The Will Rogers phenomenon, occasionally called the Okie paradox, occurs when moving an observation from one group to another raises the average of both groups. It is named after a joke by the comedian Will Rogers about migration during the Great Depression in the 1930s: [1]

When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.

Rogers was simply saying that, in his view, a group of Okies whose mean intelligence was below the average of all Okies could still be smarter than the average Californian.

The apparent paradox arises because the average intelligence of both groups rises, which makes it seem as though intelligence has been "created." However, the average intelligence of the overall population is unchanged: the migrants leave the group with the higher mean and join the group with the lower mean, so the lower-mean group grows and carries more weight in the population mean (which is a weighted average of the two groups' means). That shift in weight exactly offsets the rise in both group averages.
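The invariance of the overall mean can be checked with a short derivation. The symbols here are notation introduced for illustration and do not appear in the source: groups A and B have sizes n_A and n_B and means \mu_A < \mu_B, and a single observation x with \mu_A < x < \mu_B moves from B to A (assuming n_B > 1, so that B is not left empty).

```latex
\begin{align*}
\mu_A' &= \frac{n_A \mu_A + x}{n_A + 1} > \mu_A
  && \text{since } x > \mu_A, \\
\mu_B' &= \frac{n_B \mu_B - x}{n_B - 1} > \mu_B
  && \text{since } x < \mu_B, \\
\frac{(n_A + 1)\,\mu_A' + (n_B - 1)\,\mu_B'}{n_A + n_B}
  &= \frac{n_A \mu_A + n_B \mu_B}{n_A + n_B}.
\end{align*}
```

Both group means rise, but the overall mean on the last line is the same total divided by the same count as before the move.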

Numerical examples

Consider the sets R = {1, 2, 3, 4} and S = {5, 6, 7, 8, 9}, whose arithmetic means are 2.5 and 7, respectively.

If 5 is moved from S to R, then the arithmetic mean of R increases to 3, and the arithmetic mean of S increases to 7.5, even though the numbers themselves, and therefore their overall average, have not changed.
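The arithmetic can be checked with a minimal Python sketch. The set contents used here are an assumption chosen to match the stated means of 2.5 and 7, and the helper `move` is a hypothetical name introduced only for this illustration.

```python
from statistics import mean


def move(value, source, target):
    """Return new (source, target) lists with one occurrence of value moved."""
    new_source = list(source)
    new_source.remove(value)  # raises ValueError if value is not in source
    new_target = list(target) + [value]
    return new_source, new_target


# Assumed set contents, consistent with the means 2.5 and 7 given in the text.
R = [1, 2, 3, 4]
S = [5, 6, 7, 8, 9]

S_after, R_after = move(5, S, R)

print(mean(R), mean(S))              # 2.5 7
print(mean(R_after), mean(S_after))  # 3 7.5
print(mean(R + S) == mean(R_after + S_after))  # True
```

The last line confirms that the pooled mean is the same before and after the move, since the combined collection of numbers is unchanged.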

Consider this more dramatic example, where the arithmetic means of the sets R = {1, 2} and S = {99, 10000, 20000} are 1.5 and approximately 10033, respectively:

If 99 is moved from S to R, then the arithmetic means increase to 34 and 15000. The number 99 is orders of magnitude above 1 and 2, and orders of magnitude below 10000 and 20000.

The element that is moved does not have to be the very lowest or highest of its set; it merely has to have a value that lies between the means of the two sets. The sets themselves can also have overlapping ranges. Consider this example, with R = {1, 3, 5, 7, 9, 11, 13} and S = {6, 8, 10, 12, 14, 16, 18}:

If 10, which is larger than R's mean of 7 and smaller than S's mean of 12, is moved from S to R, the arithmetic means still increase slightly, to 7.375 and 12.333.
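The condition that the moved value must lie between the two means can be tested directly. The sketch below is illustrative only: the set contents are an assumption consistent with the stated means of 7 and 12, and the function names `lies_between_means` and `both_means_rise` are hypothetical.

```python
from statistics import mean


def lies_between_means(value, source, target):
    """True if value is strictly above target's mean and below source's mean."""
    return mean(target) < value < mean(source)


def both_means_rise(value, source, target):
    """True if moving value from source to target raises both arithmetic means."""
    new_source = list(source)
    new_source.remove(value)  # source must keep at least one element
    new_target = list(target) + [value]
    return mean(new_source) > mean(source) and mean(new_target) > mean(target)


# Assumed overlapping sets with means 7 and 12, as in the text.
R = [1, 3, 5, 7, 9, 11, 13]
S = [6, 8, 10, 12, 14, 16, 18]

for candidate in (6, 10, 12):
    print(candidate,
          lies_between_means(candidate, S, R),
          both_means_rise(candidate, S, R))
```

For these sets the two checks agree on every candidate: moving 10 raises both means, while moving 6 (below R's mean) or 12 (equal to S's mean) does not.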

Stage migration

One real-world example of the phenomenon is seen in the medical concept of cancer stage migration, [2] which led clinician Alvan Feinstein to coin the term Will Rogers phenomenon in 1985, based on a remark by a friend who attributed the quote to Rogers. [3]

In medical stage migration, improved detection of illness moves people from the set of healthy people to the set of unhealthy people. Because these people were never actually healthy (they were merely misclassified as healthy by an imperfect earlier diagnosis), removing them from the set of healthy people increases the average lifespan of the healthy group. Likewise, the migrated people are healthier than the people already in the unhealthy set: their illness was so minor that only the newer, more sensitive test could detect it. Adding them to the unhealthy set raises the average lifespan of that group as well. Both lifespans are statistically lengthened even if early detection of a cancer does not lead to better treatment: because the cancer is detected earlier, more time is lived in the "unhealthy" set of people.

In this form, the paradox can be viewed as an instance of the equivocation fallacy. [citation needed] Equivocation occurs when one term is used with multiple meanings in order to mislead the listener into unwarranted comparisons, and lifespan statistics before and after a stage migration use different meanings of "unhealthy", because the cutoff for detection is different.
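A toy calculation shows how reclassification alone can lengthen both averages. All numbers below are hypothetical lifespans invented for illustration, not data from any study, and the variable names are assumptions of this sketch.

```python
from statistics import mean

# Hypothetical lifespans in years, invented purely for illustration.
healthy = [85, 82, 80, 78, 70, 68]
unhealthy = [60, 55, 50]

# Assumption: a newer, more sensitive test detects mild illness in these two
# people, who were previously classified as healthy.
newly_detected = [70, 68]

healthy_after = [age for age in healthy if age not in newly_detected]
unhealthy_after = unhealthy + newly_detected

print(round(mean(healthy), 2), "->", round(mean(healthy_after), 2))
print(round(mean(unhealthy), 2), "->", round(mean(unhealthy_after), 2))

# Nobody's lifespan changed, so the pooled total is identical.
print(sum(healthy + unhealthy) == sum(healthy_after + unhealthy_after))
```

Both group averages go up even though no individual lives longer; only the boundary between "healthy" and "unhealthy" has moved.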

References

  1. Feinstein AR, Sosin DM, Wells CK (June 1985). "The Will Rogers phenomenon. Stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer". The New England Journal of Medicine. 312 (25): 1604–8. doi:10.1056/NEJM198506203122504. PMID 4000199.
  2. Sormani, M. P.; Tintorè, M.; Rovaris, M.; Rovira, A.; Vidal, X.; Bruzzi, P.; Filippi, M.; Montalban, X. (2008). "Will Rogers phenomenon in multiple sclerosis". Annals of Neurology. 64 (4): 428–433. doi:10.1002/ana.21464. PMID 18688811. S2CID 25960476.
  3. Singer, Richard B. (1994). Medical Risks: 1991 Compend of Mortality and Morbidity. Greenwood Publishing Group. p. 76. ISBN 978-0-275-94553-4.