Will Rogers phenomenon

The Will Rogers phenomenon, occasionally called the Okie paradox, [1] is the effect in which moving an observation from one group to another increases the average of both groups. It is named after a joke attributed to the comedian Will Rogers about Dust Bowl migration during the Great Depression: [2]

When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.

The joke's premise is that Okies (Dust Bowl migrants from Oklahoma) were not as bright as the average Oklahoman, but smarter than the average Californian. This quotation was first attributed to native Oklahoman Rogers decades after his death; versions of the same joke with different places and people circulated at least 15 years before it can be linked to Okies. [3]

The apparent paradox comes from the rise in intelligence of both groups, which makes it seem as though intelligence has been "created." However, the overall population maintains the same average intelligence: moving a person of moderate intelligence out of a high-intelligence group and into a low-intelligence group increases the mean intelligence of both the low- and the high-intelligence groups, while the overall population mean (which is a weighted average of the two groups' intelligence) is unaffected.
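The argument above can be written out as a short derivation. The symbols are introduced here for illustration and do not appear in the original text: the low group has n_L members with mean \mu_L, the high group has n_H > 1 members with mean \mu_H, and the moved member has value x with \mu_L < x < \mu_H.

```latex
\begin{align*}
\mu_L' &= \frac{n_L \mu_L + x}{n_L + 1}
        = \mu_L + \frac{x - \mu_L}{n_L + 1} > \mu_L, \\
\mu_H' &= \frac{n_H \mu_H - x}{n_H - 1}
        = \mu_H + \frac{\mu_H - x}{n_H - 1} > \mu_H, \\
\frac{(n_L + 1)\,\mu_L' + (n_H - 1)\,\mu_H'}{n_L + n_H}
       &= \frac{n_L \mu_L + n_H \mu_H}{n_L + n_H}.
\end{align*}
```

Both group means rise because x lies above one mean and below the other, while the last line shows that the size-weighted overall mean is the same before and after the move.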

Numerical examples

Consider two sets R and S whose arithmetic means are 2.5 and 7, respectively. If 5 is moved from S to R, then the arithmetic mean of R increases to 3 and the arithmetic mean of S increases to 7.5, even though the numbers themselves, and therefore their overall average, have not changed.
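The arithmetic can be checked with a few lines of Python. This is a minimal sketch: the concrete contents of R and S below are an assumption, chosen only because they give the stated means of 2.5 and 7.

```python
from statistics import mean

# Assumed sets: picked so that their means are 2.5 and 7, as stated in the text.
R = [1, 2, 3, 4]
S = [5, 6, 7, 8, 9]

# Means of R, S, and the combined population before the move.
before = (mean(R), mean(S), mean(R + S))

# Move the element 5 from S to R.
R_after = R + [5]
S_after = [x for x in S if x != 5]

# Means of R, S, and the combined population after the move.
after = (mean(R_after), mean(S_after), mean(R_after + S_after))

print("before:", before)
print("after: ", after)
```

Both group means rise, while the third value in each tuple, the mean of the combined population, stays the same.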

Consider this more dramatic example, in which the sets R = {1, 2} and S = {99, 10000, 20000} have arithmetic means of 1.5 and 10033, respectively.

If 99 is moved from S to R, then the arithmetic means increase to 34 and 15000. The number 99 is orders of magnitude above 1 and 2, and orders of magnitude below 10000 and 20000.

The element that is moved does not have to be the lowest or highest of its set; it only needs a value that lies strictly between the means of the two sets. The sets themselves can also have overlapping ranges. In one such example, 10 is larger than R's mean of 7 and smaller than S's mean of 12; moving it from S to R still increases both arithmetic means slightly, to 7.375 and about 12.333.
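The condition that the moved value must lie between the two means can be tested directly. The sketch below uses hypothetical helper names (`lies_between_means`, `move_raises_both_means`), and the example sets are assumptions chosen to reproduce the means quoted above; they are not taken from the original text.

```python
from statistics import mean

def lies_between_means(source, target, x):
    """True if x is above the target's mean and below the source's mean."""
    return mean(target) < x < mean(source)

def move_raises_both_means(source, target, x):
    """Move x from source to target and report whether both means increased."""
    if x not in source or len(source) < 2:
        raise ValueError("x must be in source, and source must keep one element")
    new_source = list(source)
    new_source.remove(x)            # take x out of the source group
    new_target = list(target) + [x]  # add x to the target group
    return mean(new_source) > mean(source) and mean(new_target) > mean(target)

# Assumed sets reproducing the "dramatic" example (means 1.5 and 10033).
R2, S2 = [1, 2], [99, 10000, 20000]
# Assumed overlapping sets reproducing means 7 and 12.
R3, S3 = [1, 3, 5, 7, 9, 11, 13], [6, 8, 10, 12, 14, 16, 18]

print(move_raises_both_means(S2, R2, 99))  # 99 lies between 1.5 and 10033
print(move_raises_both_means(S3, R3, 10))  # 10 lies between 7 and 12
print(move_raises_both_means(S3, R3, 14))  # 14 is above S3's mean of 12
```

Moving 14 raises the mean of R3 but lowers the mean of S3, so the effect does not occur; in each of the three cases the outcome agrees with `lies_between_means`.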

General application to subpopulation averaging

The phenomenon is a potential risk when comparing averages of subpopulations, because the outcome can change depending on how the population is divided rather than on changes of the individuals of the population. By extension, the average of the subpopulation averages would change even though there is no actual change within the population. [1] This effect can result in non-intuitive or objectionable results in Pareto and related utilitarian analysis. [1]
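A small sketch shows how the average of subpopulation averages depends on where the dividing line is drawn. The population and the two partitions are illustrative assumptions, and `mean_of_group_means` is a hypothetical helper name.

```python
from statistics import mean

# Illustrative population; the individual values never change.
population = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# Two ways of dividing the same population into two groups.
partition_a = [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
partition_b = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]

def mean_of_group_means(groups):
    """Unweighted average of the group averages."""
    return mean([mean(g) for g in groups])

print(mean(population))                  # overall mean, independent of grouping
print(mean_of_group_means(partition_a))  # depends on the partition
print(mean_of_group_means(partition_b))  # differs, though no individual changed
```

Only the unweighted average of group averages moves; weighting each group mean by its size would recover the unchanged overall mean.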

Cancer stage migration

One real-world example of the phenomenon is seen in the medical concept of cancer stage migration, [4] which led clinician Alvan Feinstein to coin the term Will Rogers phenomenon in 1985, based on a remark by a friend who attributed the quote to Rogers. [5]

In medical stage migration, improved detection of illness leads to the movement of people from the set of healthy people to the set of unhealthy people. Because these people are not actually healthy (they were merely misclassified as healthy by an imperfect earlier diagnosis), removing them from the set of healthy people increases the average lifespan of the healthy group. Likewise, the migrated people are healthier than the people already in the unhealthy set: their illness was so minor that only the newer, more sensitive test could detect it. Adding them to the unhealthy set raises the average lifespan of that group as well. Both lifespans are statistically lengthened, even if early detection of a cancer does not lead to better treatment: because the cancer is detected earlier, more time is lived in the "unhealthy" set of people. In this form, the paradox can be viewed as an instance of the equivocation fallacy. [citation needed] Equivocation occurs when one term is used with multiple meanings in order to mislead the listener into unwarranted comparisons, and lifespan statistics before and after a stage migration use different meanings of "unhealthy", because the cutoff for detection is different.
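Stage migration can be imitated with a toy cohort. Everything in this sketch is hypothetical: the survival times, the "disease burden" scores, the detection thresholds, and the helper name `classify` are invented for illustration and are not clinical data.

```python
from statistics import mean

# Hypothetical cohort: (survival in years, disease burden score).
patients = [(30, 0), (30, 0), (30, 0), (10, 2), (6, 5), (4, 7), (2, 9)]

def classify(cohort, detection_threshold):
    """Split survival times into (healthy, unhealthy) using a test that
    only detects disease at or above the given burden threshold."""
    healthy = [years for years, burden in cohort if burden < detection_threshold]
    unhealthy = [years for years, burden in cohort if burden >= detection_threshold]
    return healthy, unhealthy

# Older, less sensitive test versus newer, more sensitive test.
old_healthy, old_unhealthy = classify(patients, detection_threshold=5)
new_healthy, new_unhealthy = classify(patients, detection_threshold=1)

print(mean(old_healthy), mean(old_unhealthy))  # group means under the old test
print(mean(new_healthy), mean(new_unhealthy))  # both are higher under the new test
print(mean(years for years, _ in patients))    # cohort mean is unchanged
```

No patient's survival time differs between the two runs; only the patient with burden 2 changes group, which is enough to raise both group averages.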

References

  1. Tarsney, Christian; Geruso, Michael; Spears, Dean (2023). "Egyptians, Aliens, and Okies: Against the Sum of Averages". Utilitas. 35 (4): 320–326. doi:10.1017/S0953820823000225. "what statisticians call the 'Okie paradox', named for Will Rogers' joke"
  2. Feinstein, A. R.; Sosin, D. M.; Wells, C. K. (June 1985). "The Will Rogers phenomenon. Stage migration and new diagnostic techniques as a source of misleading statistics for survival in cancer". The New England Journal of Medicine. 312 (25): 1604–8. doi:10.1056/NEJM198506203122504. PMID 4000199.
  3. O'Toole, Garson (16 October 2021). "When the Okies Migrated To California, It Raised the I.Q. in Both States". Quote Investigator. Retrieved 9 September 2024.
  4. Sormani, M. P.; Tintorè, M.; Rovaris, M.; Rovira, A.; Vidal, X.; Bruzzi, P.; Filippi, M.; Montalban, X. (2008). "Will Rogers phenomenon in multiple sclerosis". Annals of Neurology. 64 (4): 428–433. doi:10.1002/ana.21464. PMID 18688811. S2CID 25960476.
  5. Singer, Richard B. (1994). Medical Risks: 1991 Compend of Mortality and Morbidity. Greenwood Publishing Group. p. 76. ISBN 978-0-275-94553-4.