The Will Rogers phenomenon, occasionally called the Okie paradox, [1] occurs when moving an observation from one group to another increases the average of both groups. It is named after a joke attributed to the comedian Will Rogers about Dust Bowl migration during the Great Depression: [2]
When the Okies left Oklahoma and moved to California, they raised the average intelligence level in both states.
The joke's premise is that Okies (Dust Bowl migrants from Oklahoma) were not as bright as the average Oklahoman, but smarter than the average Californian. This quotation was first attributed to native Oklahoman Rogers decades after his death; versions of the same joke with different places and people circulated at least 15 years before it can be linked to Okies. [3]
The apparent paradox comes from the rise in intelligence of both groups, which makes it seem as though intelligence has been "created." However, the overall population maintains the same average intelligence: moving a person of moderate intelligence out of a high-intelligence group and into a low-intelligence group increases the mean intelligence of both the low- and the high-intelligence groups, while the overall population mean (which is a weighted average of the two groups' intelligence) is unaffected.
Consider the sets R = {1, 2, 3, 4} and S = {5, 6, 7, 8, 9}, whose arithmetic means are 2.5 and 7, respectively.
If 5 is moved from S to R, then the arithmetic mean of R increases to 3, and the arithmetic mean of S increases to 7.5, even though the total set of numbers themselves, and therefore their overall average, have not changed.
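The arithmetic above can be checked with a short Python sketch. The lists `R` and `S` are illustrative sets chosen to match the quoted means of 2.5 and 7, and the helper `move` is a hypothetical name introduced only for this example.

```python
from statistics import mean

# Illustrative sets chosen to match the means quoted above (2.5 and 7).
R = [1, 2, 3, 4]
S = [5, 6, 7, 8, 9]


def move(value, source, target):
    """Return new (source, target) lists with one copy of value moved."""
    new_source = list(source)
    new_source.remove(value)  # raises ValueError if value is not in source
    return new_source, list(target) + [value]


S_after, R_after = move(5, S, R)

print(mean(R), mean(S))                      # group means before the move
print(mean(R_after), mean(S_after))          # group means after the move
print(mean(R + S), mean(R_after + S_after))  # overall mean before and after
```

Both group means rise, while the mean of the combined list is the same before and after, because the pooled collection of numbers has not changed.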
Consider this more dramatic example, where R = {1, 2} and S = {99, 10000, 20000}, with arithmetic means of 1.5 and 10033, respectively:
If 99 is moved from S to R, then the arithmetic means increase to 34 and 15000. The number 99 is orders of magnitude above 1 and 2, and orders of magnitude below 10000 and 20000.
The element that is moved does not have to be the very lowest or highest of its set; it merely has to have a value that lies between the means of the two sets. The sets can also have overlapping ranges. Consider this example, in which R = {1, 3, 5, 7, 9, 11, 13} and S = {6, 8, 10, 12, 14, 16, 18}:
If 10, which is larger than R's mean of 7 and smaller than S's mean of 12, is moved from S to R, the arithmetic means still increase slightly, to 7.375 and 12.333.
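The condition that the moved value must lie between the two means can be tested directly. The following is a minimal sketch, assuming two overlapping sets chosen to match the quoted means of 7 and 12; the function names `means_after_move` and `both_means_rise` are hypothetical and exist only for this illustration.

```python
from statistics import mean

# Illustrative overlapping sets chosen to match the quoted means (7 and 12).
R = [1, 3, 5, 7, 9, 11, 13]
S = [6, 8, 10, 12, 14, 16, 18]


def means_after_move(value, source, target):
    """Means of (source, target) after moving one copy of value across."""
    new_source = list(source)
    new_source.remove(value)
    new_target = list(target) + [value]
    return mean(new_source), mean(new_target)


def both_means_rise(value, source, target):
    """True if moving value from source to target raises both means."""
    new_source_mean, new_target_mean = means_after_move(value, source, target)
    return new_source_mean > mean(source) and new_target_mean > mean(target)


for v in S:
    between = mean(R) < v < mean(S)
    print(v, between, both_means_rise(v, S, R))
```

In this sketch only 8 and 10 lie strictly between the two means, and they are the only elements of `S` whose move raises both averages. Moving 6 lowers the mean of `R`, and moving 14, 16, or 18 lowers the mean of `S`.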
The phenomenon is a potential risk when comparing averages of subpopulations, because the outcome can change depending on how the population is divided rather than on any change in the individuals who make up the population. By extension, the average of the subpopulation averages can change even though nothing has actually changed within the population. [1] This effect can produce non-intuitive or objectionable results in Pareto and related utilitarian analysis. [1]
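A small Python sketch can make the dependence on the partition concrete. It uses illustrative numbers that are an assumption of this example, and the two helper functions are hypothetical names. The unweighted average of the group means shifts when one element is reassigned, while the size-weighted average, which equals the population mean, does not.

```python
from statistics import mean

# Two ways of dividing the same nine numbers into two groups (illustrative).
partition_before = [[1, 2, 3, 4], [5, 6, 7, 8, 9]]
partition_after = [[1, 2, 3, 4, 5], [6, 7, 8, 9]]


def average_of_group_means(groups):
    """Unweighted average of the per-group means."""
    return mean(mean(g) for g in groups)


def weighted_average_of_group_means(groups):
    """Per-group means weighted by group size (equals the population mean)."""
    total = sum(len(g) for g in groups)
    return sum(len(g) * mean(g) for g in groups) / total


for groups in (partition_before, partition_after):
    print(average_of_group_means(groups),
          weighted_average_of_group_means(groups))
```

The first column changes from 4.75 to 5.25 purely because of how the population was divided; the second column stays at 5 in both cases.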
One real-world example of the phenomenon is seen in the medical concept of cancer stage migration, [4] which led clinician Alvan Feinstein to coin the term Will Rogers phenomenon in 1985, based on a remark by a friend who attributed the quote to Rogers. [5]
In medical stage migration, improved detection of illness moves people from the set of healthy people to the set of unhealthy people. Because these people are not actually healthy (they were merely misclassified as healthy by an imperfect earlier diagnosis), removing them from the set of healthy people increases the average lifespan of the healthy group. Likewise, the migrated people are healthier than the people already in the unhealthy set: their illness was so minor that only the newer, more sensitive test could detect it. Adding them to the unhealthy group raises the average lifespan of that group as well. Both lifespans are statistically lengthened even if early detection of a cancer does not lead to better treatment: because the cancer is detected earlier, more time is lived in the "unhealthy" set of people. In this form, the paradox can be viewed as an instance of the equivocation fallacy. Equivocation occurs when one term is used with multiple meanings in order to mislead the listener into unwarranted comparisons, and lifespan statistics before and after a stage migration use different meanings of "unhealthy", because the cutoff for detection is different.
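Stage migration can be imitated with a toy model in Python. Everything in this sketch is an assumption made for illustration: the patient records are invented, `severity` is a made-up score, and the two detection thresholds stand in for an older and a newer, more sensitive test. No patient's survival changes; only the cutoff used for classification does.

```python
from statistics import mean

# Invented records: (disease severity score, survival in years).
patients = [(0, 30), (0, 28), (0, 26), (3, 8), (6, 4), (8, 3), (9, 2)]


def classify(records, detection_threshold):
    """Split survival times into (healthy, unhealthy) by what the test detects."""
    healthy = [years for severity, years in records
               if severity < detection_threshold]
    unhealthy = [years for severity, years in records
                 if severity >= detection_threshold]
    return healthy, unhealthy


# Hypothetical cutoffs: the older test only detects severity 5 and above,
# while the newer test detects severity 2 and above.
old_healthy, old_unhealthy = classify(patients, detection_threshold=5)
new_healthy, new_unhealthy = classify(patients, detection_threshold=2)

print(mean(old_healthy), mean(old_unhealthy))  # averages under the old test
print(mean(new_healthy), mean(new_unhealthy))  # averages under the new test
```

Under the more sensitive test the patient with severity 3 is reclassified, and the average survival of both groups rises although every individual survival time is unchanged.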