GRIM test

The granularity-related inconsistency of means (GRIM) test is a simple statistical test used to identify inconsistencies in the analysis of data sets. The test relies on the fact that, given a dataset containing N integer values, the arithmetic mean (commonly called simply the average) is restricted to a discrete set of possible values: it must always be expressible as a fraction with an integer numerator and a denominator equal to N. If the reported mean does not fit this description, there must be an error somewhere; the preferred term for such errors is "inconsistencies", to emphasise that their origin is, on first discovery, typically unknown. GRIM inconsistencies can result from inadvertent data-entry or typographical errors or from scientific fraud. The GRIM test is most useful in fields such as psychology, where researchers typically use small groups and measurements are often integers. The GRIM test was proposed by Nick Brown and James Heathers in 2016, following increased awareness of the replication crisis in some fields of science. [1]
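In symbols, the constraint is simply that the mean of N integers must itself be an integer divided by N:

    \bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i = \frac{k}{N}, \qquad k \in \mathbb{Z},

so the attainable means are spaced exactly 1/N apart, and a reported mean that cannot be obtained by rounding such a fraction is inconsistent.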

Procedure

The GRIM test is straightforward to perform. For each reported mean in a paper, the sample size N is found, and all possible mean values (fractions with an integer numerator and denominator N) are calculated. The reported mean is then checked against this list, keeping in mind that values may be rounded inconsistently: depending on the context, a mean of 1.125 may be reported as either 1.12 or 1.13. If the reported mean does not appear in the list, it is flagged as mathematically impossible. [2] [3]
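As an illustration, the procedure can be sketched in a few lines of Python. This is a minimal sketch rather than the authors' published implementation; the function name grim_consistent, the string interface for the reported mean, and the decision to accept both round-half-up and round-half-to-even (to allow for the inconsistent rounding noted above) are assumptions made here for clarity.

    from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP

    def grim_consistent(reported_mean: str, n: int) -> bool:
        """Return True if reported_mean could be the rounded mean of n integer values.

        The mean is passed as a string (e.g. "3.48") so that the number of
        reported decimal places is preserved.  Both round-half-up and
        round-half-to-even are accepted, since papers round inconsistently.
        """
        decimals = len(reported_mean.partition(".")[2])
        quantum = Decimal(1).scaleb(-decimals)        # e.g. 0.01 for two decimal places
        target = Decimal(reported_mean)

        # Only integer totals whose exact mean lies within half a rounding step
        # of the target can round to it, so a small window of totals suffices.
        centre = int((target * n).to_integral_value(rounding=ROUND_HALF_UP))
        window = int(Decimal(n) * quantum / 2) + 1
        for total in range(centre - window, centre + window + 1):
            exact = Decimal(total) / Decimal(n)
            for mode in (ROUND_HALF_UP, ROUND_HALF_EVEN):
                if exact.quantize(quantum, rounding=mode) == target:
                    return True
        return False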

Example

Consider an experiment in which a fair die is rolled 20 times. Each roll produces a whole number between 1 and 6, and the hypothesized mean value is 3.5. The results of the rolls are averaged together, and the mean is reported as 3.48. This is close to the expected value and appears to support the hypothesis. However, a GRIM test reveals that the reported mean is mathematically impossible: the result of dividing any whole number by 20, written to two decimal places, must be of the form X.X0 or X.X5; it is impossible to divide an integer by 20 and produce a result with an "8" in the second decimal place. [4]
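Applied to this example, the illustrative grim_consistent sketch from the previous section (again, an assumption for illustration, not part of the cited analysis) flags the reported mean while accepting a genuinely attainable one:

    >>> grim_consistent("3.48", 20)   # the reported dice mean
    False
    >>> grim_consistent("3.45", 20)   # attainable, e.g. a total of 69 across 20 rolls
    True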

Interpretation and limitations

Even if data fail the GRIM test, this is not automatically a sign of manipulation. Inconsistent means can arise innocently from mistakes by the experimenter, typographical errors, calculation or programming mistakes, or improper reporting of the sample size. [2] However, a failure can also be a sign that some data have been improperly excluded or that the mean has been illegitimately adjusted to make the results appear more significant. The location of failures can be indicative of the underlying cause: an isolated impossible mean may be caused by a simple error, multiple impossible values in the same row of a table may indicate a poor response rate for that item, and multiple impossible values in the same column may indicate that the given sample size is incorrect. Multiple errors scattered throughout a table can be a sign of deeper problems, and other statistical tests can be used to analyze the suspect data. [5]

The GRIM test works best with data sets in which: the sample size is relatively small, the number of subcomponents in composite measures is also small, and the mean is reported to multiple decimal places. [2] In some cases, a valid mean may appear to fail the test if the input data is not discretized as expected – for example, if people are asked how many slices of pizza they ate at a buffet, some people may respond with a fraction such as "three and a half" instead of a whole number as expected. [5]
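The role of composite measures can also be seen with the same illustrative sketch: if each of 20 participants' scores were themselves the average of, say, three integer items, the grand mean would be a fraction with denominator 60 rather than 20, and a value that is impossible in the first case becomes possible in the second. (The three-item composite here is a hypothetical example, not drawn from the cited studies.)

    >>> grim_consistent("3.48", 20)       # 20 single integer scores
    False
    >>> grim_consistent("3.48", 20 * 3)   # 20 participants x 3 integer items each
    True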

Applications

Brown and Heathers applied the test to 260 articles published in Psychological Science, Journal of Experimental Psychology: General, and Journal of Personality and Social Psychology. Of these articles, 71 were amenable to GRIM test analysis; 36 of these contained at least one impossible value and 16 contained multiple impossible values. [3]

GRIM testing also played a significant role in uncovering errors in publications by Cornell University's Food and Brand Lab under Brian Wansink. GRIM testing revealed that a series of articles on the effect of price on consumption at an all-you-can-eat pizza buffet contained many impossible means – deeper analysis of the raw data revealed that in many cases, sample sizes were incorrectly stated and values incorrectly calculated. [1] [5]


References

  1. Bartlett, Tom (17 March 2017). "Spoiled Science". The Chronicle of Higher Education. Retrieved 19 October 2017.
  2. Heathers, James (23 May 2016). "The GRIM test—a method for evaluating published research". Medium. Retrieved 19 October 2017.
  3. Brown, Nicholas J. L.; Heathers, James A. J. (18 October 2016). "The GRIM Test: A Simple Technique Detects Numerous Anomalies in the Reporting of Results in Psychology" (PDF). Social Psychological and Personality Science. 8 (4): 363–369. doi:10.1177/1948550616673876. S2CID 35828029. Archived from the original (PDF) on 30 December 2021. Retrieved 18 September 2019.
  4. Omnes Res. "GRIM Plot (mean: 3.48, size: 20)". PrePubMed. Retrieved 19 October 2017.
  5. Anaya, Jordan; van der Zee, Tim; Brown, Nick (14 June 2017). "Statistical infarction: A postmortem of the Cornell Food and Brand Lab pizza publications". PeerJ Preprints. doi:10.7287/peerj.preprints.3025v1. Retrieved 19 October 2017.