Stanine (STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two.


History

Some web sources attribute stanines to the U.S. Army Air Forces during World War II. Psychometric legend holds that a 1–9 scale was chosen for the compactness of recording each score as a single digit, but Thorndike [1] argues that, by reducing scores to just nine values, stanines "reduce the tendency to try to interpret small score differences" (p. 131). The earliest known use of stanines was by the U.S. Army Air Forces in 1943.[citation needed]


Test scores are scaled to stanine scores using the following algorithm:

  1. Rank results from lowest to highest
  2. Give the lowest 4% a stanine of 1, the next 7% a stanine of 2, etc., according to the following table:
Calculating Stanines

| Stanine | Percent of scores | Standard score (z) | Scale score (mean 100, SD 15) |
|---|---|---|---|
| 1 | 4% | below −1.75 | below 74 |
| 2 | 7% | −1.75 to −1.25 | 74 to 81 |
| 3 | 12% | −1.25 to −0.75 | 81 to 89 |
| 4 | 17% | −0.75 to −0.25 | 89 to 96 |
| 5 | 20% | −0.25 to +0.25 | 96 to 104 |
| 6 | 17% | +0.25 to +0.75 | 104 to 111 |
| 7 | 12% | +0.75 to +1.25 | 111 to 119 |
| 8 | 7% | +1.25 to +1.75 | 119 to 126 |
| 9 | 4% | above +1.75 | above 126 |
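The ranking procedure above can be sketched in Python (a minimal illustration; the function name and tie handling are my own choices, not part of any standard):

```python
def assign_stanines(scores):
    """Assign a stanine (1-9) to each score by rank order.

    Cumulative percentage cutoffs follow the standard
    4 / 7 / 12 / 17 / 20 / 17 / 12 / 7 / 4 split.
    """
    cutoffs = [4, 11, 23, 40, 60, 77, 89, 96]  # cumulative % below each boundary
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    stanines = [0] * n
    for rank, i in enumerate(order):
        pct = 100.0 * rank / n          # percentile rank of this position
        stanines[i] = 1 + sum(pct >= c for c in cutoffs)
    return stanines
```

With 100 distinct scores this yields exactly 4 ones, 7 twos, up to 20 fives, and 4 nines; a real implementation must also decide how to break ties between equal scores.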

The underlying basis for obtaining stanines is that a normal distribution is divided into nine intervals, each of which has a width of 0.5 standard deviations excluding the first and last, which are just the remainder (the tails of the distribution). The mean lies at the centre of the fifth interval.
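Equivalently, because each band is 0.5 standard deviations wide and centred so that z = 0 falls in stanine 5, a standard score converts directly to a stanine. A minimal sketch (my own formulation; scores exactly on a band boundary can round either way):

```python
def z_to_stanine(z):
    # 2*z counts half-SD steps from the mean; +5 centres the scale;
    # min/max clamp the open-ended tails to stanines 1 and 9
    return min(9, max(1, round(2 * z) + 5))
```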

Use today

Today, stanines are mostly used in educational assessment.[citation needed]

References

  1. Thorndike, R. L. (1982). Applied Psychometrics. Boston, MA: Houghton Mifflin.
  2. "Archived copy". Archived from the original on 2006-12-12. Retrieved 2007-01-04.
  3. "Understanding Stanines".
  4. "GL Assessment".

Related Research Articles

Interquartile range: measure of statistical dispersion

In descriptive statistics, the interquartile range (IQR), also called the midspread, middle 50%, or H‑spread, is a measure of statistical dispersion, being equal to the difference between the 75th and 25th percentiles, or between the upper and lower quartiles, IQR = Q3 − Q1. In other words, the IQR is the first quartile subtracted from the third quartile; these quartiles can be clearly seen on a box plot of the data. It is a trimmed estimator, defined as the 25% trimmed range, and is a commonly used robust measure of scale.

Quantile: cut point dividing a set of observations into equal-sized groups

In statistics and probability, quantiles are cut points dividing the range of a probability distribution into continuous intervals with equal probabilities, or dividing the observations in a sample in the same way. There is one fewer quantile than the number of groups created. Thus quartiles are the three cut points that will divide a dataset into four equal-sized groups. Common quantiles have special names: for instance quartile, decile. The groups created are termed halves, thirds, quarters, etc., though sometimes the terms for the quantile are used for the groups created, rather than for the cut points.

Standard score: how many standard deviations an observed datum lies from the mean

In statistics, the standard score is the number of standard deviations by which the value of a raw score is above or below the mean value of what is being observed or measured. Raw scores above the mean have positive standard scores, while those below the mean have negative standard scores.
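As a formula, the standard score is z = (x − μ)/σ; a minimal sketch (the function name is illustrative):

```python
def standard_score(x, mean, sd):
    # number of standard deviations x lies above (+) or below (-) the mean
    return (x - mean) / sd
```

For example, on a scale with mean 100 and SD 15, a raw score of 115 has a standard score of 1.0.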

In statistics, a frequency distribution is a list, table or graph that displays the frequency of various outcomes in a sample. Each entry in the table contains the frequency or count of the occurrences of values within a particular group or interval.

In health-related fields, a reference range or reference interval is the range of values that is deemed normal for a physiologic measurement in healthy persons. It is a basis for comparison for a physician or other health professional to interpret a set of test results for a particular patient. Some important reference ranges in medicine are reference ranges for blood tests and reference ranges for urine tests.

The Stanford–Binet Intelligence Scales is an individually administered intelligence test that was revised from the original Binet–Simon Scale by Lewis Terman, a psychologist at Stanford University. The Stanford–Binet Intelligence Scale is now in its fifth edition (SB5) and was released in 2003. It is a cognitive ability and intelligence test that is used to diagnose developmental or intellectual deficiencies in young children. The test measures five weighted factors and consists of both verbal and nonverbal subtests. The five factors being tested are knowledge, quantitative reasoning, visual-spatial processing, working memory, and fluid reasoning.

Louis Leon Thurstone was a U.S. pioneer in the fields of psychometrics and psychophysics. He conceived the approach to measurement known as the law of comparative judgment, and is well known for his contributions to factor analysis. A Review of General Psychology survey, published in 2002, ranked Thurstone as the 88th most cited psychologist of the 20th century, tied with John Garcia, James J. Gibson, David Rumelhart, Margaret Floy Washburn, and Robert S. Woodworth.

In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which a future observation will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis.

In probability theory and statistics, the coefficient of variation (CV), also known as relative standard deviation (RSD), is a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation to the mean. The CV or RSD is widely used in analytical chemistry to express the precision and repeatability of an assay. It is also commonly used in fields such as engineering or physics when doing quality assurance studies and ANOVA gauge R&R. In addition, CV is utilized by economists and investors in economic models.

In education, relative grading, marking on a curve (British English) or grading on a curve is a method of assigning grades to the students in a class in such a way as to obtain or approach a pre-specified distribution of these grades having specific mean and deviation properties, such as a normal distribution. The term "curve" refers to the bell curve, the graphical representation of the probability density of the normal distribution, but this method can be used to achieve any desired distribution of the grades – for example, a uniform distribution.

In statistics and applications of statistics, normalization can have a range of meanings. In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment.

A norm-referenced test (NRT) is a type of test, assessment, or evaluation which yields an estimate of the position of the tested individual in a predefined population, with respect to the trait being measured. The estimate is derived from the analysis of test scores and possibly other relevant data from a sample drawn from the population. That is, this type of test identifies whether the test taker performed better or worse than other test takers, not whether the test taker knows either more or less material than is necessary for a given purpose.

In educational statistics, a normal curve equivalent (NCE), developed for the United States Department of Education by the RMC Research Corporation, is a way of standardizing scores received on a test into a 0-100 scale similar to a percentile rank, but preserving the valuable equal-interval properties of a z-score. It is defined as 50 + 21.063z, where z is the standard score, so that NCEs of 1, 50, and 99 align with the 1st, 50th, and 99th percentile ranks.

Germany uses a 5- or 6-point grading scale (GPA) to evaluate academic performance for the youngest to the oldest students. Grades vary from 1 (best) to 5 or 6 (worst), depending on the scale. In the final classes of German Gymnasium schools that prepare for university studies, a point system is used, with 15 points being the best grade and 0 points the worst. The percentage thresholds producing each grade can vary from teacher to teacher.

Test equating traditionally refers to the statistical process of determining comparable scores on different forms of an exam. It can be accomplished using either classical test theory or item response theory.

68–95–99.7 rule: shorthand used in statistics

In statistics, the 68–95–99.7 rule, also known as the empirical rule, is a shorthand used to remember the percentage of values that lie within a band around the mean in a normal distribution with a width of two, four and six standard deviations, respectively; more precisely, 68.27%, 95.45% and 99.73% of the values lie within one, two and three standard deviations of the mean, respectively.
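These percentages follow from the normal CDF: the fraction of values within k standard deviations of the mean is erf(k/√2). A quick check using only the Python standard library (the function name is mine):

```python
import math

def within_k_sd(k):
    # fraction of a normal distribution lying within k standard deviations
    return math.erf(k / math.sqrt(2))

# within_k_sd(1), within_k_sd(2), within_k_sd(3)
# give approximately 0.6827, 0.9545, and 0.9973
```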

The results for some scales of some psychometric instruments are returned as sten scores, sten being an abbreviation for 'Standard Ten' and thus closely related to stanine scores.

1.96: approximate value of the 97.5 percentile point of the standard normal distribution

In probability and statistics, 1.96 is the approximate value of the 97.5 percentile point of the standard normal distribution. 95% of the area under a normal curve lies within roughly 1.96 standard deviations of the mean, and due to the central limit theorem, this number is therefore used in the construction of approximate 95% confidence intervals. Its ubiquity is due to the arbitrary but common convention of using confidence intervals with 95% coverage rather than other coverages. This convention seems particularly common in medical statistics, but is also common in other areas of application, such as earth sciences, social sciences and business research.

In statistics, a trimmed estimator is an estimator derived from another estimator by excluding some of the extreme values, a process called truncation. This is generally done to obtain a more robust statistic, and the extreme values are considered outliers. Trimmed estimators also often have higher efficiency for mixture distributions and heavy-tailed distributions than the corresponding untrimmed estimator, at the cost of lower efficiency for other distributions, such as the normal distribution.


