Stanine

Last updated April 30, 2024

Stanine (STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two.

The earliest known use of stanines was by the U.S. Army Air Forces in 1942,^[2] which is consistent with web sources that attribute the scale to the Army Air Forces during World War II. Psychometric legend has it that a 1–9 scale was chosen because the score could be recorded compactly as a single digit, but Thorndike^[1] claims that by reducing scores to just nine values, stanines "reduce the tendency to try to interpret small score differences" (p. 131).

Calculation

Test scores are scaled to stanine scores using the following algorithm:

Rank results from lowest to highest
Give the lowest 4% a stanine of 1, the next 7% a stanine of 2, etc., according to the following table:

Calculating Stanines
Bracketed proportion	4%	7%	12%	17%	20%	17%	12%	7%	4%
Stanine	1	2	3	4	5	6	7	8	9
Standardized score	below −1.75	−1.75 to −1.25	−1.25 to −0.75	−0.75 to −0.25	−0.25 to +0.25	+0.25 to +0.75	+0.75 to +1.25	+1.25 to +1.75	above +1.75
Wechsler scale score	below 74	74 to 81	81 to 89	89 to 96	96 to 104	104 to 111	111 to 119	119 to 126	above 126
Cumulative proportion	4%	11%	23%	40%	60%	77%	89%	96%	100%
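The ranking procedure and the cumulative proportions in the table can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the function name `stanines_by_rank` is hypothetical, and the handling of ties (tied scores are split by their original order and may land in different stanines) is an assumption, since the procedure above does not say how ties should be treated.

```python
from bisect import bisect_left

# Cumulative upper bounds (in percent) for stanines 1..9, taken from the table.
CUMULATIVE_PERCENT = [4, 11, 23, 40, 60, 77, 89, 96, 100]


def stanines_by_rank(scores):
    """Return a list of stanines, one per score, based on rank position.

    Assumption: ties are broken by original order (sorted() is stable),
    so equal scores near a bracket boundary can receive different stanines.
    """
    n = len(scores)
    # Indices of the scores, ordered from lowest score to highest.
    order = sorted(range(n), key=lambda i: scores[i])
    result = [0] * n
    for position, index in enumerate(order, start=1):
        # Percentage of results at or below this rank position.
        percent = 100 * position / n
        # First bracket whose cumulative bound covers this percentage.
        result[index] = bisect_left(CUMULATIVE_PERCENT, percent) + 1
    return result


if __name__ == "__main__":
    example = stanines_by_rank(list(range(100)))
    # Count how many of the 100 results fall into each stanine.
    print([example.count(s) for s in range(1, 10)])
```

With exactly 100 distinct scores, the counts per stanine reproduce the bracketed proportions in the table (4, 7, 12, 17, 20, 17, 12, 7, 4). For small groups the brackets can only be approximated.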

Stanines are obtained by dividing a normal distribution into nine intervals. The seven inner intervals are each 0.5 standard deviations wide; the first and last intervals cover the remaining tails of the distribution. The median, which coincides with the mean in a normal distribution, lies at the centre of the fifth interval.
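The interval structure can be checked numerically. The sketch below, with the hypothetical helper names `stanine_from_z`, `normal_cdf`, and `band_proportions`, converts a standardized score to a stanine and recomputes the area of each interval under the normal curve. Assigning a score that falls exactly on a boundary to the upper interval is an assumption, since the table does not say which side a boundary belongs to.

```python
import math


def stanine_from_z(z):
    """Map a standardized score (z) to a stanine from 1 to 9.

    Interval boundaries sit at -1.75, -1.25, ..., +1.75, so
    floor(2z + 5.5) selects the half-SD-wide band; values in the tails
    are clamped to 1 or 9. Assumption: a score exactly on a boundary
    goes to the upper band.
    """
    return max(1, min(9, math.floor(2 * z + 5.5)))


def normal_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def band_proportions():
    """Area under the normal curve for each of the nine stanine intervals."""
    cuts = [-1.75 + 0.5 * k for k in range(8)]  # -1.75 ... +1.75
    cdf = [0.0] + [normal_cdf(c) for c in cuts] + [1.0]
    return [cdf[i + 1] - cdf[i] for i in range(9)]


if __name__ == "__main__":
    # Rounded to whole percentages for comparison with the table.
    print([round(100 * p) for p in band_proportions()])
    print(stanine_from_z(0.0), stanine_from_z(1.0), stanine_from_z(-3.0))
```

Rounded to whole percentages, the computed areas match the bracketed proportions in the table. A score on the Wechsler scale (mean 100, standard deviation 15) can be converted the same way by first computing z = (score − 100) / 15; for example, a score of 110 gives z ≈ 0.67 and a stanine of 6.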

Use today

Today stanines are mostly used in educational assessment.^[citation needed]

The University of Alberta in Edmonton, Alberta, Canada used the stanine system until 2003, when it switched to a 4-point scale.^[3]
In the United States, the Educational Records Bureau, which administers the tests known as the "ERBs", reports test scores as stanines and percentiles.
The New Zealand Council for Educational Research uses stanines.^[4]
GL Assessment uses stanines alongside Standardised Age Scores (SAS) to express the results of its CAT4 assessments, which are used in many UK and British international schools.^[5]
The Otis-Lennon School Ability Test uses a stanine system along with percentiles.
High schools in Korea use a stanine system to evaluate their students.
The Israel Defense Forces (IDF) use the stanine grading system, with grades ranging from 10 to 90 (10, 20, 30, and so on), to rank intelligence abilities relevant to the army's needs. The grade is determined by a 100-question test divided into four categories that cover different uses and implications of cognitive abilities.
The Polish Matura secondary-school exam results and university admissions use the stanine system.

Notes

[1] Thorndike, R. L. (1982). Applied Psychometrics. Boston, MA: Houghton Mifflin.
[2] Krueger Hussey, A. (2004). Air Force Flight Screening: Evolutionary Changes, 1917-2003. Office of History and Research, Headquarters Air Education and Training Command, Randolph AFB, Texas.
[3] "Grade Comparison Guide". Archived from the original on 2006-12-12. Retrieved 2007-01-04.
[4] "Understanding Stanines", nzcersupport.org.nz.
[5] "GL Assessment".

References

Ballew, Pat (c. 2002). "Origins of some arithmetic terms". Math words. pballew.net. p. 3. Archived from the original on 2016-06-05. Retrieved 26 December 2004.
Boydsten, Robert E. (27 February 2000). "Winning My Wings". boydstonfoundation.org. Archived from the original on 2008-05-14.

Craven, Wesley Frank; Cate, James Lea, eds. (1983) [1955]. Men and Planes (PDF). The Army Air Forces in World War II. Vol. 6 (new imprint ed.). Chicago, IL: University of Chicago Press (1955) / Office of Air Force History (1983). ISBN 0-912799-03-X. LCCN 48-3657. AFD-101105-019. Archived from the original (PDF) on 2016-11-23 – via afhso.af.mil.

This page is based on this Wikipedia article
Text is available under the CC BY-SA 4.0 license; additional terms may apply.
Images, videos and audio are available under their respective licenses.
