Stanine

Stanine (STAndard NINE) is a method of scaling test scores on a nine-point standard scale with a mean of five and a standard deviation of two.

Some web sources attribute stanines to the U.S. Army Air Forces during World War II; the earliest known use was by the U.S. Army Air Forces in 1943.[citation needed] Psychometric legend has it that a 1–9 scale was chosen because a score could be recorded compactly as a single digit, but Thorndike [1] argues that by reducing scores to just nine values, stanines "reduce the tendency to try to interpret small score differences" (p. 131).

Calculation

Test scores are scaled to stanine scores using the following algorithm:

  1. Rank results from lowest to highest
  2. Give the lowest 4% a stanine of 1, the next 7% a stanine of 2, etc., according to the following table:
Calculating Stanines

Stanine   Bracketed proportion   Standardized score   Wechsler scale score
1         4%                     below −1.75          below 74
2         7%                     −1.75 to −1.25       74 to 81
3         12%                    −1.25 to −0.75       81 to 89
4         17%                    −0.75 to −0.25       89 to 96
5         20%                    −0.25 to +0.25       96 to 104
6         17%                    +0.25 to +0.75       104 to 111
7         12%                    +0.75 to +1.25       111 to 119
8         7%                     +1.25 to +1.75       119 to 126
9         4%                     above +1.75          above 126
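
The ranking procedure above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function name stanines_by_rank, the use of a midpoint percentile rank so that tied scores share a stanine, and the choice to place a score that falls exactly on a boundary in the higher stanine are all assumptions that the source does not specify.

```python
from bisect import bisect_left, bisect_right

# Cumulative upper bounds (in percent) of stanines 1..8, obtained by summing
# the bracketed proportions 4, 7, 12, 17, 20, 17, 12, 7; stanine 9 is the rest.
CUMULATIVE_PERCENT = [4, 11, 23, 40, 60, 77, 89, 96]


def stanines_by_rank(scores):
    """Return a stanine (1..9) for each score, based on its rank in the group."""
    n = len(scores)
    ordered = sorted(scores)  # step 1: rank results from lowest to highest
    result = []
    for s in scores:
        below = bisect_left(ordered, s)         # scores strictly lower than s
        at_or_below = bisect_right(ordered, s)  # scores lower than or equal to s
        # Assumed convention: midpoint percentile rank, so ties get one value.
        pct = 100.0 * (below + at_or_below) / (2 * n)
        # Step 2: find which percentage bracket the rank falls into.
        result.append(bisect_right(CUMULATIVE_PERCENT, pct) + 1)
    return result
```

With 100 distinct scores, this assigns 4, 7, 12, 17, 20, 17, 12, 7 and 4 results to stanines 1 through 9 respectively. With small or heavily tied groups the bracket sizes can only be approximated.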

The underlying basis for obtaining stanines is that a normal distribution is divided into nine intervals, each of which has a width of 0.5 standard deviations excluding the first and last, which are just the remainder (the tails of the distribution). The mean lies at the centre of the fifth interval.
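
The half-standard-deviation intervals described above can be checked numerically. The sketch below uses only the Python standard library; the helper names (stanine_from_z, band_proportions, wechsler_edges) are hypothetical, and treating the lower edge of each interval as inclusive is an assumption, since the table lists each boundary in two adjacent intervals.

```python
import math
from statistics import NormalDist

# Interval edges in standard-deviation units: -1.75, -1.25, ..., +1.75.
EDGES = [-1.75 + 0.5 * i for i in range(8)]


def stanine_from_z(z):
    """Map a standardized score to a stanine using half-SD-wide intervals."""
    # 2*z + 5 rescales to mean 5, SD 2; adding 0.5 and flooring picks the
    # interval, and the clamp folds both tails into stanines 1 and 9.
    return max(1, min(9, math.floor(2 * z + 5.5)))


def band_proportions():
    """Share of a standard normal distribution that falls in each stanine."""
    cdf = [0.0] + [NormalDist().cdf(e) for e in EDGES] + [1.0]
    return [hi - lo for lo, hi in zip(cdf, cdf[1:])]


def wechsler_edges():
    """The same interval edges on a scale with mean 100 and SD 15."""
    return [100 + 15 * e for e in EDGES]
```

Rounded to whole percentages, band_proportions() gives the 4, 7, 12, 17, 20, 17, 12, 7, 4 split used for ranking, and wechsler_edges() rounds to the boundaries 74, 81, 89, 96, 104, 111, 119 and 126 shown in the table.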

Use today

Today stanines are mostly used in educational assessment.[citation needed]

Notes

  1. Thorndike, R. L. (1982). Applied Psychometrics. Boston, MA: Houghton Mifflin.
  2. "Grade Comparison Guide". Archived from the original on 2006-12-12. Retrieved 2007-01-04.
  3. "Understanding Stanines", nzcersupport.org.nz
  4. "GL Assessment"

References

Craven, Wesley Frank (Princeton University); Cate, James Lea (University of Chicago), eds. The Army Air Forces in World War II, Volume Six: Men and Planes. http://www.afhso.af.mil/shared/media/document/AFD-101105-019.pdf Archived 2016-11-23 at the Wayback Machine.