Anchor test

Last updated June 16, 2020

In psychometrics, an anchor test is a common set of test items administered in combination with two or more alternative forms of the test with the aim of establishing the equivalence of the test scores on the alternative forms. The purpose of the anchor test is to provide a baseline for an equating analysis between different forms of a test.^[1]

An anchor test is a type of psychological assessment tool that measures an individual's knowledge or cognitive ability by testing the same areas in different ways. Psychometricians are chiefly interested in developing assessment tools that reliably test particular skills and abilities. Anchor tests are not intended to test the subject's ability to take tests, interpret questions, or understand a concept that is unrelated to the test questions.^[2] Instead, they are meant to eliminate the incongruence between what a test is designed to assess and what it actually assesses. In an anchor test, subjects are tested on the same knowledge and skills in multiple ways. Compared with traditional tests in both education and psychology, anchor tests are intended to find out what an individual is able to do rather than what an individual is unable to do. One study found that a higher correlation between the anchor test and the total test leads to better equating. This implies that an anchor test made up of items of medium difficulty (a miditest) may lead to better equating than an anchor test that is a miniature version of the total test, with the same spread of item difficulty (a minitest).^[3]
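The anchor-to-total correlation mentioned above can be illustrated with a small calculation. The following is a minimal sketch, not the procedure used in the cited study: the score lists are invented for illustration, and the helper name `pearson` is hypothetical.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = mean([(a - mx) * (b - my) for a, b in zip(x, y)])
    return cov / (pstdev(x) * pstdev(y))

# Hypothetical scores for five examinees (illustrative data only).
total_scores = [10, 20, 30, 40, 50]
anchor_a = [2, 4, 6, 8, 10]  # rises in step with the total score
anchor_b = [5, 3, 6, 4, 7]   # follows the total score only loosely

r_a = pearson(total_scores, anchor_a)
r_b = pearson(total_scores, anchor_b)
print(f"anchor A: r = {r_a:.2f}, anchor B: r = {r_b:.2f}")
```

Under the finding described above, the anchor with the higher correlation (anchor A in this made-up data) would be expected to support better equating.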

Related Research Articles

Psychometrics is a field of study concerned with the theory and technique of psychological measurement. As defined by the US National Council on Measurement in Education (NCME), psychometrics refers to psychological measurement. Generally, it refers to the field in psychology and education that is devoted to testing, measurement, assessment, and related activities.

Psychological testing is the administration of psychological tests, which are designed to be "an objective and standardized measure of a sample of behavior". The term sample of behavior refers to an individual's performance on tasks that have usually been prescribed beforehand. The samples of behavior that make up a paper-and-pencil test, the most common type of test, are a series of items. Performance on these items produces a test score. A score on a well-constructed test is believed to reflect a psychological construct such as achievement in a school subject, cognitive ability, aptitude, emotional functioning, personality, etc. Differences in test scores are thought to reflect individual differences in the construct the test is supposed to measure. The science behind psychological testing is psychometrics.

A standardized test is a test that is administered and scored in a consistent, or "standard", manner. Standardized tests are designed in such a way that the questions, conditions for administering, scoring procedures, and interpretations are consistent and are administered and scored in a predetermined, standard manner.

Reliability in statistics and psychometrics is the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions. "It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are precise, reproducible, and consistent from one testing occasion to another. That is, if the testing process were repeated with a group of test takers, essentially the same results would be obtained. Various kinds of reliability coefficients, with values ranging between 0.00 and 1.00, are usually used to indicate the amount of error in the scores." For example, measurements of people's height and weight are often extremely reliable.
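As one concrete instance of a reliability coefficient, the sketch below computes Cronbach's alpha, a widely used internal-consistency estimate. The text above does not name a specific coefficient, so the choice of alpha is an assumption made for illustration, as are the response data and the function name `cronbach_alpha`.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Cronbach's alpha for a list of examinee rows (one score per item)."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var_sum = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)

# Hypothetical right/wrong (1/0) responses: 5 examinees x 4 items.
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Values closer to 1.00 indicate that less of the score variance is attributable to random error, in line with the description above.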

In psychometrics, item response theory (IRT) is a paradigm for the design, analysis, and scoring of tests, questionnaires, and similar instruments measuring abilities, attitudes, or other variables. It is a theory of testing based on the relationship between individuals' performances on a test item and the test takers' levels of performance on an overall measure of the ability that item was designed to measure. Several different statistical models are used to represent both item and test taker characteristics. Unlike simpler alternatives for creating scales and evaluating questionnaire responses, it does not assume that each item is equally difficult. This distinguishes IRT from, for instance, Likert scaling, in which "All items are assumed to be replications of each other or in other words items are considered to be parallel instruments" (p. 197). By contrast, item response theory treats the difficulty of each item as information to be incorporated in scaling items.
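To show how item difficulty enters an IRT model, here is a minimal sketch of the two-parameter logistic (2PL) item response function, one of several models in common use. The parameter values and the name `prob_correct` are illustrative assumptions, not taken from the text above.

```python
import math

def prob_correct(theta, a, b):
    """2PL model: probability of a correct response at ability theta.

    a is the item's discrimination, b is its difficulty.
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Two hypothetical items with equal discrimination but different difficulty.
easy_item = {"a": 1.0, "b": -1.0}
hard_item = {"a": 1.0, "b": 1.5}

theta = 0.0  # an examinee of average ability on the latent scale
p_easy = prob_correct(theta, **easy_item)
p_hard = prob_correct(theta, **hard_item)
print(f"easy item: {p_easy:.3f}, hard item: {p_hard:.3f}")
```

The same examinee has a different probability of success on each item, which is the sense in which IRT does not assume that items are equally difficult.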

Educational assessment or educational evaluation is the systematic process of documenting and using empirical data on the knowledge, skill, attitudes, and beliefs to refine programs and improve student learning. Assessment data can be obtained from directly examining student work to assess the achievement of learning outcomes or can be based on data from which one can make inferences about learning. Assessment is often used interchangeably with test, but is not limited to tests. Assessment can focus on the individual learner, the learning community, a course, an academic program, the institution, or the educational system as a whole. The word 'assessment' came into use in an educational context after the Second World War.

In psychology, a projective test is a personality test designed to let a person respond to ambiguous stimuli, presumably revealing hidden emotions and internal conflicts projected by the person into the test. This is sometimes contrasted with a so-called "objective test" or "self-report test", which adopts a "structured" approach: responses are analyzed according to a presumed universal standard and are limited to the content of the test. The responses to projective tests are content analyzed for meaning rather than being based on presuppositions about meaning, as is the case with objective tests. Projective tests have their origins in psychoanalysis, which argues that humans have conscious and unconscious attitudes and motivations that are beyond or hidden from conscious awareness.

In the field of psychology, the Dunning–Kruger effect is a cognitive bias in which people with low ability at a task overestimate their ability. It is related to the cognitive bias of illusory superiority and comes from the inability of people to recognize their lack of ability. Without the self-awareness of metacognition, people cannot objectively evaluate their competence or incompetence.

A criterion-referenced test is a style of test which uses test scores to generate a statement about the behavior that can be expected of a person with that score. Most tests and quizzes that are written by school teachers can be considered criterion-referenced tests. In this case, the objective is simply to see whether the student has learned the material. Criterion-referenced assessment can be contrasted with norm-referenced assessment and ipsative assessment.

Authentic assessment is the measurement of "intellectual accomplishments that are worthwhile, significant, and meaningful," as contrasted to multiple choice standardized tests. Authentic assessment can be devised by the teacher, or in collaboration with the student by engaging student voice. When applying authentic assessment to student learning and achievement, a teacher applies criteria related to “construction of knowledge, disciplined inquiry, and the value of achievement beyond the school.”

Career assessments are tools that are designed to help individuals understand how a variety of personal attributes impact their potential success and satisfaction with different career options and work environments. Career assessments have played a critical role in career development and the economy in the last century. Assessments of some or all of these attributes are often used by individuals or organizations, such as university career service centers, career counselors, outplacement companies, corporate human resources staff, executive coaches, vocational rehabilitation counselors, and guidance counselors, to help individuals make more informed career decisions.

Test equating traditionally refers to the statistical process of determining comparable scores on different forms of an exam. It can be accomplished using either classical test theory or item response theory.
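A simple classical-test-theory approach is linear equating, which matches the means and standard deviations of two forms. The sketch below assumes the two forms were taken by equivalent groups, which the text above does not specify; the scores and the function name `linear_equate` are hypothetical.

```python
from statistics import mean, pstdev

def linear_equate(x, form_x_scores, form_y_scores):
    """Map a Form X score onto the Form Y scale by matching mean and SD."""
    mu_x, sd_x = mean(form_x_scores), pstdev(form_x_scores)
    mu_y, sd_y = mean(form_y_scores), pstdev(form_y_scores)
    return mu_y + (sd_y / sd_x) * (x - mu_x)

# Hypothetical score distributions from two equivalent groups.
form_x = [40, 50, 60]  # mean 50, wider spread
form_y = [50, 55, 60]  # mean 55, narrower spread

y_equiv_50 = linear_equate(50, form_x, form_y)
y_equiv_60 = linear_equate(60, form_x, form_y)
print(f"X=50 -> Y={y_equiv_50:.1f}, X=60 -> Y={y_equiv_60:.1f}")
```

A score at the mean of Form X maps to the mean of Form Y, and scores away from the mean are rescaled by the ratio of the standard deviations.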

Situational judgement tests (SJTs), also called situational stress tests (SStTs) or inventories (SSIs), are a type of psychological test which presents the test-taker with realistic, hypothetical scenarios and asks the individual to identify the most appropriate response or to rank the responses in the order they feel is most effective. SJTs can be presented to test-takers through a variety of modalities, such as booklets, films, or audio recordings. SJTs represent a distinct psychometric approach from the common knowledge-based multiple choice item. They are often used in industrial-organizational psychology applications such as personnel selection. Situational judgement tests tend to determine behavioral tendencies, assessing how an individual will behave in a certain situation, and knowledge instruction, which evaluates the effectiveness of possible responses. Situational judgement tests could also reinforce the status quo within an organization.

An assessment day is usually used in the context of recruitment. On this day, a group of applicants who have applied for a particular role are invited to an assessment centre, where employers use a combination of selection techniques to measure each individual's suitability for the job role. These selection techniques usually include exercises such as presentations, group exercises, one-to-one interviews, role play, and psychometric tests. Most large organisations, such as banks, audit firms, and IT firms, use assessment days to recruit fresh talent for their graduate programmes. As assessment days have grown in popularity, several training institutes have been formed to prepare candidates for them; for example, Green Turn is a well-known institute that prepares candidates for the assessment days of the Big Four accountancy firms.

Differentiated instruction and assessment, also known as differentiated learning or, in education, simply, differentiation, is a framework or philosophy for effective teaching that involves providing all students within their diverse classroom community of learners a range of different avenues for understanding new information in terms of: acquiring content; processing, constructing, or making sense of ideas; and developing teaching materials and assessment measures so that all students within a classroom can learn effectively, regardless of differences in ability. Students vary in culture, socioeconomic status, language, gender, motivation, ability/disability, personal interests and more, and teachers must be aware of these varieties as they plan curricula. By considering varied learning needs, teachers can develop personalized instruction so that all children in the classroom can learn effectively. Differentiated classrooms have also been described as ones that respond to student variety in readiness levels, interests and learning profiles. It is a classroom that includes all students and can be successful. To do this, a teacher sets different expectations for task completion for students based upon their individual needs.

Test (assessment): procedure for measuring a subject's knowledge, skill, aptitude, physical fitness, or other characteristics

A test or examination is an assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics. A test may be administered verbally, on paper, on a computer, or in a predetermined area that requires a test taker to demonstrate or perform a set of skills. Tests vary in style, rigor and requirements. For example, in a closed book test, a test taker is usually required to rely upon memory to respond to specific items, whereas in an open book test, a test taker may use one or more supplementary tools such as a reference book or calculator when responding. A test may be administered formally or informally. An example of an informal test is a reading test administered by a parent to a child. A formal test might be a final examination administered by a teacher in a classroom or an I.Q. test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score. A test score may be interpreted with regard to a norm or criterion, or occasionally both. The norm may be established independently, or by statistical analysis of a large number of participants. An exam is meant to test a person's knowledge or willingness to give time to manipulate that subject.

Individual psychological assessment (IPA) is a tool used by organizations to make decisions on employment. IPA allows employers to evaluate and maintain potential candidates for hiring, promotion, and development by using a series of job analysis instruments such as position analysis questionnaires (PAQ), occupational analysis inventory (OAI), and functional job analysis (FJA). These instruments allow the assessor to develop valid measures of intelligence, personality tests, and a range of other factors as means to determine selection and promotion decisions. Personality and cognitive ability are good predictors of performance. Emotional Intelligence helps individuals navigate through challenging organizational and interpersonal encounters. Since individual differences have a long history in explaining human behavior and the different ways in which individuals respond to similar events and circumstances, these factors allow the organization to determine if an applicant has the competence to effectively and successfully do the work that the job requires. These assessments are administered throughout organizations in different forms, but they share one common goal in the selection process, and that is the right candidate for the job.

The Attribution Questionnaire (AQ) is a 27-item self-report assessment tool designed to measure public stigma towards people with mental illnesses. It assesses emotional reactions and discriminatory responses based on answers to a hypothetical vignette about a man with schizophrenia named Harry. There are several different versions of the vignette that test multiple forms of attribution. Responses assessing stigma towards Harry are in the form of 27 items rated on a Likert scale ranging from 1 to 9. There are 9 subscales within the AQ that break down the responses one could have towards a person with mental illness into different categories. The AQ was created in 2003 by Dr. Patrick Corrigan and colleagues and has since been revised into shorter tests, because the complexity and the hypothetical vignette of the original did not capture children's and adolescents' stigmas well. The later scales are the Attribution Questionnaire-9 (AQ-9), the revised Attribution Questionnaire (r-AQ), and the children's Attribution Questionnaire (AQ-8-C).

Washback effect refers to the impact of testing on curriculum design, teaching practices, and learning behaviors. The influences of testing can be found in the choices of learners and teachers: teachers may teach directly for specific test preparation, or learners might focus on specific aspects of language learning found in assessments. Washback effect in testing is typically seen as either negative, or positive. Washback may be considered harmful to more fluid approaches in language education where definitions of language ability may be limited; however, it may be considered beneficial when good teaching practices result. Washback can also be positive or negative in that it either maintains or hinders the accomplishment of educational goals. In positive washback, teaching the curriculum becomes the same as teaching to a specific test. Negative washback occurs in situations where there may be a mismatch between the stated goals of instruction and the focus of assessment; it may lead to the abandonment of instructional goals in favor of test preparation.

References

[1] Kolen, M.J., & Brennan, R.L. (1995). Test Equating. New York: Springer.
[2] Fox, C.B. "What is an anchor test". Wisegeek. 2003-2015 Conjecture Corporation. Retrieved 31 March 2015.
[3] "A Note on the Choice of an Anchor Test in Equating" (PDF). ets.org. Educational Testing Service. Retrieved 16 March 2015.

