Computer-adaptive sequential testing

Computer-adaptive sequential testing (CAST) is another term for multistage testing. A CAST test is a type of computer-adaptive test or computerized classification test that uses pre-defined groups of items called testlets rather than operating at the level of individual items.[1] CAST is a term introduced by psychometricians working for the National Board of Medical Examiners.[2] In CAST, the testlets are referred to as panels.

Multistage testing is an algorithm-based approach to administering tests. It is very similar to computer-adaptive testing in that items are interactively selected for each examinee by the algorithm, but rather than selecting individual items, groups of items are selected, building the test in stages. These groups are called testlets or panels.
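
The routing logic can be illustrated with a short sketch. The following Python fragment is a minimal illustration with a hypothetical panel, testlet names, and cutoff scores (none of which come from the cited sources): it administers a routing testlet, scores it by number correct, and then selects an easier, medium, or harder second-stage testlet.

    def score(responses):
        # Number-correct score on one testlet (responses are 1 = right, 0 = wrong).
        return sum(responses)

    def route(routing_score, cutoffs=(2, 4)):
        # Choose the second-stage testlet from the routing score
        # (hypothetical cutoffs: 0-1 easy, 2-3 medium, 4-5 hard).
        if routing_score < cutoffs[0]:
            return "stage2_easy"
        if routing_score < cutoffs[1]:
            return "stage2_medium"
        return "stage2_hard"

    # One pre-assembled panel: a routing testlet plus three second-stage testlets.
    panel = {
        "routing":       ["R1", "R2", "R3", "R4", "R5"],
        "stage2_easy":   ["E1", "E2", "E3", "E4", "E5"],
        "stage2_medium": ["M1", "M2", "M3", "M4", "M5"],
        "stage2_hard":   ["H1", "H2", "H3", "H4", "H5"],
    }

    routing_responses = [1, 1, 0, 1, 0]        # simulated answers to the routing testlet
    next_testlet = route(score(routing_responses))
    print("administer", panel[next_testlet])   # a whole testlet is selected, not a single item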

A computerized classification test (CCT) is, as its name suggests, a test administered by computer for the purpose of classifying examinees. The most common CCT is a mastery test, where the test classifies examinees as "Pass" or "Fail," but the term also covers tests that classify examinees into more than two categories. Although the term can in principle refer to any computer-administered classification test, it is usually reserved for tests that are administered interactively or are of variable length, similar to computerized adaptive testing (CAT). Like CAT, variable-length CCTs can accomplish the goal of the test with a fraction of the number of items used in a conventional fixed-form test.
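
As a rough illustration of a variable-length stopping rule (a sketch only; operational CCTs typically rely on item response theory or a sequential probability ratio test, and the cutscore, confidence level, and item limits below are assumptions), the test can stop as soon as a confidence interval around the examinee's proportion-correct falls entirely above or below the cutscore.

    import math

    def classify(responses, cutscore=0.60, z=1.64, min_items=5, max_items=30):
        # Administer items one at a time; stop as soon as a normal-approximation
        # confidence interval around the proportion-correct clears the cutscore.
        correct = 0
        for n, is_correct in enumerate(responses[:max_items], start=1):
            correct += is_correct
            p = correct / n
            if n >= min_items:
                half_width = z * math.sqrt(p * (1 - p) / n)
                if p - half_width >= cutscore:
                    return "Pass", n
                if p + half_width <= cutscore:
                    return "Fail", n
        return ("Pass" if correct / n >= cutscore else "Fail"), n   # forced decision at maximum length

    # Simulated responses (1 = correct): the decision is usually reached
    # well before the maximum test length.
    print(classify([1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1]))   # e.g. ('Pass', 7)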

Psychometrics is a field of study concerned with the theory and technique of psychological measurement. As defined by the US National Council on Measurement in Education (NCME), psychometrics refers to psychological measurement. Generally, it refers to the field in psychology and education that is devoted to testing, measurement, assessment, and related activities.

Related Research Articles

Static program analysis is the analysis of computer software that is performed without actually executing programs, in contrast with dynamic analysis, which is performed on programs while they are executing. In most cases the analysis is performed on some version of the source code; in the other cases, on some form of the object code.

Short-term memory is the capacity for holding, but not manipulating, a small amount of information in mind in an active, readily available state for a short period of time. For example, short-term memory can be used to remember a phone number that has just been recited. The duration of short-term memory is believed to be on the order of seconds. The most commonly cited capacity is The Magical Number Seven, Plus or Minus Two, even though Miller himself stated that the figure was intended as "little more than a joke" and Cowan (2001) provided evidence that a more realistic figure is 4±1 units. In contrast, long-term memory can hold information indefinitely.

In computer science, consistency models are used in distributed systems such as distributed shared memory systems or distributed data stores. The system is said to support a given model if operations on memory follow specific rules. The data consistency model specifies a contract between programmer and system, wherein the system guarantees that if the programmer follows the rules, memory will be consistent and the results of reading, writing, or updating memory will be predictable. Consistency is distinct from coherence, which applies to systems with or without caches and concerns the consistency of data with respect to all processors: coherence deals with maintaining a global order in which writes to a single location or single variable are seen by all processors, whereas consistency deals with the ordering of operations to multiple locations with respect to all processors.
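
As a concrete example of what a consistency model rules out (an illustrative sketch, not taken from the source), the classic "store buffering" litmus test can be checked by brute force: under sequential consistency, no interleaving that preserves each processor's program order lets both reads miss both writes.

    # Two "processors", each with two operations in program order.
    # P0: write x = 1; read y into r0      P1: write y = 1; read x into r1
    P0 = [("w", "x", 1), ("r", "y", "r0")]
    P1 = [("w", "y", 1), ("r", "x", "r1")]

    def interleavings(a, b):
        # All merges of a and b that preserve each sequence's internal order.
        if not a:
            yield list(b)
            return
        if not b:
            yield list(a)
            return
        for rest in interleavings(a[1:], b):
            yield [a[0]] + rest
        for rest in interleavings(a, b[1:]):
            yield [b[0]] + rest

    outcomes = set()
    for history in interleavings(P0, P1):
        memory, registers = {"x": 0, "y": 0}, {}
        for op, location, value in history:
            if op == "w":
                memory[location] = value
            else:
                registers[value] = memory[location]
        outcomes.add((registers["r0"], registers["r1"]))

    print(sorted(outcomes))          # (0, 0) never appears ...
    assert (0, 0) not in outcomes    # ... sequential consistency forbids it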

The Graduate Record Examinations (GRE) is a standardized test that is an admissions requirement for many graduate schools in the United States and Canada. The GRE is owned and administered by Educational Testing Service (ETS). The test was established in 1936 by the Carnegie Foundation for the Advancement of Teaching.

A playlist is a list of video or audio files that can be played back on a media player either sequentially or in a shuffled order. In its most general form, an audio playlist is simply a list of songs, though it can also be played in a loop. The term has several specialized meanings in the realms of television broadcasting, radio broadcasting, and personal computers.

Multiple choice or objective response is a form of an objective assessment in which respondents are asked to select only correct answers from the choices offered as a list. The multiple choice format is most frequently used in educational testing, in market research, and in elections, when a person chooses between multiple candidates, parties, or policies.

Agile software development comprises various approaches to software development under which requirements and solutions evolve through the collaborative effort of self-organizing and cross-functional teams and their customer(s)/end user(s). It advocates adaptive planning, evolutionary development, early delivery, and continual improvement, and it encourages rapid and flexible response to change.

Serial-position effect

Serial-position effect is the tendency of a person to recall the first and last items in a series best, and the middle items worst. The term was coined by Hermann Ebbinghaus through studies he performed on himself, and refers to the finding that recall accuracy varies as a function of an item's position within a study list. When asked to recall a list of items in any order, people tend to begin recall with the end of the list, recalling those items best. Among earlier list items, the first few items are recalled more frequently than the middle items.

The actor model in computer science is a mathematical model of concurrent computation that treats "actors" as the universal primitives of concurrent computation. In response to a message that it receives, an actor can: make local decisions, create more actors, send more messages, and determine how to respond to the next message received. Actors may modify their own private state, but can only affect each other indirectly through messaging.
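
A minimal sketch of this model (illustrative only; production actor systems such as Erlang or Akka add supervision, addressing, and distribution) is an object whose only shared surface is a mailbox, with a single thread draining it.

    import queue
    import threading

    class CounterActor:
        # Private state, a mailbox, and one thread that handles one message
        # at a time; other code interacts only through send().
        def __init__(self):
            self._count = 0                   # private state, never shared directly
            self._mailbox = queue.Queue()
            threading.Thread(target=self._run, daemon=True).start()

        def send(self, message):
            self._mailbox.put(message)

        def _run(self):
            while True:
                kind, reply_to = self._mailbox.get()
                if kind == "increment":
                    self._count += 1          # local decision on local state
                elif kind == "report":
                    reply_to.put(self._count) # respond by sending a message back

    counter = CounterActor()
    for _ in range(3):
        counter.send(("increment", None))
    reply = queue.Queue()
    counter.send(("report", reply))
    print(reply.get())                        # 3: messages were processed in order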

Computerized adaptive testing (CAT) is a form of computer-based test that adapts to the examinee's ability level. For this reason, it has also been called tailored testing. In other words, it is a form of computer-administered test in which the next item or set of items selected to be administered depends on the correctness of the test taker's responses to the most recent items administered.
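
The core loop can be sketched as follows (a simplified illustration: the item bank, the difficulties, and the crude up/down ability update standing in for maximum-likelihood estimation are all assumptions, not part of any particular operational CAT).

    import math

    # Hypothetical item bank: item id -> Rasch (one-parameter) difficulty.
    bank = {"item1": -1.0, "item2": -0.3, "item3": 0.0, "item4": 0.6, "item5": 1.2}

    def rasch_prob(theta, b):
        # Probability of a correct response under the Rasch model.
        return 1.0 / (1.0 + math.exp(-(theta - b)))

    def next_item(theta, administered):
        # Under the Rasch model the most informative unused item is the one
        # whose difficulty is closest to the current ability estimate.
        unused = [i for i in bank if i not in administered]
        return min(unused, key=lambda i: abs(bank[i] - theta))

    theta, step = 0.0, 0.7                    # provisional ability estimate, step size
    administered = set()
    for correct in [True, True, False]:       # simulated responses
        item = next_item(theta, administered)
        administered.add(item)
        print(item, "P(correct) =", round(rasch_prob(theta, bank[item]), 2))
        theta += step if correct else -step   # crude up/down ability update
        step *= 0.8                           # shrink the step as the test proceeds
    print("final ability estimate:", round(theta, 2))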

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data.

Verification and validation are independent procedures that are used together to check that a product, service, or system meets requirements and specifications and that it fulfills its intended purpose. They are critical components of a quality management system such as ISO 9000. The words "verification" and "validation" are sometimes preceded by "independent", indicating that the verification and validation are to be performed by a disinterested third party. "Independent verification and validation" can be abbreviated as "IV&V".

Linear-on-the-fly testing, often referred to as LOFT, is a method of delivering educational or professional examinations. Competing methods include traditional linear fixed-form delivery and computerized adaptive testing. LOFT is a compromise between the two, in an effort to maintain the equivalence of the set of items that each examinee sees, which is found in fixed-form delivery, while attempting to reduce item exposure and enhance test security.
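
One way to picture the idea (a sketch with a hypothetical item pool and blueprint, not an actual LOFT engine) is to draw, for each examinee, a fresh random sample of items that still satisfies the same content blueprint, so forms differ in items but not in structure.

    import random

    # Hypothetical pool: each content area holds interchangeable, similarly
    # calibrated items.
    pool = {
        "algebra":  ["ALG-01", "ALG-02", "ALG-03", "ALG-04"],
        "geometry": ["GEO-01", "GEO-02", "GEO-03", "GEO-04"],
        "reading":  ["RDG-01", "RDG-02", "RDG-03", "RDG-04"],
    }

    def assemble_form(pool, items_per_area, seed):
        # Same blueprint for every examinee (items_per_area per content area),
        # but a different sample of items, which limits item exposure while
        # keeping forms comparable.
        rng = random.Random(seed)
        form = []
        for area in sorted(pool):
            form.extend(rng.sample(pool[area], items_per_area))
        return form

    print(assemble_form(pool, 2, seed=101))   # examinee A's form
    print(assemble_form(pool, 2, seed=202))   # examinee B's form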

Group testing: A procedure that breaks up the task of identifying certain objects into tests on groups of items.

In statistics and combinatorial mathematics, group testing is any procedure that breaks up the task of identifying certain objects into tests on groups of items, rather than on individual ones. First studied by Robert Dorfman in 1943, group testing is a relatively new field of applied mathematics that can be applied to a wide range of practical applications and is an active area of research today.
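
Dorfman's original two-stage scheme is easy to sketch (the sample count, group size, and "defective" set below are made-up numbers for illustration): pool the samples, test each pool once, and retest individually only the members of pools that react.

    def dorfman_screen(samples, group_size, is_positive):
        # Two-stage Dorfman procedure: one test per pooled group, then an
        # individual retest for every member of a group that tests positive.
        positives, tests_used = [], 0
        for start in range(0, len(samples), group_size):
            group = samples[start:start + group_size]
            tests_used += 1                            # the pooled test
            if any(is_positive(s) for s in group):     # simulated pooled result: reacts if any member is positive
                for s in group:                        # stage two: retest each member
                    tests_used += 1
                    if is_positive(s):
                        positives.append(s)
        return positives, tests_used

    samples = list(range(100))                 # 100 specimens
    defective = {13, 57}                       # the items we want to identify
    found, tests = dorfman_screen(samples, 10, lambda s: s in defective)
    print(found, tests)                        # [13, 57] using 30 tests instead of 100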

Howard Wainer: American statistician

Howard Wainer is an American statistician, past principal research scientist at the Educational Testing Service, adjunct professor of statistics at the Wharton School of the University of Pennsylvania, and author, known for his contributions in the fields of statistics, psychometrics, and statistical graphics.

Test (assessment): Procedure for measuring a subject's knowledge, skill, aptitude, physical fitness, or other characteristics

A test or examination is an assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics. A test may be administered verbally, on paper, on a computer, or in a predetermined area that requires a test taker to demonstrate or perform a set of skills. Tests vary in style, rigor, and requirements. For example, in a closed-book test, a test taker is usually required to rely upon memory to respond to specific items, whereas in an open-book test, a test taker may use one or more supplementary tools such as a reference book or calculator when responding. A test may be administered formally or informally. An example of an informal test is a reading test administered by a parent to a child. A formal test might be a final examination administered by a teacher in a classroom or an I.Q. test administered by a psychologist in a clinic. Formal testing often results in a grade or a test score. A test score may be interpreted with regard to a norm or criterion, or occasionally both. The norm may be established independently or by statistical analysis of a large number of participants. An exam is meant to test a person's knowledge of a subject or their willingness to devote time to working with it.

Klaus Kubinger: Austrian psychologist

Klaus D. Kubinger is a psychologist and statistician, and professor of psychological assessment at the University of Vienna, Faculty of Psychology. His main research focuses on fundamental research into assessment processes and on the application and advancement of item response theory models. He is also known as the author of textbooks on psychological assessment and on statistics.

Eric Thomas Bradlow is the K.P. Chao Professor, Professor of Marketing, Statistics, Education, and Economics, chairperson of the Wharton Marketing Department, and Vice-Dean of Analytics at the Wharton School of the University of Pennsylvania. He is known for his work on marketing research methods, missing data problems, and psychometrics. He is a fellow of the American Statistical Association and a fellow of the American Educational Research Association. Bradlow is also co-founder of GBH Insights, a leading data-focused marketing strategy and insights firm that caters to Fortune 500 companies.

References

  1. Luecht, R. M. (2005). Some useful cost-benefit criteria for evaluating computer-based test delivery models and systems. Journal of Applied Testing Technology, 7(2). Archived from the original (PDF) on 2006-09-27; retrieved 2006-12-01.
  2. Luecht, R. M. & Nungester, R. J. (1998). Some practical examples of computer-adaptive sequential testing. Journal of Educational Measurement, 35, 229–249.