High-stakes testing

Last updated December 24, 2024

A high-stakes test is a test with important consequences for the test taker.^[1] Passing has important benefits, such as a high school diploma, a scholarship, or a license to practice a profession. Failing has important disadvantages, such as being forced to take remedial classes until the test can be passed, not being allowed to drive a car, or difficulty finding employment.

Definitions

In common usage, a high-stakes test is any test that has major consequences or is the basis of a major decision.^[1]^[3]^[4]

Under a more precise definition, a high-stakes test is any test that:

is a single, defined assessment,
has a clear line drawn between those who pass and those who fail, and
has direct consequences for passing or failing (something "at stake").^[5]

For example, exit examinations for high school graduation are often high-stakes tests: there is a single, defined test (the student must pass this test; no other test can be substituted); some scores are high enough to pass and others are not; and failing has the direct consequence of preventing graduation. Similarly, driving tests are often high-stakes, as they also meet the same three criteria.

High-stakes testing is not synonymous with high-pressure testing. An American high school student might feel pressure to perform well on the SAT-I college aptitude exam. However, SAT scores do not directly determine admission to any college or university, and there is no clear line drawn between those who pass and those who fail, so it is not formally considered a high-stakes test.^[6]^[7] On the other hand, because the SAT-I scores are given significant weight in the admissions process at some schools, many people believe that it has consequences for doing well or poorly, and it could therefore be considered a high-stakes test under the simpler, common definition.^[8]^[9]
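The stricter three-part definition can be read as a simple checklist. The following Python sketch is illustrative only: the `Assessment` class, its field names, and the function `is_high_stakes_strict` are hypothetical names introduced here, and the value given for whether the SAT-I counts as a "single, defined assessment" is an assumption, since the text above addresses only the other two criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assessment:
    """Hypothetical record of the three criteria in the stricter definition."""
    name: str
    single_defined_assessment: bool  # no other test can be substituted
    clear_pass_fail_line: bool       # some scores pass and others do not
    direct_consequences: bool        # something is directly "at stake"


def is_high_stakes_strict(a: Assessment) -> bool:
    """A test is high-stakes under the strict definition only if all three hold."""
    return (
        a.single_defined_assessment
        and a.clear_pass_fail_line
        and a.direct_consequences
    )


exit_exam = Assessment("high school exit exam", True, True, True)
# First flag is an assumption for illustration; the other two follow the text.
sat = Assessment("SAT-I", True, False, False)

for a in (exit_exam, sat):
    print(a.name, is_high_stakes_strict(a))
```

Under the simpler, common definition only the consequences matter, which is why the same SAT-I record could still be called high-stakes by people who believe its consequences are significant.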

A high-stakes test can be contrasted with a medium-stakes test or a low-stakes test.^[7] A medium-stakes test might provide access to a desirable but less necessary benefit, such as an award, or it might be only one component of a decision-making process, such as an admissions program that looks at the test results plus other factors. A low-stakes test has no significant consequences for the test taker.

The stakes

High stakes are not a characteristic of the test itself, but rather of the consequences placed on the outcome. For example, no matter what type of test is used—written essays, computer-based multiple choice, oral examination, performance test, or anything else—a medical licensing test must be passed to practice medicine.

The perception of the stakes may vary. For example, college students who wish to skip an introductory-level course are often given exams to see whether they have already mastered the material and can be passed to the next level. Passing the exam can reduce tuition costs and time spent at university. A student who is eager to have these benefits may consider the test to be a high-stakes exam. Another student, who places no importance on the outcome so long as the resulting class placement is appropriate to the student's skill level, may consider the same exam to be a low-stakes test.^[5]

The phrase "high stakes" is derived directly from a gambling term. In gambling, a stake is the quantity of money or other goods that is risked on the outcome of some specific event. A high-stakes game is one in which, in the player's personal opinion, a large quantity of money is being risked. The term is meant to imply that implementing such a system introduces uncertainty and potential losses for test takers,^[citation needed] who must pass the exam to "win," instead of being able to obtain the goal through other means.^[citation needed]

Examples

Examples of high-stakes tests and their "stakes" include:

Driver's license tests and the legal ability to drive
College entrance examinations in some countries, such as Brazil's National High School Exam, and admission to a high-quality university
Visa interviews and citizenship tests and permission to migrate or naturalize
Many job interviews or drug tests and being hired
High school exit examinations and high-school diplomas
No Child Left Behind tests and school funding and ratings
Ph.D. oral exams and receiving the doctorate
Professional licensing and certification examinations (such as the bar exams, FAA written tests, and medical exams) and the license or certification being sought
Standardized tests of language proficiency and access to work, school placement, or a visa
The NCLEX-RN or NCLEX-PN exam and licensure as a nurse

Stakeholders

A high-stakes system may be intended to benefit people other than the test-taker. For professional certification and licensure examinations, the purpose of the test is to protect the general public from incompetent practitioners. The individual stakes of the medical student and the medical school are, ideally, balanced against the social stakes of possibly allowing an incompetent doctor to practice medicine.^[10]

A test may be "high-stakes" based on consequences for others beyond the individual test-taker.^[4] For example, an individual medical student who fails a licensing exam cannot practice his or her profession. However, if enough students at the same school fail the exam, the school's reputation and accreditation may be jeopardized. Similarly, testing under the U.S.'s No Child Left Behind Act had no direct negative consequences for failing students,^[11] but potentially serious consequences for their schools, including loss of accreditation, funding, teacher pay, teacher employment, or changes to the school's management.^[12] The stakes were therefore high for the school, but low for the individual test-takers.

Assessments used

Any form of assessment can be used as a high-stakes test. Often, an inexpensive multiple-choice test is chosen for convenience. A high-stakes assessment may also involve answering open-ended questions or completing a practical, hands-on section. For example, a typical high-stakes licensing exam for a nurse determines whether the nurse can insert an I.V. line by watching the nurse actually perform this task. Assessments of this kind are called authentic assessments or performance tests.^[5]

Some high-stakes tests may be standardized tests (in which all examinees take the same test under reasonably equal conditions), with the expectation that standardization affords all examinees a fair and equal opportunity to pass.^[5] Some high-stakes tests are non-standardized, such as a theater audition.

As with other tests, high-stakes tests may be criterion-referenced or norm-referenced.^[5] For example, a written driver's license examination typically is criterion-referenced, with an unlimited number of potential drivers able to pass if they correctly answer a certain percentage of questions. On the other hand, essay portions of some bar exams are norm-referenced, with the worst essays failed and the best essays passed, without regard for the overall quality of the essays.
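The difference between the two approaches shows up when the same scores are judged both ways. The sketch below is a minimal illustration, not a description of how any real examination is scored; the function names, the sample names and scores, the cut of 75, and the quota of two passers are all assumptions made for the example.

```python
def criterion_referenced_pass(scores, cut):
    """Everyone at or above the fixed cut passes; the number of passers is unlimited."""
    return {name: score >= cut for name, score in scores.items()}


def norm_referenced_pass(scores, n_pass):
    """Only the n_pass highest-ranked examinees pass, whatever their absolute scores.

    Ties are broken by input order, an arbitrary choice made for this sketch.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    passers = set(ranked[:n_pass])
    return {name: name in passers for name in scores}


# Hypothetical scores: every examinee is above the fixed cut of 75.
scores = {"Ana": 92, "Ben": 88, "Cam": 81, "Dee": 79}

print(criterion_referenced_pass(scores, cut=75))  # all four pass
print(norm_referenced_pass(scores, n_pass=2))     # only Ana and Ben pass
```

With these numbers, the criterion-referenced rule passes everyone, while the norm-referenced rule fails Cam and Dee even though their scores clear the same cut.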

The "clear line" between passing and failing on an exam may be achieved through use of a cut score: for example, test takers correctly answering 75% or more of the questions pass the test, while test takers correctly answering fewer than 75% fail, or don't "make the cut". In large-scale high-stakes testing, rigorous and expensive standard-setting studies may be employed to determine the ideal cut score or to keep the test results consistent between groups taking the test at different times.
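A cut-score rule like the 75% example can be written so that no score falls into a gap between passing and failing. The following sketch is one possible way to do this, assuming raw counts of correct answers are available; `passes_cut` and its parameters are hypothetical names. It compares integers rather than rounded percentages, so a score of 74.5% is unambiguously a fail.

```python
def passes_cut(correct: int, total: int, cut_percent: int = 75) -> bool:
    """Return True if correct/total meets or exceeds the cut score.

    The comparison uses integer arithmetic (correct * 100 >= cut_percent * total)
    to avoid rounding a score such as 74.5% up to the cut.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return correct * 100 >= cut_percent * total


print(passes_cut(30, 40))    # exactly 75% of 40 questions
print(passes_cut(149, 200))  # 74.5% of 200 questions
```

The first call lands exactly on the cut and passes; the second falls just short and fails, with no rounding step that could move it across the line.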

Criticisms

High-stakes tests, although widely used to determine academic and non-academic proficiency, are criticized for various reasons. Concerns include the following:

The test may not correctly measure the individual's knowledge or skills. For example, a test might purport to be a general reading-skills test, but it might actually determine whether or not the examinee has read a specific book. Computer-based high-stakes tests may disadvantage low-income test takers and others without ready access to computers^[13] if a test that is supposed to measure reading skills in practice measures the test takers' typing skills or their familiarity with answering questions on a computer.
The test may not measure what the critic wants measured. For example, a test might accurately measure whether a law student has acquired fundamental knowledge of the legal system, but the critic might want these would-be lawyers to be tested on legal ethics instead of legal knowledge.
High-stakes testing may encourage teachers to omit material that is not tested. "Teaching to the test" can result in a narrow curriculum and lower skills. For example, if a driving exam does not test parallel parking skills, then driving instructors might stop teaching that skill to a driving student, in favor of focusing instruction time on the material that will be tested, such as determining which vehicle has the right of way at a four-way stop. The result is that the student will be able to pass the test, but may be unable to park a car safely in some places. According to Campbell's law, the higher the stakes are (for the test taker or for the school), the more likely this is to happen.
Testing causes stress for some people. This stress is called test anxiety or performance anxiety. Critics suggest that since some people perform poorly under the pressure associated with tests, any test is likely to be less representative of their actual standard of achievement than a non-test alternative.^[14]
High-stakes tests are often given as a single long exam. Some critics prefer continuous assessment instead of one larger test. For example, the American Psychological Association (APA) opposes using a one-time high school exit examination as the single determinant of whether a student should graduate from high school, saying, "Any decision about a student's continued education, such as retention, tracking, or graduation, should not be based on the results of a single test, but should include other relevant and valid information."^[15] Since the stakes are related to consequences, not method, however, short tests can also be high-stakes.
High-stakes testing creates more incentive for cheating.^[16] Because cheating on a single critical exam may be easier than either learning the required material or earning credit through attendance, diligence, or many smaller tests, more examinees who do not actually have the necessary knowledge or skills, but who are effective cheaters, may pass. Also, some people who would otherwise pass the test but lack confidence might try to secure the outcome by cheating; if caught, they often face worse consequences than they would have faced for simply failing. Additionally, if the test results are used to determine the teachers' pay or continued employment, or to evaluate the school, then school personnel may fraudulently alter student test papers to artificially inflate performance.^[16]
Sometimes a high-stakes test is tied to a controversial reward. For example, some people may want a high-school diploma to represent the verified acquisition of specific skills or knowledge, and therefore use a high-stakes assessment to deny a diploma to anyone who cannot perform the necessary skills.^[17] Others may want a high school diploma to represent primarily a certificate of attendance, so that a person who faithfully attended class but cannot read or write will still get the social benefits of graduation. Using tests to deny a high school diploma, and thereby access to most jobs and higher education for a lifetime, is controversial even when the test itself accurately identifies students who do not have the necessary skills. Criticism is usually framed as over-reliance on a single measurement^[18] or in terms of social justice, if the absence of skill is not entirely the test taker's fault, as in the case of a student who cannot read because of unqualified teachers, or a person with advanced dementia who can no longer pass a driving exam due to loss of cognitive function.^[3]
Tests can penalize test takers who do not have the necessary skills through no fault of their own. An absence of skill may not be the test taker's fault, but high-stakes tests measure only skill proficiency, regardless of whether the test takers had an equal opportunity to learn the material.^[3]^[19]^[20] Additionally, wealthy test takers may use private tutoring or test preparation programs to improve their scores. Some affluent parents pay thousands of dollars to prepare their children for university admissions tests.^[21] Critics see this as being unfair to families who cannot afford to pay for additional educational services.^[22]
High-stakes tests reveal that some examinees do not know the required material, or do not have the necessary skills. While failing these people may have many public benefits, the consequences of repeated failure can be very high for the individual. For example, a person who fails a practical driving exam will not be able to drive a car legally, which means they cannot drive to work and may lose their job if alternative transportation options are not available. The person may also suffer social embarrassment when acquaintances discover that a lack of skill cost them their driver's license. In the context of high school exit exams, poorly performing school districts have formally opposed high-stakes testing after low test results, which accurately and publicly exposed the districts' failures, proved to be politically embarrassing.^[23] Such districts have also criticized high-stakes tests for correctly identifying students who lack the required knowledge.^[24]
Sometimes high-stakes testing is used on young children. Testing often starts as early as third grade, when children may be unable to properly allocate mental resources needed to succeed. If they fail, they may be assigned additional schooling, which can be internalized as a punishment.^[25]
Low test scores are often treated as synonymous with good tests.^[26] There can be a bias toward assuming that for a high-stakes test to be valid, test results must be poor. Conversely, tests on which students generally perform well are often dismissed as too easy, even if they are well aligned to standards. This bias can also encourage the creation of assessments whose quality is judged by the failure rate of students rather than by alignment to standards.

Advantages

Despite these criticisms, high-stakes testing has some advantages:

Scores and score trends from high-stakes tests tend to be more reliable than those from low- or no-stakes tests because they are more likely to be administered securely and taken seriously by test-takers.^[27]^[28]^[29]^[30]

Lax security pervades the administration of no-stakes tests—tests that "don't count." Indeed, all but one of the tests involved in the famous "Lake Wobegon Effect" school testing scandal of the 1980s had no stakes for students, teachers, or schools. In many cases, schools could administer the tests at their own discretion, with teachers proctoring their own students or no proctors at all. With state and local education administrators free to direct most aspects of the tests' administration, scoring, and reporting, they could artificially inflate scores and score trends such that the students in all US states were "above the national average."^[31]

High-stakes tests are also more likely to be administered externally (by independent persons without a conflict of interest) and securely. Whereas high-stakes testing may create more incentive for cheating, low- or no-stakes testing can create more opportunity for cheating because it is typically administered internally (e.g., in students' schools by their own teachers) with less security.^[32]^[33]^[34]

Adding stakes to a test has a generally positive impact on student achievement, suggesting greater motivation and effort.^[35]

References

1. "Lexicon of Learning". Association for Supervision and Curriculum Development. Archived from the original on 2018-10-17. Retrieved 2013-02-21.
2. Rosemary Sutton; Kelvin Seifert (2009). "Chapter 1: The Changing Teaching Profession and You". Educational Psychology (PDF) (2nd ed.). p. 14.
3. Togut, Torin D. "High-Stakes Testing: Educational Barometer for Success, or False Prognosticator for Failure". The Beacon. No. Fall 2004. Harbor House Law Press.
4. Torin D. Togut. "EDEX 790 Glossary of Education Terms". Archived from the original on January 11, 2009. Retrieved July 23, 2009.
5. "The nature of assessment: A guide to standardized testing". Center for Public Education. Archived from the original on July 25, 2011. Retrieved July 23, 2009.
6. Pfeiffer, Steven I (Winter 2009). "The Debate about Using the SAT in College Admissions". Duke University Talent Identification Program. Archived from the original on 2009-10-14. Gaston Caperton, president of the College Board, which publishes the SAT, counters that the SAT I is "not a high-stakes test" but is a useful admissions tool when considered along with other evidence of a student's potential for college success.
7. Phelps, Richard P. (June 2010). "Source of Lake Wobegon" (PDF). Nonpartisan Education Review. Retrieved 2020-10-18.
8. Mari Pearlman (April 4, 2001). "High-stakes Testing: Perils & Opportunities". Archived from the original on 2009-09-25. Retrieved July 23, 2009.
9. Eddy Ramírez (30 April 2008). "Admissions Officials Shrug at SAT Writing Test". Retrieved 24 July 2009.
10. Mehrens, W.A. (1995). "Legal and Professional Bases for Licensure Testing". In Impara, J.C. (Ed.), Licensure testing: Purposes, procedures, and practices, pp. 33–58. Lincoln, NE: Buros Institute.
11. "NCLB has nothing to do with the high-stakes nature of the test for students". Archived from the original on 2012-12-13.
12. Greene, Jay P.; Marcus A. Winters; Greg Forster (February 2003). "Testing High Stakes Tests: Can We Believe the Results of Accountability Tests?". Civic Report. Manhattan Institute for Policy Research.
13. File, Thom; Ryan, Camille (November 2014). "Computer and Internet Use in the United States: 2013" (PDF). census.gov.
14. Zuriff GE (1997). "Accommodations for test anxiety under ADA?". J. Am. Acad. Psychiatry Law. 25 (2): 197–206. PMID 9213292.
15. "Appropriate Use of High-Stakes Testing in Our Nation's Schools". American Psychological Association. Retrieved 2008-01-09.
16. Jacob, Brian A. and Steven D. Levitt (Winter 2004). "To Catch a Cheat" (PDF). Education Next.
17. "Figure 1-10: Employee/faculty support for high stakes testing: 2000". Archived from the original on 2008-02-07. Retrieved 2008-02-06.
18. Lewis, Anne (April 2000). High-stakes testing: Trends and issues (PDF) (Report). Mid-Continent Research for Education and Learning. Archived from the original (PDF) on 2011-07-27.
19. Myers, David (2001). Psychology. New York: Worth Publishers. p. 464. ISBN 1-57259-791-7. Why blame the tests for exposing unequal experiences and opportunities?
20. Dang, Nick (18 March 2003). "Reform education, not exit exams". Daily Bruin. One common complaint from failed test-takers is that they weren't taught the tested material in school. Here, inadequate schooling, not the test, is at fault. Blaming the test for one's failure is like blaming the service station for a failed smog check; it ignores the underlying problems within the 'schooling vehicle.' [permanent dead link]
21. "Tackling the SAT? Test-prep help abounds". Christian Science Monitor. Vol. 90, no. 175. Associated Press. August 4, 1998. pp. B3. ISSN 0882-7729. Retrieved 2007-07-09. Some parents spend thousands of dollars for private sessions...
22. Johnson, Dale; Bonnie Johnson; Stephen J. Farenga; Daniel Ness (2008). Stop High-Stakes Testing: An Appeal to America's Conscience. Lanham, MD: Rowman & Littlefield.
23. Weinkopf, Chris (2002). "Blame the test: LAUSD denies responsibility for low scores". Daily News. Archived from the original on 2017-02-02. Retrieved 2009-09-17. The blame belongs to 'high-stakes tests' like the Stanford 9 and California's High School Exit Exam. Reliance on such tests, the board grumbles, 'unfairly penalizes students that have not been provided with the academic tools to perform to their highest potential on these tests'.
24. "Blaming The Test". Investor's Business Daily. 11 May 2006. A judge in California is set to strike down that state's high school exit exam. Why? Because it's working. It's telling students they need to learn more. We call that useful information. To the plaintiffs who are suing to stop the use of the test as a graduation requirement, it's something else: Evidence of unequal treatment ... the exit exam was deemed unfair because too many students who failed the test had too few credentialed teachers. Well, maybe they did, but granting them a diploma when they lack the required knowledge only compounds the injustice by leaving them with a worthless piece of paper. [permanent dead link]
25. Kozol, Jonathan (2005). The Shame of the Nation. New York: Crown Publishers. p. 53. ISBN 978-1-4000-5245-5.
26. Kohn, A. (1999). "Confusing Harder with Better". Retrieved 2021-01-26 from https://www.alfiekohn.org/article/confusing-harder-better/
27. Eklöf, Hanna (2007). "Test-taking motivation and mathematics performance in TIMSS". International Journal of Testing. 7 (3): 311–326. doi:10.1080/15305050701438074. S2CID 144686714.
28. Finn, B. (2015). Measuring motivation in low-stakes assessments (Research Report RR-15-19). Educational Testing Service.
29. Hawthorne, K.A.; Bol, L.; Pribesh, S.; Suh, Y. (2015). "Effects of motivational prompts on motivation, effort, and performance on a low-stakes standardized test". Research and Practice in Assessment. 10: 30–38.
30. Wise, SL; DeMars, CE (2010). "Examinee noneffort and the validity of program assessment results". Educational Assessment. 15: 27–41. doi:10.1080/10627191003673216. S2CID 143794026.
31. "The Lake Wobegon Effect: Twenty Years Later". Nonpartisan Education Review.
32. Cizek, G.J. (1999). Cheating on Tests: How To Do It, Detect It, and Prevent It. Routledge. doi:10.4324/9781410601520. ISBN 9781410601520.
33. Steger, D.; Schroeders, U.; Gnambs, T. (2018). "A Meta-Analysis of Test Scores in Proctored and Unproctored Ability Assessments". European Journal of Psychological Assessment. 36: 1–11. doi:10.1027/1015-5759/a000494. S2CID 149485786.
34. U.S. Government Accountability Office (2013). K-12 Education: States' Test Security Policies and Procedures Varied (Report).
35. Phelps, R. P. (2019). "Test Frequency, Stakes, and Feedback in Student Achievement: A Meta-Analysis". Evaluation Review. 43 (3–4): 111–151. doi:10.1177/0193841X19865628. PMID 31382776. S2CID 199449477.