Writing assessment refers to an area of study that contains theories and practices that guide the evaluation of a writer's performance or potential through a writing task. Writing assessment can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. [1] Writing assessment can also refer to the technologies and practices used to evaluate student writing and learning. [2] An important consequence of writing assessment is that the type and manner of assessment may impact writing instruction, with consequences for the character and quality of that instruction. [3]
Writing assessment began as a classroom practice during the first two decades of the 20th century, though high-stakes and standardized tests also emerged during this time. [4] During the 1930s, the College Board shifted from direct to indirect writing assessment because indirect tests were more cost-effective and were believed to be more reliable. [4] Starting in the 1950s, as more students from diverse backgrounds attended colleges and universities, administrators used standardized testing to decide where these students should be placed, what and how to teach them, and how to measure whether they had learned what they needed to learn. [5] The large-scale statewide writing assessments that developed during this time combined direct writing assessment with multiple-choice items, a practice that remains dominant today across U.S. large-scale testing programs, such as the SAT and GRE. [4] These assessments usually take place outside of the classroom, at the state and national level. However, as more and more students were placed into courses based on their standardized test scores, writing teachers began to notice a conflict between what students were being tested on—grammar, usage, and vocabulary—and what the teachers were actually teaching—writing process and revision. [5] Because of this divide, educators began pushing for writing assessments that were designed and implemented at the local, programmatic, and classroom levels. [5] [6] As writing teachers began designing local assessments, the methods of assessment diversified, resulting in timed essay tests, locally designed rubrics, and portfolios. Beyond the classroom and programmatic levels, writing assessment also strongly influences writing centers, through writing center assessment, and similar academic support centers. [7]
Because writing assessment is used in multiple contexts, the history of writing assessment can be traced through examining specific concepts and situations that prompt major shifts in theories and practices. Writing assessment scholars do not always agree about the origin of writing assessment.
The history of writing assessment has been described as consisting of three major shifts in the methods used to assess writing. [5] The first wave of writing assessment (1950-1970) sought objective tests with indirect measures of assessment. The second wave (1970-1986) focused on holistically scored tests in which students' actual writing began to be assessed. The third wave (since 1986) shifted toward assessing collections of student work (i.e., portfolio assessment) and programmatic assessment.
The 1961 publication of Factors in Judgments of Writing Ability by Diederich, French, and Carlton has also been characterized as marking the birth of modern writing assessment. [8] Diederich et al. based much of the book on research conducted through the Educational Testing Service (ETS) over the previous decade. The book was an attempt to standardize the assessment of writing and is responsible for establishing a base of research in writing assessment. [9]
The concepts of validity and reliability have been offered as a kind of heuristic for understanding shifts in priorities in writing assessment [10] as well as for interpreting what is understood as best practice in writing assessment. [11]
In the first wave of writing assessment, the emphasis was on reliability: [12] reliability concerns the consistency of a test. In this wave, the central concern was to assess writing with the greatest predictability at the least cost and effort.
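The consistency concern is often quantified as inter-rater reliability. As a minimal sketch, with hypothetical scores on a 1-6 scale, two simple measures of agreement between two raters can be computed as follows:

```python
# Hypothetical holistic scores (1-6 scale) from two independent raters.
rater_a = [4, 5, 3, 6, 2, 4, 5, 3]
rater_b = [4, 4, 3, 5, 2, 4, 5, 4]

def exact_agreement(a, b):
    """Fraction of essays on which both raters gave the same score."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def pearson_r(a, b):
    """Pearson correlation between the two raters' scores."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

print(exact_agreement(rater_a, rater_b))  # 0.625
print(pearson_r(rater_a, rater_b))
```

Large-scale programs typically report measures like these across many rater pairs; the data and 1-6 scale here are illustrative only.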
The shift toward the second wave marked a move toward considering principles of validity. Validity concerns a test's appropriateness and effectiveness for its given purpose. Methods in this wave were more concerned with a test's construct validity: whether the material a test elicits is an appropriate measure of what the test purports to measure. Teachers began to see an incongruence between the material being elicited to measure writing and the material teachers were actually asking students to write. Holistic scoring, championed by writing scholar Edward M. White, emerged in this wave; in this method, students produce an actual sample of writing that is then scored as a measure of their writing ability. [13]
The third wave of writing assessment emerged with continued interest in the validity of assessment methods. This wave began to consider an expanded definition of validity that includes how assessment contributes to learning and teaching. In this wave, portfolio assessment emerged to emphasize theories and practices in Composition and Writing Studies such as revision, drafting, and process.
Indirect writing assessments typically consist of multiple choice tests on grammar, usage, and vocabulary. [5] Examples include high-stakes standardized tests such as the ACT, SAT, and GRE, which are most often used by colleges and universities for admissions purposes. Other indirect assessments, such as Compass, are used to place students into remedial or mainstream writing courses. Direct writing assessments, like Writeplacer ESL (part of Accuplacer) or a timed essay test, require at least one sample of student writing and are viewed by many writing assessment scholars as more valid than indirect tests because they are assessing actual samples of writing. [5] Portfolio assessment, which generally consists of several pieces of student writing written over the course of a semester, began to replace timed essays during the late 1980s and early 1990s. Portfolio assessment is viewed as being even more valid than timed essay tests because it focuses on multiple samples of student writing that have been composed in the authentic context of the classroom. Portfolios enable assessors to examine multiple samples of student writing and multiple drafts of a single essay. [5]
Methods of writing assessment vary depending on the context and type of assessment. The following is an incomplete list of writing assessments frequently administered:
Portfolio assessment is typically used to assess what students have learned at the end of a course or over a period of several years. Course portfolios consist of multiple samples of student writing and a reflective letter or essay in which students describe their writing and work for the course. [5] [14] [15] [16] "Showcase portfolios" contain final drafts of student writing, and "process portfolios" contain multiple drafts of each piece of writing. [17] Both print and electronic portfolios can be either showcase or process portfolios, though electronic portfolios typically contain hyperlinks from the reflective essay or letter to samples of student work and, sometimes, outside sources. [15] [17]
Timed essay tests were developed as an alternative to multiple choice, indirect writing assessments. Timed essay tests are often used to place students into writing courses appropriate for their skill level. These tests are usually proctored, meaning that testing takes place in a specific location in which students are given a prompt to write in response to within a set time limit. The SAT and GRE both contain timed essay portions.
A rubric is a tool used in writing assessment that can be applied in several writing contexts. A rubric consists of a set of criteria or descriptions that guides a rater in scoring or grading a writer. The origins of rubrics can be traced to early attempts to standardize and scale writing in the early 20th century. Ernest C. Noyes argued in November 1912 for a shift toward assessment practices that were more science-based. One of the original scales used in education was developed by Milo B. Hillegas in A Scale for the Measurement of Quality in English Composition by Young People, commonly referred to as the Hillegas Scale. The Hillegas Scale and other scales used in education were used by administrators to compare the progress of schools. [18]
In 1961, Diederich, French, and Carlton of the Educational Testing Service (ETS) published Factors in Judgments of Writing Ability, which presented a rubric compiled from a series of raters whose comments were categorized and condensed into a five-factor rubric: [19]
As rubrics began to be used in the classroom, teachers began to advocate for criteria to be negotiated with students so that students have a stake in how they would be assessed. Scholars such as Chris Gallagher and Eric Turley, [20] Bob Broad, [21] and Asao Inoue [22] (among many others) have argued that effective use of rubrics comes from local, contextual, and negotiated criteria.
Criticisms:
The introduction of the rubric has stirred debate among scholars. Some educators have argued that rubrics rest on false claims of objectivity and are in fact subjective. [23] Eric Turley and Chris Gallagher argued that state-imposed rubrics are a tool for accountability rather than improvement. Rubrics often originate outside of the classroom, from authors with no relation to the students themselves, and are then interpreted and adapted by other educators. [24] Turley and Gallagher note that "the law of distal diminishment says that any educational tool becomes less instructionally useful -- and more potentially damaging to educational integrity -- the further away from the classroom it originates or travels to." [24] They go on to say that a rubric should be treated as a tool for writers to measure a set of consensus values, not as a substitute for an engaged response.
A study by Stellmack et al. evaluated the perception and application of rubrics with agreed-upon criteria. The results showed that when different graders evaluated the same draft, the grader who had already given feedback on an earlier draft was more likely to note improvement. The researchers concluded that a rubric with higher reliability would improve the outcomes of their "review-revise-resubmit" procedure. [25]
Anti-rubric: Rubrics both measure the quality of writing and reflect an individual's beliefs about a department's or a particular institution's rhetorical values. But rubrics lack detail on how an instructor may diverge from these values. Bob Broad offers "dynamic criteria mapping" as an example of an alternative to the rubric. [26]
The single standard of assessment raises further questions, as Peter Elbow touches on the social construction of value itself. He proposes that a communal process, stripped of the requirement for agreement, would allow the class to "see potential agreements – unforced agreements in their thinking – while helping them articulate where they disagree." [27] He proposes that grading could take on a multidimensional lens in which the potential for 'good writing' opens up, pointing out that a single-dimensional rubric attempts to assess a multidimensional performance. [27]
Multiple-choice tests contain questions about usage, grammar, and vocabulary. Standardized tests like the SAT, ACT, and GRE are typically used for college or graduate school admission. Other tests, such as Compass and Accuplacer, are typically used to place students into remedial or mainstream writing courses.
Automated essay scoring (AES) is the use of non-human, computer-assisted assessment practices to rate, score, or grade writing tasks.
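One family of AES approaches predicts a human-like score from measurable features of the text. The following is a deliberately minimal sketch of that idea, using a single surface feature (word count), an invented training set, and ordinary least squares; it is an illustration of the general technique, not the method of any actual AES product, which typically uses many features and far more sophisticated models.

```python
# Minimal illustrative feature-based scorer: fit a least-squares line from
# one surface feature (word count) to hypothetical human-assigned holistic
# scores, then predict a score for new text.

def word_count(text):
    return len(text.split())

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# Hypothetical training set: (word count of essay, human holistic score 1-6).
train = [(50, 2), (120, 3), (200, 4), (320, 5), (400, 6)]
intercept, slope = fit_line([x for x, _ in train], [y for _, y in train])

def predict(text):
    """Predicted holistic score, clamped to the 1-6 scale."""
    raw = intercept + slope * word_count(text)
    return max(1, min(6, round(raw)))

print(predict("word " * 60))   # short essay -> low predicted score
print(predict("word " * 350))  # long essay -> high predicted score
```

That a scorer like this rewards sheer length is one of the standard criticisms leveled at feature-based AES.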
Some scholars in writing assessment focus their research on the influence of race on performance in writing assessments. Scholarship in race and writing assessment seeks to study how categories of race and perceptions of race continue to shape writing assessment outcomes. Some scholars in writing assessment recognize that racism in the 21st century is no longer explicit, [28] but argue that a 'silent' racism persists in writing assessment practices, in which racial inequalities in writing assessment are typically justified with non-racial reasons. [29] These scholars advocate for new developments in writing assessment in which the intersections of race and writing assessment are brought to the forefront of assessment practices.
Alternative assessment is also known under various other terms.
A standardized test is a test that is administered and scored in a consistent, or "standard", manner: the questions, administration, scoring, and interpretation are predetermined and consistent across test-takers.
Educational assessment or educational evaluation is the systematic process of documenting and using empirical data on knowledge, skills, attitudes, aptitudes, and beliefs to refine programs and improve student learning. Assessment data can be obtained by examining student work directly to assess the achievement of learning outcomes, or from data from which one can make inferences about learning. Assessment is often used interchangeably with "test" but is not limited to tests. Assessment can focus on the individual learner, the learning community, a course, an academic program, the institution, or the educational system as a whole. The word "assessment" came into use in an educational context after the Second World War.
Electronic assessment, also known as digital assessment, e-assessment, online assessment or computer-based assessment, is the use of information technology in assessment such as educational assessment, health assessment, psychiatric assessment, and psychological assessment. This covers a wide range of activities ranging from the use of a word processor for assignments to on-screen testing. Specific types of e-assessment include multiple choice, online/electronic submission, computerized adaptive testing such as the Frankfurt Adaptive Concentration Test, and computerized classification testing.
A concept inventory is a criterion-referenced test designed to help determine whether a student has an accurate working knowledge of a specific set of concepts. Historically, concept inventories have been in the form of multiple-choice tests in order to aid interpretability and facilitate administration in large classes. Unlike a typical, teacher-authored multiple-choice test, questions and response choices on concept inventories are the subject of extensive research. The aims of the research include ascertaining (a) the range of what individuals think a particular question is asking and (b) the most common responses to the questions. Concept inventories are evaluated to ensure test reliability and validity. In its final form, each question includes one correct answer and several distractors.
The Education Quality and Accountability Office (EQAO) is a Crown agency of the Government of Ontario in Canada. It was legislated into creation in 1996 in response to recommendations made by the Royal Commission on Learning in February 1995.
In the realm of US education, a rubric is a "scoring guide used to evaluate the quality of students' constructed responses" according to James Popham. In simpler terms, it serves as a set of criteria for grading assignments. Typically presented in table format, rubrics contain evaluative criteria, quality definitions for various levels of achievement, and a scoring strategy. They play a dual role for teachers in marking assignments and for students in planning their work.
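The three components named above (evaluative criteria, quality definitions for levels of achievement, and a scoring strategy) can be sketched as a simple data structure. The criteria, descriptors, and weights below are invented for illustration, not drawn from any published rubric:

```python
# A hypothetical analytic rubric: evaluative criteria, quality definitions
# for each achievement level, and a weighted-sum scoring strategy.
rubric = {
    "focus": {
        "weight": 2,
        "levels": {1: "off topic", 2: "partially focused", 3: "clearly focused"},
    },
    "organization": {
        "weight": 1,
        "levels": {1: "no structure", 2: "some structure", 3: "logical structure"},
    },
    "conventions": {
        "weight": 1,
        "levels": {1: "frequent errors", 2: "some errors", 3: "few errors"},
    },
}

def score(ratings):
    """Scoring strategy: weighted sum of the level chosen per criterion."""
    return sum(rubric[c]["weight"] * level for c, level in ratings.items())

# A rater picks one achievement level per criterion.
print(score({"focus": 3, "organization": 2, "conventions": 3}))  # 2*3 + 2 + 3 = 11
```

The weighting reflects the common practice of valuing some criteria (here, focus) above others; a holistic rubric, by contrast, would collapse these descriptors into a single scale.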
STAR Reading, STAR Early Literacy, and STAR Math are standardized, computer-adaptive assessments created by Renaissance Learning, Inc., for use in K–12 education. Each is a "Tier 2" assessment of a skill (reading practice, early literacy, and math practice, respectively) that can be used any number of times due to item-bank technology. These assessments fall somewhere between progress-monitoring tools and high-stakes tests.
English-language learner is a term used in some English-speaking countries such as the United States and Canada to describe a person who is learning the English language and has a native language that is not English. Some educational advocates, especially in the United States, classify these students as non-native English speakers or emergent bilinguals. Various other terms are also used to refer to students who are not proficient in English, such as English as a second language (ESL), English as an additional language (EAL), limited English proficient (LEP), culturally and linguistically diverse (CLD), non-native English speaker, bilingual students, heritage language, emergent bilingual, and language-minority students. The legal term that is used in federal legislation is 'limited English proficient'.
In an educational setting, standards-based assessment is assessment that relies on the evaluation of student understanding with respect to agreed-upon standards, also known as "outcomes". The standards set the criteria for the successful demonstration of the understanding of a concept or skill.
Holistic grading or holistic scoring, in standards-based education, is an approach to scoring essays using a simple grading structure that bases a grade on a paper's overall quality. This type of grading, which is also described as nonreductionist grading, contrasts with analytic grading, which takes more factors into account when assigning a grade. Holistic grading can also be used to assess classroom-based work. Rather than counting errors, a paper is judged holistically and often compared to an anchor paper to evaluate if it meets a writing standard. It differs from other methods of scoring written discourse in two basic ways. It treats the composition as a whole, not assigning separate values to different parts of the writing. And it uses two or more raters, with the final score derived from their independent scores. Holistic scoring has gone by other names: "non-analytic," "overall quality," "general merit," "general impression," "rapid impression." Although the value and validation of the system are a matter of debate, holistic scoring of writing is still in wide application.
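The use of two or more raters described above requires a rule for deriving the final score from their independent scores. The sketch below assumes one common convention (summing adjacent scores and calling in a third rater to adjudicate discrepancies); actual programs vary in the rule they use:

```python
def resolve_holistic(score_a, score_b, adjudicate):
    """Derive a final score from two independent holistic scores (1-6 scale).

    Assumed convention, for illustration only: if the two raters agree
    within one point, the final score is their sum; otherwise a third
    rater adjudicates, and the final score is the sum of the two closest
    of the three scores.
    """
    if abs(score_a - score_b) <= 1:
        return score_a + score_b
    score_c = adjudicate()  # third, independent reading of the paper
    closest_pair = min(
        [(score_a, score_b), (score_a, score_c), (score_b, score_c)],
        key=lambda pair: abs(pair[0] - pair[1]),
    )
    return closest_pair[0] + closest_pair[1]

# Adjacent scores are summed directly; the adjudicator is never called.
print(resolve_holistic(4, 5, adjudicate=lambda: None))  # 9
# Discrepant scores trigger a third reading; 5 pairs with the closer 6.
print(resolve_holistic(3, 6, adjudicate=lambda: 5))     # 11
```

Keeping the raters' readings independent before resolution is what makes the inter-rater agreement statistics used to validate holistic scoring meaningful.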
The Connecticut Mastery Test, or CMT, is a test administered to students in grades 3 through 8. The CMT tests students in mathematics, reading comprehension, writing, and science. The other major standardized test administered to schoolchildren in Connecticut is the Connecticut Academic Performance Test, or CAPT, which is given in grade 10. Until the 2005–2006 school year, the CMT was administered in the fall; now it is given in the spring.
Authentic assessment is the measurement of "intellectual accomplishments that are worthwhile, significant, and meaningful". Authentic assessment can be devised by the teacher, or in collaboration with the student by engaging student voice. When applying authentic assessment to student learning and achievement, a teacher applies criteria related to "construction of knowledge, disciplined inquiry, and the value of achievement beyond the school."
The Wechsler Individual Achievement Test Second Edition (WIAT-II) assesses the academic achievement of children, adolescents, college students, and adults aged 4 through 85. The test enables the assessment of a broad range of academic skills or of only a particular area of need. The WIAT-II is a revision of the original WIAT and includes additional measures. There are four basic scales: Reading, Math, Writing, and Oral Language. Within these scales there is a total of nine subtest scores.
Teacher quality assessment commonly includes reviews of qualifications, tests of teacher knowledge, observations of practice, and measurements of student learning gains. Assessments of teacher quality are currently used for policymaking, employment and tenure decisions, teacher evaluations, merit pay awards, and as data to inform the professional growth of teachers.
An examination or test is an educational assessment intended to measure a test-taker's knowledge, skill, aptitude, physical fitness, or classification in many other topics. A test may be administered verbally, on paper, on a computer, or in a predetermined area that requires a test taker to demonstrate or perform a set of skills.
Kathleen Blake Yancey is the Kellogg W. Hunt Professor of English at Florida State University in the rhetoric and composition program. Her research interests include composition studies, writing knowledge, creative non-fiction, and writing assessment.
Educator effectiveness is a United States K-12 school system education policy initiative that measures the quality of an educator's performance in terms of improving student learning. It describes a variety of methods, such as observations, student assessments, student work samples, and examples of teacher work, that education leaders use to determine the effectiveness of a K-12 educator.
Writing center assessment refers to a set of practices used to evaluate writing center spaces. Writing center assessment builds on the larger theories of writing assessment methods and applications by focusing on how those processes can be applied to writing center contexts. In many cases, writing center assessment and any assessment of academic support structures in university settings builds on programmatic assessment principles as well. As a result, writing center assessment can be considered a branch of programmatic assessment, and the methods and approaches used here can be applied to a range of academic support structures, such as digital studio spaces.
Asao B. Inoue is a Japanese American academic writer and professor of rhetoric and composition in the College of Integrative Sciences and Arts at Arizona State University whose research and teaching focus on anti-racist writing assessment. In 2019, Inoue was elected the Conference on College Composition and Communication (CCCC) Chair. He delivered the keynote presentation for the 2019 CCCC Annual Convention, entitled "How Do We Language So People Stop Killing Each Other, Or What Do We Do About White Language Supremacy?" Inoue is the recipient of multiple disciplinary and institutional academic awards, including the 2017 CCCC Outstanding Book Award, the 2017 Council of Writing Program Administrators (CWPA) Best Book Award, and the 2012 Provost's Award for Teaching Excellence at California State University, Fresno.