Writing assessment

Writing assessment refers to an area of study that contains theories and practices guiding the evaluation of a writer's performance or potential through a writing task. Writing assessment can be considered a combination of scholarship from composition studies and measurement theory within educational assessment. [1] The term can also refer to the technologies and practices used to evaluate student writing and learning. [2] An important consequence of writing assessment is that the type and manner of assessment may shape writing instruction, with effects on the character and quality of that instruction. [3]

Contexts

Writing assessment began as a classroom practice during the first two decades of the 20th century, though high-stakes and standardized tests also emerged during this time. [4] During the 1930s, the College Board shifted from direct to indirect writing assessment because indirect tests were more cost-effective and were believed to be more reliable. [4] Starting in the 1950s, more students from diverse backgrounds were attending colleges and universities, so administrators made use of standardized testing to decide where these students should be placed, what and how to teach them, and how to measure that they learned what they needed to learn. [5] The large-scale statewide writing assessments that developed during this time combined direct writing assessment with multiple-choice items, a practice that remains dominant today across U.S. large-scale testing programs such as the SAT and GRE. [4] These assessments usually take place outside of the classroom, at the state and national level. However, as more and more students were placed into courses based on their standardized test scores, writing teachers began to notice a conflict between what students were being tested on (grammar, usage, and vocabulary) and what the teachers were actually teaching (writing process and revision). [5] Because of this divide, educators began pushing for writing assessments that were designed and implemented at the local, programmatic, and classroom levels. [5] [6] As writing teachers began designing local assessments, the methods of assessment began to diversify, resulting in timed essay tests, locally designed rubrics, and portfolios. Beyond the classroom and programmatic levels, writing assessment also strongly influences writing centers and similar academic support centers, where it underpins writing center assessment. [7]

History

Because writing assessment is used in multiple contexts, the history of writing assessment can be traced through examining specific concepts and situations that prompt major shifts in theories and practices. Writing assessment scholars do not always agree about the origin of writing assessment.

The history of writing assessment has been described as consisting of three major shifts in methods used in assessing writing. [5] The first wave of writing assessment (1950-1970) sought objective tests with indirect measures of assessment. The second wave (1970-1986) focused on holistically scored tests where the students' actual writing began to be assessed. And the third wave (since 1986) shifted toward assessing a collection of student work (i.e. portfolio assessment) and programmatic assessment.

The 1961 publication of Factors in Judgments of Writing Ability by Diederich, French, and Carlton has also been characterized as marking the birth of modern writing assessment. [8] Diederich et al. based much of their book on research conducted through the Educational Testing Service (ETS) over the previous decade. The book is an attempt to standardize the assessment of writing and is responsible for establishing a base of research in writing assessment. [9]

Major concepts

Validity and reliability

The concepts of validity and reliability have been offered as a kind of heuristic for understanding shifts in priorities in writing assessment, [10] as well as for interpreting what are understood as best practices in writing assessment. [11]

In the first wave of writing assessment, the emphasis was on reliability: [12] reliability concerns the consistency of a test. In this wave, the central concern was to predict writing ability as accurately as possible with the least cost and work.
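Reliability in this sense is often quantified as agreement between raters. The sketch below is only a minimal illustration, not a procedure described in the sources cited here: it computes exact agreement and a Pearson correlation between two hypothetical raters' holistic scores, with all numbers invented for demonstration.

```python
# Minimal illustration of inter-rater reliability, using invented scores
# from two hypothetical raters on an assumed 1-6 holistic scale.
from statistics import correlation  # Python 3.10+

rater_a = [4, 3, 5, 2, 4, 3, 5, 1]
rater_b = [4, 3, 4, 2, 5, 3, 5, 2]

# Exact agreement: proportion of essays on which the two raters give the same score.
agreement = sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)

# Pearson correlation: how consistently the raters rank essays relative to one another.
r = correlation(rater_a, rater_b)

print(f"exact agreement: {agreement:.2f}")
print(f"correlation:     {r:.2f}")
```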

The shift toward the second wave marked a move toward considering principles of validity. Validity confronts questions about a test's appropriateness and effectiveness for its given purpose. Methods in this wave were more concerned with a test's construct validity: whether what a test elicits is an appropriate measure of what the test purports to measure. Teachers began to see an incongruence between the material such tests used to measure writing and the material teachers were asking students to write. Holistic scoring, championed by Edward M. White, emerged in this wave as a method of assessment in which students' actual writing is elicited and scored to measure their writing ability. [13]

The third wave of writing assessment emerges with continued interest in the validity of assessment methods. This wave began to consider an expanded definition of validity that includes how portfolio assessment contributes to learning and teaching. In this wave, portfolio assessment emerges to emphasize theories and practices in Composition and Writing Studies such as revision, drafting, and process.

Direct and indirect assessment

Indirect writing assessments typically consist of multiple choice tests on grammar, usage, and vocabulary. [5] Examples include high-stakes standardized tests such as the ACT, SAT, and GRE, which are most often used by colleges and universities for admissions purposes. Other indirect assessments, such as Compass, are used to place students into remedial or mainstream writing courses. Direct writing assessments, like Writeplacer ESL (part of Accuplacer) or a timed essay test, require at least one sample of student writing and are viewed by many writing assessment scholars as more valid than indirect tests because they are assessing actual samples of writing. [5] Portfolio assessment, which generally consists of several pieces of student writing written over the course of a semester, began to replace timed essays during the late 1980s and early 1990s. Portfolio assessment is viewed as being even more valid than timed essay tests because it focuses on multiple samples of student writing that have been composed in the authentic context of the classroom. Portfolios enable assessors to examine multiple samples of student writing and multiple drafts of a single essay. [5]

As technology

Methods

Methods of writing assessment vary depending on the context and type of assessment. The following is an incomplete list of writing assessments frequently administered:

Portfolio

Portfolio assessment is typically used to assess what students have learned at the end of a course or over a period of several years. Course portfolios consist of multiple samples of student writing and a reflective letter or essay in which students describe their writing and work for the course. [5] [14] [15] [16] "Showcase portfolios" contain final drafts of student writing, and "process portfolios" contain multiple drafts of each piece of writing. [17] Both print and electronic portfolios can be either showcase or process portfolios, though electronic portfolios typically contain hyperlinks from the reflective essay or letter to samples of student work and, sometimes, outside sources. [15] [17]

Timed-essay

Timed essay tests were developed as an alternative to multiple-choice, indirect writing assessments. They are often used to place students into writing courses appropriate for their skill level. These tests are usually proctored: testing takes place at a specific location where students write in response to a prompt within a set time limit. The SAT and GRE both contain timed essay portions.

Rubric

A rubric is a tool used in writing assessment that can be applied in several writing contexts. A rubric consists of a set of criteria or descriptions that guides a rater in scoring or grading a writer. The origins of rubrics can be traced to early 20th-century attempts to standardize and scale writing in education. Ernest C. Noyes argued in November 1912 for a shift toward more science-based assessment practices. One of the original scales used in education was developed by Milo B. Hillegas in A Scale for the Measurement of Quality in English Composition by Young People, commonly referred to as the Hillegas Scale. The Hillegas Scale and other scales of the era were used by administrators to compare the progress of schools. [18]
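How a set of criteria guides a rater toward a score can be made concrete with a small, purely hypothetical sketch. The criterion names, weights, and scale below are invented for illustration; they are not drawn from the Hillegas Scale, the Diederich et al. factors, or any other source cited here.

```python
# Hypothetical analytic rubric represented as weighted criteria.
# Criterion names, weights, and the 1-4 scale are invented for illustration.
from dataclasses import dataclass

@dataclass
class Criterion:
    name: str
    weight: float        # relative importance; weights sum to 1.0
    description: str     # guidance for the rater

RUBRIC = [
    Criterion("Focus", 0.3, "Maintains a clear central idea throughout."),
    Criterion("Organization", 0.3, "Paragraphs and transitions follow a logical order."),
    Criterion("Evidence", 0.2, "Claims are supported with relevant, specific detail."),
    Criterion("Conventions", 0.2, "Grammar, usage, and mechanics support readability."),
]

def weighted_score(ratings: dict[str, int]) -> float:
    """Combine a rater's per-criterion ratings (1-4) into one weighted score."""
    return sum(c.weight * ratings[c.name] for c in RUBRIC)

# One rater's ratings for a single essay.
print(weighted_score({"Focus": 3, "Organization": 4, "Evidence": 2, "Conventions": 3}))  # 3.1
```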

In 1961, Diederich, French, and Carlton of the Educational Testing Service (ETS) published Factors in Judgments of Writing Ability, which presented a rubric compiled from a series of raters whose comments were categorized and condensed into five factors. [19]

As rubrics began to be used in the classroom, teachers began to advocate for criteria to be negotiated with students, giving students a stake in how they would be assessed. Scholars such as Chris Gallagher and Eric Turley, [20] Bob Broad, [21] and Asao Inoue [22] (among many) have argued that effective use of rubrics comes from local, contextual, and negotiated criteria.

Criticisms

The introduction of the rubric has stirred debate among scholars. Some educators have argued that rubrics rest on false claims of objectivity and are, in practice, subjective. [23] Eric Turley and Chris Gallagher argued that state-imposed rubrics are a tool for accountability rather than improvement. Rubrics often originate outside the classroom, written by authors with no relation to the students themselves, and are then interpreted and adapted by other educators. [24] Turley and Gallagher note that "the law of distal diminishment says that any educational tool becomes less instructionally useful -- and more potentially damaging to educational integrity -- the further away from the classroom it originates or travels to." [24] They go on to say that a rubric should be interpreted as a tool for writers to measure a set of consensus values, not a substitute for an engaged response.

A study by Stellmack et al. evaluated the perception and application of rubrics with agreed-upon criteria. The results showed that when different graders evaluated the same draft, the grader who had already given feedback previously was more likely to note improvement. The researchers concluded that a rubric with higher reliability would improve the results of their "review-revise-resubmit" procedure. [25]

Anti-rubric: Rubrics both measure the quality of writing and reflect an individual's beliefs about a department's or a particular institution's rhetorical values, but they offer little detail on how an instructor may diverge from those values. Bob Broad proposes "dynamic criteria mapping" as an alternative to the rubric. [26]

The single standard of assessment raises further questions, as Peter Elbow touches on the social construction of value itself. He proposes that a communal process stripped of the requirement for agreement would allow the class to "see potential agreements – unforced agreements in their thinking – while helping them articulate where they disagree." [27] He suggests that grading could take a multidimensional lens that opens up the possibilities of 'good writing', and points out that a single-dimensional rubric attempts to assess a multidimensional performance. [27]

Multiple-choice test

Multiple-choice tests contain questions about usage, grammar, and vocabulary. Standardized tests like the SAT, ACT, and GRE are typically used for college or graduate school admission. Other tests, such as Compass and Accuplacer, are typically used to place students into remedial or mainstream writing courses.

Automated essay scoring

Automated essay scoring (AES) is the use of non-human, computer-assisted assessment practices to rate, score, or grade writing tasks.
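Implementations differ widely, and the sources cited here do not describe any particular system. As a rough, hypothetical sketch only, the code below fits a simple regression model to crude surface features of a few invented essays and uses it to predict a score for a new one; production AES systems rely on far richer linguistic and statistical modeling.

```python
# Toy automated essay scoring sketch: invented essays, invented scores,
# and deliberately crude surface features. Not a description of a real AES system.
from sklearn.linear_model import LinearRegression

def features(essay: str) -> list[float]:
    words = essay.split()
    sentences = [s for s in essay.replace("!", ".").replace("?", ".").split(".") if s.strip()]
    return [
        float(len(words)),                                 # essay length
        sum(len(w) for w in words) / max(len(words), 1),   # average word length
        len(words) / max(len(sentences), 1),               # average sentence length
    ]

# Tiny invented training set of (essay, human-assigned holistic score on a 1-6 scale).
training = [
    ("The dog ran. It was fast.", 2),
    ("School is good. I like it. It is fun.", 3),
    ("Although the argument is familiar, the essay develops it with specific evidence.", 5),
    ("The author's claim rests on a false dilemma, which the second paragraph examines in detail.", 6),
]

model = LinearRegression()
model.fit([features(text) for text, _ in training], [score for _, score in training])

new_essay = "The evidence in the passage supports a more cautious conclusion."
print(round(float(model.predict([features(new_essay)])[0]), 1))  # predicted holistic score
```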

Race

Some scholars in writing assessment focus their research on the influence of race on performance in writing assessments. Scholarship in race and writing assessment seeks to study how categories of race and perceptions of race continue to shape writing assessment outcomes. Some scholars in writing assessment recognize that racism in the 21st century is rarely explicit, [28] but argue that a 'silent' racism operates in writing assessment practices, in which racial inequalities in writing assessment are typically justified with non-racial reasons. [29] These scholars advocate for new developments in writing assessment in which the intersections of race and writing assessment are brought to the forefront of assessment practices.


References

  1. Behizadeh, Nadia and George Engelhard Jr. "Historical View of the influences of measurement and writing theories on the practice of writing assessment in the United States" Assessing Writing 16 (2011) 189-211.
  2. Huot, B. & Neal, M. (2006). Writing assessment: A techno-history. In C. A. MacArthur, S. Graham, & J. Fitzgerald (Eds.), Handbook of Writing Research (pp. 417-432). New York, NY: Guilford Press.
  3. Hillocks, G.(2002). The Testing Trap: How State Writing Assessments Control Learning. New York: Teachers College Press.
  4. Behizadeh, Nadia and George Engelhard Jr. "Historical View of the influences of measurement and writing theories on the practice of writing assessment in the United States" Assessing Writing 16 (2011) 189-211
  5. Yancey, Kathleen Blake. "Looking Back as We Look Forward: Historicizing Writing Assessment as a Rhetorical Act." College Composition and Communication 50.3 (1999): 483-503. Web. 23 Feb. 2013.
  6. Huot, Brian. (Re)Articulating Writing Assessment for Teaching and Learning. Logan, Utah: Utah State UP, 2002.
  7. Bell, James H. (2001). "When Hard Questions Are Asked: Evaluating Writing Centers". The Writing Center Journal. 21 (1): 7–28.
  8. Broad, Bob. What we Really Value: Beyond Rubrics in Teaching and Assessing Writing. Logan, UT: Utah State University Press, 2003. Print
  9. Diederich, P.G.; French, J. W.; Carlton, S. T. (1961) Factors in Judgments of Writing Ability. Princeton, NJ: Educational Testing Service
  10. Yancey, Kathleen Blake. "Looking Back as We Look Forward"
  11. O'Neill, Peggy, Cindy Moore, and Brian Huot. A Guide to College Writing Assessment. Logan, UT: Utah State University Press, 2009. Print.
  12. Yancey, Kathleen Blake. "Looking back as We Look Forward"
  13. "Holisticism." College Composition and Communication, 35 (December, 1984): 400-409.
  14. Emmons, Kimberly. "Rethinking Genres of Reflection: Student Portfolio Cover Letters and the Narrative of Progress." Composition Studies 31.1 (2003): 43-62.
  15. Neal, Michael. Writing Assessment and the Revolution in Digital Texts and Technologies. NY: Teachers College, 2011.
  16. White, Edward. "The Scoring of Writing Portfolios: Phase 2." College Composition and Communication 56.4 (2005): 581-599.
  17. Yancey, Kathleen. "Postmodernism, Palimpsest, and Portfolios: Theoretical Issues in the Representation of Student Work." ePortfolio Performance Support Systems: Constructing, Presenting, and Assessing Portfolios. Eds. Katherine V. Wills and Rich Rice. Fort Collins, Colorado: WAC Clearinghouse. Web. 16 November 2013.
  18. Turley, Eric D. and Chris Gallagher. "On the 'Uses' of Rubrics: Reframing the Great Rubric Debate" The English Journal Vol 97. No. 4. (Mar. 2008) pp 87-92.
  19. Diederich, P.G.; French, J. W.; Carlton, S. T. (1961) Factors in Judgments of Writing Ability.
  20. Turley, Eric D. and Chris Gallagher. "On the 'Uses' of Rubrics: Reframing the Great Rubric Debate"
  21. Broad, Bob. What we Really Value: Beyond Rubrics in Teaching and Assessing Writing
  22. Inoue, Asao B. "Community-based Assessment Pedagogy." Assessing Writing. 9 (2005): 208-38. Web. 23 Feb 2013.
  23. "from Stephen Tchudi, President National Council of Teachers of English". NASSP Bulletin. 68 (470): 9–11. November 1984. doi:10.1177/019263658406847003. ISSN   0192-6365.
  24. Turley, Eric D.; Gallagher, Chris W. (2008). "On the "Uses" of Rubrics: Reframing the Great Rubric Debate". The English Journal. 97 (4): 87–92. ISSN 0013-8274.
  25. Stellmack, Mark A.; Keenan, Nora K.; Sandidge, Rita R.; Sippl, Amy L.; Konheim-Kalkstein, Yasmine L. (October 2012). "Review, Revise, and Resubmit: The Effects of Self-Critique, Peer Review, and Instructor Feedback on Student Writing". Teaching of Psychology. 39 (4): 235–244. doi:10.1177/0098628312456589. ISSN   0098-6283.
  26. Broad, Bob (2003). "To Tell the Truth: Beyond Rubrics". What We Really Value: Beyond Rubrics in Teaching and Assessing Writing. University Press of Colorado. pp. 1–15. ISBN 978-0-87421-553-3. Retrieved 2023-06-07.
  27. Elbow, Peter (2006-01-01). "Do we need a single standard of value for institutional assessment? An essay response to Asao Inoue's "community-based assessment pedagogy"". Assessing Writing. 11 (2): 81–99. doi:10.1016/j.asw.2006.07.003. ISSN 1075-2935.
  28. Bonilla-Silva, Eduardo. Racism Without Racists: Color-Blind Racism and the Persistence of Racial Inequality in the United States. Lanham, MD: Rowman & LittleField Publishers, Inc., 2006. Print.
  29. Behm, Nicholas, and Keith D. Miller. "Challenging the Frameworks of Color-blind Racism: Why We Need a Fourth Wave of Writing Assessment Scholarship." Race and Writing Assessment. Asao B. Inoue, and Mya Poe, eds. NYC: Peter Lang Publishing, 2012. 127-38. Print.