Value-added modeling

Value-added modeling (also known as value-added measurement, value-added analysis and value-added assessment) is a method of teacher evaluation that measures a teacher's contribution in a given year by comparing the current test scores of their students to the scores of those same students in previous school years, as well as to the scores of other students in the same grade. In this manner, value-added modeling seeks to isolate the contribution, or value added, that each teacher provides in a given year, which can be compared to the performance measures of other teachers. Value-added models (VAMs) are considered to be fairer than simply comparing student achievement scores or gain scores without considering potentially confounding context variables like past performance or income. It is also possible to use this approach to estimate the value added by the school principal or the school as a whole.

Critics say that the use of tests to evaluate individual teachers has not been scientifically validated, and that many of the results are due to chance or to conditions beyond the teacher's control, such as outside tutoring. [1] Research shows, however, that differences in teacher effectiveness, as measured by teachers' value-added scores, are associated with small economic effects on students. [2]

Method

Researchers use statistical processes on a student's past test scores to predict the student's future test scores, on the assumption that students usually score approximately as well each year as they have in past years. The student's actual score is then compared to the predicted score. The difference between the predicted and actual scores, if any, is assumed to be due to the teacher and the school, rather than to the student's natural ability or socioeconomic circumstances.

In this way, value-added modeling attempts to isolate the teacher's contributions from factors outside the teacher's control that are known to strongly affect student test performance, including the student's general intelligence, poverty, and parental involvement.

By aggregating all of these individual results, statisticians can determine how much a particular teacher improves student achievement, compared to how much the typical teacher would have improved student achievement.
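
The predict, compare and aggregate steps described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: a single prior score is the sole predictor, the prediction is an ordinary least-squares line, and the function name `value_added_by_teacher` and the six-student data set are hypothetical. Operational models use many more variables and more elaborate estimation.

```python
import numpy as np

def value_added_by_teacher(prior, current, teacher_ids):
    """Average (actual - predicted) score per teacher, using a one-predictor linear fit."""
    prior = np.asarray(prior, dtype=float)
    current = np.asarray(current, dtype=float)
    # Predict each current score from the prior score across all students.
    slope, intercept = np.polyfit(prior, current, deg=1)
    predicted = intercept + slope * prior
    # Positive residual: the student scored above the prediction.
    residual = current - predicted
    scores = {}
    for teacher in sorted(set(teacher_ids)):
        mask = np.array([t == teacher for t in teacher_ids])
        scores[teacher] = float(residual[mask].mean())
    return scores

# Hypothetical data: six students split between two teachers.
prior = [50, 60, 70, 50, 60, 70]
current = [58, 68, 78, 52, 62, 72]
teachers = ["A", "A", "A", "B", "B", "B"]
estimates = value_added_by_teacher(prior, current, teachers)
print(estimates)
```

In this toy data set the students of teacher A score about three points above prediction and those of teacher B about three points below, so the two estimates are roughly +3 and -3. The estimates are relative to the typical teacher in the sample, not to any absolute standard.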

Statisticians use hierarchical linear modeling to predict the score for a given student in a given classroom in a given school. This prediction is based on aggregated results of all students. Each student's predicted score may take into account student-level variables (e.g., past performance, socioeconomic status, race/ethnicity), teacher-level variables (e.g., certification, years of experience, highest degree earned, teaching practices, instructional materials, curriculum) and school-level variables (e.g., size, type, setting). Which variables are included depends on the model.
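
One consequence of a hierarchical model is that teacher effects treated as random effects are pulled ("shrunk") toward the overall mean, more strongly when a teacher has few students. The sketch below shows that shrinkage for the simplest two-level case (students nested in teachers) with variance components assumed to be already known. The function name and all numeric values are hypothetical, and the source does not say which specific model any district uses.

```python
def shrunken_effect(mean_residual, n_students, var_teacher, var_student):
    """Shrink a teacher's mean residual toward zero in a two-level random-intercept model.

    var_teacher: assumed variance of true teacher effects (between teachers)
    var_student: assumed variance of student-level noise (within a classroom)
    """
    # Reliability is the share of the observed mean's variance that is true teacher signal.
    reliability = var_teacher / (var_teacher + var_student / n_students)
    return reliability * mean_residual

# Hypothetical values: the same observed mean residual of 3 points,
# first from a class of 20 students, then from 100 students over several years.
small_class = shrunken_effect(3.0, n_students=20, var_teacher=4.0, var_student=100.0)
large_sample = shrunken_effect(3.0, n_students=100, var_teacher=4.0, var_student=100.0)
print(round(small_class, 2), round(large_sample, 2))
```

With these assumed variances, the estimate based on 20 students keeps less than half of the observed three-point difference, while the estimate based on 100 students keeps four fifths of it.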

Uses

As of 2010, a few school districts across the United States had adopted the system, including the Chicago Public Schools, New York City Department of Education and District of Columbia Public Schools. The rankings have been used to decide on issues of teacher retention and the awarding of bonuses, and as a tool for identifying those teachers who would benefit most from teacher training. [1] Under Race to the Top and other programs advocating for better methods of evaluating teacher performance, districts have looked to value-added modeling as a supplement to observing teachers in classrooms. [1]

Louisiana legislator Frank A. Hoffmann introduced a bill to authorize the use of value-added modeling techniques in the state's public schools as a means to reward strong teachers, to identify successful pedagogical methods, and to provide additional professional development for teachers identified as weaker than others. Despite opposition from the Louisiana Federation of Teachers, the bill passed the Louisiana State Senate on May 26, 2010, and was immediately signed into law by Governor Bobby Jindal. [3]

Experts do not recommend using value-added modeling as the sole determinant of any decision. [4] Instead, they recommend using it as a significant factor in a multifaceted evaluation program. [5]

Limitations

Because value-added modeling is a norm-referenced evaluation system, the teacher's performance is compared to the results seen in other teachers in the chosen comparison group. It is therefore possible to use this model to infer that a teacher is better than, worse than, or the same as the typical teacher, but it is not possible to use this model to determine whether a given level of performance is desirable.

Because each student's expected score is largely derived from the student's actual scores in previous years, it is difficult to use this model to evaluate teachers of kindergarten and first grade. Some research limits the model to teachers of third grade and above.

Schools may not be able to obtain new students' prior scores from the students' former schools, or the scores may not be useful because of the non-comparability of some tests. A school with high levels of student turnover may have difficulty in collecting sufficient data to apply this model. When students change schools in the middle of the year, their progress during the year is not solely attributable to their final teachers.

Value-added scores are more sensitive to teacher effects for mathematics than for language. [4] This may be due to widespread use of poorly constructed tests for reading and language skills, or it may be because teachers ultimately have less influence over language development. [4] Students learn language skills from many sources, especially their families, while they learn math skills primarily in school.

There is some variation in scores from year to year and from class to class. This variation is similar to that of performance measures in other fields, such as Major League Baseball, and thus may reflect real, natural variations in the teacher's performance. [4] Because of this variation, scores are most accurate if they are derived from a large number of students (typically 50 or more). As a result, it is difficult to use this model to evaluate first-year teachers, especially in elementary school, as they may have taught only 20 students. A ranking based on a single classroom is likely to classify the teacher correctly about 65% of the time. This number rises to 88% if ten years' data are available. [6] Additionally, because the confidence interval is wide, the method is most reliable when identifying teachers who are consistently in the top or bottom 10%, rather than trying to draw fine distinctions between teachers who produce more or less typical achievements, such as attempting to determine whether a teacher should be rated as being slightly above or slightly below the median. [6]
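
The link between the number of students and the width of the confidence interval can be illustrated with the standard error of a mean. The sketch below assumes independent student residuals with a common standard deviation and a normal approximation. The function name, the standard deviation of 15 points and the mean residual of 2 points are hypothetical values chosen only for illustration.

```python
import math

def mean_confidence_interval(mean_residual, student_sd, n_students, z=1.96):
    """Approximate 95% interval for a teacher's mean residual (normal approximation)."""
    standard_error = student_sd / math.sqrt(n_students)
    half_width = z * standard_error
    return (mean_residual - half_width, mean_residual + half_width)

# Same hypothetical mean residual, estimated from classes of different sizes.
intervals = {n: mean_confidence_interval(2.0, 15.0, n) for n in (20, 50, 200)}
for n, (low, high) in intervals.items():
    print(f"n={n:>3}: interval from {low:.2f} to {high:.2f}")
```

Under these assumptions the interval for a single class of 20 students is wide enough to include zero comfortably, so a teacher two points above average cannot be distinguished from a typical teacher. The interval narrows only with the square root of the number of students.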

Value-added scores assume that students are randomly assigned to teachers. In reality, students are rarely randomly assigned to teachers or to schools. According to economist Jesse M. Rothstein, a professor at the University of California, Berkeley, "Non-random assignment of students to teachers can bias value added estimates of teachers’ causal effects." [7] The issue of possible bias in value-added measures has been the subject of considerable recent study, and other researchers have concluded that value-added measures do provide good estimates of teacher effectiveness. See, for example, the work of the Measures of Effective Teaching project [8] and the analysis of how value-added measures relate to future incomes by Professor Raj Chetty of Harvard and his colleagues. [9]
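
A toy example can show how non-random assignment produces the kind of bias described above. In the sketch below, both teachers are assumed to have exactly the same true effect, and students differ only in an unobserved factor (labeled `support` here) that raises their gains. The simplified estimator (a teacher's mean gain minus the overall mean gain), the variable names and all numbers are hypothetical.

```python
def naive_value_added(gains, teacher_ids):
    """Each teacher's mean student gain minus the overall mean gain."""
    overall = sum(gains) / len(gains)
    result = {}
    for teacher in sorted(set(teacher_ids)):
        own = [g for g, t in zip(gains, teacher_ids) if t == teacher]
        result[teacher] = sum(own) / len(own) - overall
    return result

# Unobserved factor: 1 = student has extra outside support, 0 = none.
support = [1, 1, 1, 1, 0, 0, 0, 0]
# Every student gains 5 points, plus 4 if supported; neither teacher adds anything extra.
gains = [5 + 4 * s for s in support]

# Sorted assignment: teacher A receives all supported students.
sorted_assignment = ["A", "A", "A", "A", "B", "B", "B", "B"]
# Balanced assignment: each teacher receives two supported and two unsupported students.
balanced_assignment = ["A", "A", "B", "B", "A", "A", "B", "B"]

biased = naive_value_added(gains, sorted_assignment)
unbiased = naive_value_added(gains, balanced_assignment)
print(biased, unbiased)
```

With sorted assignment the estimator credits teacher A and penalizes teacher B even though the two are identical by construction. With balanced assignment the difference disappears. Adding control variables reduces this problem only to the extent that the sorting factor is actually measured.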

Research

The idea of judging the effectiveness of teachers based on the learning gains of students was first introduced [10] into the research literature in 1971 by Eric Hanushek, [11] currently a Senior Fellow at the conservative [12] [13] [14] Hoover Institution, an American public policy think tank located at Stanford University in California. It was subsequently analyzed by Richard Murnane of Harvard University among others. [15] The approach has been used in a variety of different analyses to assess the variation in teacher effectiveness within schools, and the estimation has shown large and consistent differences among teachers in the learning pace of their students. [16]

Statistician William Sanders, a senior research manager at SAS, introduced the concept to school operations when he developed value-added models for school districts in North Carolina and Tennessee. First created as a teacher evaluation tool for school programs in Tennessee in the 1990s, the technique came into wider use with the passage of the No Child Left Behind legislation in 2002. Based on his experience and research, Sanders argued that "if you use rigorous, robust methods and surround them with safeguards, you can reliably distinguish highly effective teachers from average teachers and from ineffective teachers." [1]

A 2003 study by the RAND Corporation, prepared for the Carnegie Corporation of New York, said that value-added modeling "holds out the promise of separating the effects of teachers and schools from the powerful effects of such noneducational factors as family background". It also said that studies had shown a wide variance in teacher scores when using such models, which could make value-added modeling an effective tool for evaluating and rewarding teacher performance if the variability could be substantiated as linked to the performance of individual teachers. [17]

The Los Angeles Times reported on the use of the program in that city's schools, creating a searchable web site that provided the score calculated by the value-added modeling system for 6,000 elementary school teachers in the district. United States Secretary of Education Arne Duncan praised the newspaper's reporting on the teacher scores, citing it as a model of increased transparency, though he noted that greater openness must be balanced against concerns regarding "privacy, fairness and respect for teachers". [1] In February 2011, Derek Briggs and Ben Domingue of the National Education Policy Center (NEPC) released a report reanalyzing the same dataset from the L.A. Unified School District in an attempt to replicate the results published in the Times. They found serious limitations in the previous research, concluding that the "research on which the Los Angeles Times relied for its August 2010 teacher effectiveness reporting was demonstrably inadequate to support the published rankings." [18]

The Bill and Melinda Gates Foundation is sponsoring a multi-year study of value-added modeling with their Measures of Effective Teaching program. Initial results, released in December 2010, indicate that both value-added modeling and student perception of several key teacher traits, such as control of the classroom and challenging students with rigorous work, correctly identify effective teachers. [4] The study about student evaluations was done by Ronald Ferguson. The study also discovered that teachers who teach to the test are much less effective, and have significantly lower value-added modeling scores, than teachers who promote a deep conceptual understanding of the full curriculum. [4] A reanalysis of the MET report's results conducted by Jesse Rothstein, an economist and professor at the University of California, Berkeley, disputes some of these interpretations, however. [19] Rothstein argues that the analyses in the report do not support the conclusions, and that "interpreted correctly... [they] undermine rather than validate value-added-based approaches to teacher evaluation." [20] More recent work from the MET project, however, validates the use of value-added approaches. [8]

Principals and leaders

The general idea of value-added modeling has also been extended to consider principals and school leaders. While there has been considerable anecdotal discussion about the importance of school leaders, there has been very little systematic research into their impact on student outcomes. Recent analysis in Texas has provided evidence about the effectiveness of leaders by looking at how the gains in student achievement for a school change after the principal changes. This outcome-based approach to measuring the effectiveness of principals is very similar to the value-added modeling that has been applied to the evaluation of teachers. The early research in Texas finds that principals have a very large impact on student achievement. [21] Conservative estimates indicate that an effective school leader improves the performance of all students in a school, with the magnitude equal on average to two months of additional learning gains for the students in each school year. These gains come at least in part through the principal's impact on selecting and retaining good teachers. Ineffective principals, however, have a similarly large negative effect on school performance, suggesting that issues of evaluation are as important with respect to school leadership as they are for teachers.

Criticism and concerns

A report issued by the Economic Policy Institute in August 2010 recognized that "American public schools generally do a poor job of systematically developing and evaluating teachers" but expressed concern that using performance on standardized tests as a measuring tool will not lead to better performance. The EPI report recommends that measures of performance based on standardized test scores be one factor among many that should be considered to "provide a more accurate view of what teachers in fact do in the classroom and how that contributes to student learning." The study called value-added modeling a fairer means of comparing teachers that allows for better measures of educational methodologies and overall school performance, but argued that student test scores were not sufficiently reliable as a means of making "high-stakes personnel decisions". [22]

Edward Haertel, who led the Economic Policy Institute research team, wrote that the methodologies being pushed as part of the Race to the Top program placed "too much emphasis on measures of growth in student achievement that have not yet been adequately studied for the purposes of evaluating teachers and principals" and that the techniques of value-added modeling need to be more thoroughly evaluated and should only be used "in closely studied pilot projects". [1]

Education policy researcher Gerald Bracey further argued it is possible that a correlation between teachers and short-term changes in test scores may be irrelevant to the actual quality of teaching. Therefore, "it cannot permit causal inferences about individual teachers. At best, it is a beginning step to identify teachers who might need additional professional development." [23]

The American Statistical Association issued an April 8, 2014 statement criticizing the use of value-added models in educational assessment, without ruling out the usefulness of such models. The ASA cited limitations of input data, the influence of factors not included in the models, and large standard errors resulting in unstable year-to-year rankings. [24]

John Ewing, writing in the Notices of the American Mathematical Society, criticized the use of value-added models in educational assessment as a form of "mathematical intimidation" and a "rhetorical weapon." Ewing cited problems with input data and the influence of factors not included in the model. [25]

Alternatives

Several alternatives for teacher evaluation have been implemented.

Most experts recommend using multiple measures to evaluate teacher effectiveness. [27]

References

  1. Dillon, Sam. "Method to Grade Teachers Provokes Battles", The New York Times, August 31, 2010. Accessed September 1, 2010.
  2. It is also complicated to calculate an accurate estimate if the students have many teachers in the course of a year. Eric Hanushek, "Valuing teachers: How much is a good teacher worth?" Education Next 11, no. 3 (Summer 2011); Raj Chetty, John N. Friedman, and Jonah E. Rockoff, "Measuring the impacts of teachers II: Teacher value-added and student outcomes in adulthood," American Economic Review, Volume 104, Number 9, September 2014, pp. 2633-2679.
  3. Staff. "Value-added evaluation bill is now law", Louisiana Federation of Teachers Weekly Legislative Digest, May 28, 2010. Accessed September 1, 2010.
  4. "Learning about Teaching: Initial Findings from the Measuring Effective Teaching Program". Bill and Melinda Gates Foundation. December 2010. Lay summary: The Los Angeles Times (11 December 2010).
  5. Scherrer, Jimmy (2011). "Measuring Teaching Using Value-Added Modeling: The Imperfect Panacea". NASSP Bulletin. 95 (2): 122–140. doi:10.1177/0192636511410052. S2CID 145460616.
  6. Otterman, Sharon (26 December 2010). "Hurdles Emerge in Rising Effort to Rate Teachers". The New York Times.
  7. Rothstein, Jesse M. (February 2010). "Teacher quality in educational production: Tracking, decay, and student achievement" (PDF). Quarterly Journal of Economics. Archived from the original (PDF) on 2013-07-20. Retrieved 2013-12-07.
  8. Thomas J. Kane, Daniel F. McCaffrey, Trey Miller, and Douglas O. Staiger, Have We Identified Effective Teachers? Validating Measures of Effective Teaching Using Random Assignment. MET Project: Bill and Melinda Gates Foundation (January 2013)
  9. Chetty, Raj; Friedman, John N.; Rockoff, Jonah. "Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates". The American Economic Review. American Economic Association.
  10. Green, Elizabeth (2014). Building a Better Teacher: How Teaching Works (and How to Teach It to Everyone). W. W. Norton & Company. pp. 40–44. ISBN 978-0-393-08159-6.
  11. Eric A. Hanushek, "Teacher Characteristics and Gains in Student Achievement: Estimation Using Micro-Data," American Economic Review, 61(2), May 1971, pp. 280-288; Eric A. Hanushek, "The Trade-off Between Child Quantity and Quality", Journal of Political Economy, 100(1), February 1992, pp. 84-117.
  12. Lindsay, Leon (June 2, 1983). "Stanford's conservative think tank is under fire". The Christian Science Monitor. Retrieved 30 August 2013.
  13. de Lama, George (March 3, 1993). "With GOP out of power, conservative think tank is a quieter place". Chicago Tribune. Retrieved 30 August 2013.
  14. "Think tank's Hoover Tower turns 50". Daily News of Los Angeles. July 19, 1991. Retrieved 30 August 2013.
  15. Richard J. Murnane, Impact of School Resources on the Learning of Inner City Children, Ballinger.
  16. Eric A. Hanushek and Steven G. Rivkin, "Generalizations about the Use of Value-Added Measures of Teacher Quality", American Economic Review, 100(2), May 2010, pp. 267-271. doi:10.1257/aer.100.2.267
  17. McCaffrey, Daniel F.; Lockwood, J. R.; Koretz, Daniel M.; and Hamilton, Laura S. "Evaluating Value-Added Models for Teacher Accountability", RAND Corporation, 2003. Accessed September 1, 2010.
  18. Briggs, D. and Domingue, B. "Due Diligence and the Evaluation of Teachers", National Education Policy Center, 2011. Accessed April 3, 2011.
  19. Rothstein, J. "Review of Learning About Teaching", National Education Policy Center, December 2010.
  20. "Gates Report Touting 'Value-Added' Reached Wrong Conclusion | National Education Policy Center". Nepc.colorado.edu. 2011-01-13. Retrieved 2014-01-13.
  21. Gregory Branch, Eric Hanushek, and Steven G. Rivkin, "School Leaders Matter: Measuring the impact of effective principals", Education Next 13(1), Winter 2013.
  22. Baker, Eva L.; Barton, Paul E.; Darling-Hammond, Linda; Haertel, Edward; Ladd, Helen F.; Linn, Robert L.; Ravitch, Diane; Rothstein, Richard; Shavelson, Richard J.; and Shepard, Lorrie A. "Problems with the Use of Student Test Scores to Evaluate Teachers", Economic Policy Institute, August 29, 2010. Accessed September 1, 2010.
  23. Bracey, Gerald, "Value Subtracted: A "Debate" with William Sanders", HuffPost, May 1, 2007. Accessed September 17, 2012.
  24. American Statistical Association, "ASA Statement on Using Value-Added Models for Educational Assessment", April 8, 2014. Accessed August 4, 2017.
  25. Ewing, John, "Mathematical Intimidation: Driven by the Data", Notices of the American Mathematical Society, 58:5, May 2011. Accessed August 4, 2017
  26. Harris, Douglas N.; Sass, Tim R. (2011). "Teacher training, teacher quality and student achievement". Journal of Public Economics. 95 (7–8): 798–812. CiteSeerX 10.1.1.567.1794. doi:10.1016/j.jpubeco.2010.11.009.
  27. Patricia H. Hinchey (December 2010). "Getting Teacher Assessment Right: What Policymakers Can Learn From Research". NEPC Policy Brief.