A new working paper by researchers Matthew Kraft and Allison Gilmour examines teacher evaluation reform by revisiting The Widget Effect. That widely read TNTP report found that fewer than 1 percent of teachers in most districts were rated unsatisfactory, even though 81 percent of principals could identify an ineffective teacher in their school.
Kraft and Gilmour examined the distribution of teacher effectiveness ratings in nineteen states, including fourteen Race to the Top winners. They also conducted a case study of a large urban district in the Northeast that adopted new evaluations in 2012–13. The case study included surveys of the evaluators responsible for rating teachers and interviews with principals. Across the nineteen states, the median percentage of teachers rated below proficient was 2.7 percent. Still, the share rated below proficient varied considerably from state to state, as did the share rated above proficient.
The share of teachers rated below proficient ranged from fewer than 1 percent in Hawaii to 26 percent in New Mexico. At the other end of the scale, Georgia rated 3 percent of teachers above proficient, compared with 73 percent in Tennessee. Massachusetts, our highest-performing state, placed 8 percent of teachers above proficient. The analysts also found that adding rating categories does not appear to produce greater differentiation at the low end of the scale. It does so only at the top end (e.g., distinguishing “effective” from “highly effective” teachers).
In the case-study district, surveyed evaluators estimated that 27.8 percent of the teachers in their schools were performing below proficient, four times the percentage actually rated below proficient. In interviews, principals offered several reasons for this disconnect: time constraints (documenting poor performance takes a great deal of time); the belief that it is unfair to rate teachers below proficient if they have not been given support; the desire not to discourage teachers who show potential; the fear that a replacement might be worse; the simple difficulty and discomfort of telling a teacher she is not performing well; and concerns about race (worry that a disproportionate number of non-white teachers would receive low ratings and that evaluation would become an identity issue).
These findings indicate that the problem cannot be solved with technocratic fixes. As the analysts put it, on-the-ground “norms and practices are proving much more difficult to change.” Such norms are indeed a different beast, not easily handled through straitjacket policies.
SOURCE: Matthew A. Kraft and Allison F. Gilmour, "Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness," Brown University (February 2016).