The Ohio Education Research Center (OERC) recently reported the teacher evaluation results from 2013–14, the first year of widespread implementation of the state’s new evaluation policy. The report should serve as an early warning sign while also raising a host of thorny questions about how those evaluations are being conducted in the field.
The study’s main finding is that the overwhelming majority of Ohio teachers received high ratings. In fact, a remarkable 90 percent of teachers were rated “skilled” or “accomplished”—the two highest ratings. By contrast, a mere 1 percent of Buckeye teachers were rated “ineffective”—the lowest of the four possible ratings. These results are implausible; teaching, like other occupations, should show wide variation in worker productivity. Yet Ohio’s teacher evaluation system registers little variation among teachers, and it is evidently quite lenient in its judgments of performance. But there’s more. Let’s take a look at a few other data points reported by OERC that merit discussion.
1. Most teachers are not part of the value-added system
Given the controversy around value added in teacher evaluation, it may surprise you that most Buckeye teachers don’t receive an evaluation based on value-added results. (Value added refers to a statistical method that isolates a teacher’s contribution to her students’ learning, as measured by gains on standardized exams.) Under state law, teachers with instructional responsibilities in grades and subjects in which value added is calculated (presently, grades 4–8 in math and reading) must be evaluated using those results. But as Chart 1 shows, most Ohio educators teach in grades and subjects where no value-added measure exists; they are therefore evaluated using other growth measures, such as vendor assessments or student learning objectives (SLOs). These growth measures made up 50 percent of teachers’ overall ratings in 2013–14.[1]
Chart 1: Distribution of Ohio teachers by the type of student growth measure used in the evaluation, 2013–14
[[{"fid":"114524","view_mode":"default","fields":{"format":"default"},"type":"media","attributes":{"class":"media-element file-default"},"link_text":null}]]
Source: Marsha Lewis, Anirudh Ruhil, and Margaret Hutzel, Ohio’s Student Growth Measures (SGMs): A Study of Policy and Practice (Columbus, OH: Ohio Education Research Center, 2015), page 2
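For readers who want to see the intuition behind the parenthetical definition above in miniature, here is a minimal sketch of the logic of a value-added estimate: predict each student’s current score from her prior score, then credit the teacher with her students’ average over- or under-performance against that prediction. To be clear, this is an illustration only, with hypothetical teachers, hypothetical scores, and a single prior score as the predictor; Ohio’s actual value-added calculation is considerably more sophisticated.

```python
# A deliberately stripped-down illustration of the idea behind value added.
# This is NOT Ohio's actual model; the state's calculation accounts for far
# more than a single prior-year score. All names and scores are made up.

import numpy as np

# Hypothetical data: (teacher, prior-year score, current-year score)
students = [
    ("Teacher A", 610, 652), ("Teacher A", 585, 640), ("Teacher A", 700, 735),
    ("Teacher B", 605, 618), ("Teacher B", 590, 601), ("Teacher B", 695, 702),
]

prior = np.array([s[1] for s in students], dtype=float)
current = np.array([s[2] for s in students], dtype=float)

# Predict each student's current score from her prior score with a simple
# linear fit, then see how far she landed above or below that expectation.
slope, intercept = np.polyfit(prior, current, deg=1)
residuals = current - (intercept + slope * prior)

# A teacher's "value added" here is just the average residual of her students.
for teacher in sorted({s[0] for s in students}):
    idx = [i for i, s in enumerate(students) if s[0] == teacher]
    print(f"{teacher}: average gain beyond expectation = {residuals[idx].mean():+.1f} points")
```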
With relatively few teachers in the value-added system, it’s important that we take a closer look at the other growth measures. What do we know about the vendor assessments? For instance, how are schools and teachers selecting them? Are they comparable to using state exams to measure gains? How are the gains calculated, and how are non-classroom effects on gains “controlled” for? What about the SLOs, local assessments developed by teachers? What are their features? How much do they vary from teacher to teacher, or from school to school? How are schools ensuring that the SLOs are robust, especially since there seems to be an inherent conflict of interest when teachers create their own assessment and growth tools?
Meanwhile, the practice of shared attribution—ascribing a school or district’s overall or subgroup value-added result to an individual teacher—deserves serious inquiry too. Interestingly, the OERC report found that 31 percent of evaluated teachers used shared attribution to some extent. (District boards, in consultation with teachers, approve the degree to which shared attribution is used for certain teachers.) How do districts decide the weight placed on shared attribution? Is the district, building, or subgroup value-added result typically used? Why are certain districts permitting its use? Do these districts value teamwork more than others? Or do they have less reason to be concerned about “free riding”—when certain employees shirk, even while receiving credit for the greater organization’s performance?
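The mechanics of shared attribution amount to folding a group result into an individual teacher’s growth score, at a weight the district chooses. The sketch below is purely illustrative: the blending formula, the 1–5 scale, the scores, and the 30 percent shared weight are all assumptions, not a description of how any Ohio district actually does it. It simply shows why the weight matters: the larger it is, the more a teacher’s rating rides on the group’s performance, which is exactly where the free-rider concern comes in.

```python
# Hypothetical sketch of shared attribution: a teacher's growth score is a
# weighted blend of her own measure (e.g., an SLO result) and the building's
# or district's value-added result. The 30 percent weight is an assumption
# for illustration; in practice the district board approves the degree of use.

def blended_growth_score(own_score: float,
                         shared_score: float,
                         shared_weight: float = 0.30) -> float:
    """Blend an individual growth score with a shared (building/district) one."""
    return (1 - shared_weight) * own_score + shared_weight * shared_score

# On a hypothetical 1-5 growth scale: a teacher with a middling SLO score in a
# high value-added building is pulled upward...
print(blended_growth_score(own_score=3.0, shared_score=5.0))  # 3.6
# ...while a strong teacher in a low value-added building is pulled down.
print(blended_growth_score(own_score=5.0, shared_score=2.0))  # 4.1
```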
Maybe there are easy answers to these questions about the non-value-added measures, which apply to 80 percent of Ohio teachers. But this analyst hasn’t seen them. Shouldn’t there be troves of research on how these growth measures are used? Where are the studies demonstrating that they are fair and objective measures of teacher productivity?
2. The evaluation is tougher for teachers in the value-added system
If you’re an educator in the value-added system, you’re less likely than your colleagues to earn a top rating. Consider the results presented in Chart 2: Just 31 percent of teachers fully in the value-added system received an overall rating of accomplished, while 50 percent of teachers in the SLO/shared attribution category were rated accomplished. That’s a fairly stark difference, and it indicates that the objective measurement used in the value-added system sets a tougher standard than the other growth measures. When the accomplished and skilled ratings are combined, the gap across growth-measure categories narrows, but teachers in the value-added system still appear slightly disadvantaged relative to their colleagues.
Chart 2: Percentage of Ohio teachers receiving the two top overall ratings (accomplished and skilled, the former being the highest rating) by the type of student growth measure used, 2013–14
[[{"fid":"114525","view_mode":"default","fields":{"format":"default"},"type":"media","attributes":{"class":"media-element file-default"},"link_text":null}]]
Source: Lewis, Ruhil, and Hutzel, Ohio’s Student Growth Measures, page 33
The student growth component, not the observational portion of the evaluation, drives the differences between teachers who are in the value-added system and those who are not. As Chart 3 demonstrates, just 32 percent of teachers fully in the value-added system received the highest possible rating on their evaluation’s growth component (“above”), while 54 percent of teachers in the SLO or shared attribution category received this rating. The evaluation results align with one administrator’s comment to the OERC researchers: “Value-added versus SLOs is not an equal measure.”
Chart 3: Percentage of Ohio teachers receiving each possible rating on the student growth portion of their evaluation by the type of growth-measure used, 2013–14
[[{"fid":"114526","view_mode":"default","fields":{"format":"default"},"type":"media","attributes":{"class":"media-element file-default"},"link_text":null}]]
Source: Lewis, Ruhil, and Hutzel, Ohio’s Student Growth Measures, page 33
3. Classroom observation ratings are lenient
As noted above, student growth measures, including value added where available, made up 50 percent of teachers’ evaluations in 2013–14. So what about the other half of the evaluation—classroom observations? In Chart 4, we see that 70 percent of teachers were rated skilled and another 24 percent accomplished by their classroom observer. Meanwhile, just a minuscule number of teachers—less than 1 percent—were rated ineffective.
Chart 4: Percentage of Ohio teachers (across all student growth categories) in each classroom observation rating category, 2013–14
[[{"fid":"114527","view_mode":"default","fields":{"format":"default"},"type":"media","attributes":{"class":"media-element file-default"},"link_text":null}]]
Source: Lewis, Ruhil, and Hutzel, Ohio’s Student Growth Measures, page 2
The results from the classroom observation side of the evaluation are somewhat predictable. The New Teacher Project documented in The Widget Effect that practically every teacher receives a satisfactory rating when evaluations are observation-based. This reflects a bit of common sense, too: It’s probably difficult for a principal or coworker who works with a teacher daily to be a tough or impartial evaluator. (It’s been rightly suggested that external observers may be better suited to the task.) The positive results could also reflect something of an acquiescence bias—the tendency toward “yeasaying.” Classroom observers may be inclined to report that a teacher did this or that pretty well, instead of giving an honest performance appraisal. This problem may be aggravated by the fact that principals have almost no authority over hiring, promotion, pay raises, or dismissal; with no way to connect evaluation results to staffing decisions, they have little incentive to conduct tough-minded evaluations. Finally, the results could also reflect the fact that teachers are notified in advance when their formal classroom observations will occur, leading to evaluations that skew more positive than their everyday practice would warrant.
Conclusion
The first-year results indicate that Ohio’s evaluation system isn’t working properly. (To be sure, this isn’t a problem unique to Ohio; other states appear to be experiencing similar issues.) The system doesn’t evaluate teachers with equal rigor across grades and subjects; it appears to be excessively lenient, especially on the observation portion; and it’s not clear how robust the non-value-added growth measures are. Investigating these issues, and correcting them where necessary, will require the full commitment of Ohio policymakers and practitioners. A halfhearted effort is likely doomed to fail.
[1] In 2013–14, Ohio’s evaluation system was based half on classroom observation, which applied to all teachers, and half on student growth measures, which varied depending on the grade and subjects taught. The state established an alternative evaluation framework available to schools in 2014–15—and that alternative framework will change again with the enactment of House Bill 64 in June 2015.