The Every Student Succeeds Act (ESSA) has put the future of teacher evaluations firmly in the hands of states. Ohio is now in full control of deciding how to develop and best implement its nascent system.
It should come as no surprise to folks in the Buckeye State that the Ohio Teacher Evaluation System (OTES) has significant room for improvement. Since its inception in 2009, approximately 90 percent of Ohio teachers have been rated in the top two categories and labeled “skilled” or “accomplished.” Unfortunately, there isn’t significant evidence that the system has impacted the quality of Ohio’s teacher workforce, perhaps because there is no statewide law that permits administrators to dismiss teachers based solely on evaluation ratings. Meanwhile, OTES also doesn’t appear to be delivering on the promise to aid teachers in improving their practice.
A quick glance at the ODE-provided template for the professional growth plan, which is used by all teachers except those who are rated ineffective or have below-average student growth, offers a clue as to why practice may not be improving. It is a one-page, fill-in-the-blank sheet. The performance evaluation rubric by which teachers’ observation ratings are determined doesn’t clearly differentiate between performance levels, offer examples of what each level looks like in practice, or outline possible sources of evidence for each indicator. In fact, in terms of providing teachers with actionable feedback, Ohio’s rubric looks downright insufficient compared to other frameworks like Charlotte Danielson’s Framework for Teaching.
Another problem with OTES is the way it incorporates student learning. Ohio law requires that all teacher evaluations include a student growth component, which consists of test results. For teachers with a valid grade- and subject-specific assessment, that means value-added measures. Unfortunately, only 20 percent of Ohio teachers can be evaluated using the state assessment, either in whole or in part.[1] Another 14 percent receive growth scores from a separately administered vendor assessment that increases the testing burden placed upon students and schools. The student growth component for the remaining 66 percent is based on locally developed measures that tend to be both ineffective and unfair: shared attribution, which evaluates teachers based on test scores from subjects they don’t teach, and Student Learning Objectives (SLOs), which are extremely difficult to implement consistently and rigorously and often fail to effectively differentiate teacher performance. In short, the state hasn’t quite figured out how to fairly evaluate all teachers using student achievement data.
A meaningful overhaul of Ohio’s system should aim to solve four significant problems. First, it should address the current framework’s failure to fairly evaluate all teachers. Second, it should do a far better job of differentiating teacher performance. Third, it should provide actionable feedback to all teachers. And finally, it must positively impact the overall quality of the workforce. Crafting a system that does all this is easier said than done. Fortunately, there’s evidence that focusing solely on a rigorous classroom observation cycle, rather than student growth measures, could be the solution.
In a recent piece for The 74, Matt Barnum examined research on teacher evaluation work in Chicago, including an analysis of a pilot system that focused solely on classroom observations and the system’s impact on the labor market. The analysis found that the first year of the pilot resulted in an 80 percent increase in the exit rate of the lowest-performing teachers; the teachers who replaced exiting educators proved to be higher performing than those who exited. Overall, the findings suggest that evaluation systems based solely on rigorous observations of teacher practice can impact the quality of the workforce. This type of system would also remedy the biggest problem with Ohio’s evaluation structure, which is that current student growth measures unfairly evaluate teachers in many subjects and grade levels.
The second and third problems with the current system, weak differentiation among teachers and a lack of useful feedback, can also be addressed by zeroing in on an improved observation cycle. In general, observations provide more detailed information about the complex job of teaching than a list of raw scores ever could. More information means more opportunities to pinpoint variances in performance, but only if the system uses a high-quality rubric and takes advantage of multiple perspectives by including outside observers and peer observers. Improving observer training and ensuring a mix of announced and unannounced observations are also important.
When it comes to offering better feedback, it’s widely acknowledged that teachers find evaluations most helpful when they're given actionable feedback on their practice. This type of feedback only comes from observation of practice. Plenty of other sectors understand this. Professional football teams prepare for their next opponent by studying game film. Players study their future opponents, but they also study their own performance from the previous game—the choices they made, what they could have done better, and what they need to continue doing. Teacher evaluations should offer the same opportunities. Teacher coaching, teacher collaboration (which research says can lead to student achievement gains), and peer reviews—all of which have been found to improve teacher practice—are only effective if they include rigorous observation of practice.
It’s true that assessment results are a form of feedback. But as a former teacher, I can attest to the fact that studying test results (value-added or otherwise) was hardly the most effective way for me to improve my pedagogy. I needed to know why my students did or didn’t do well, and that answer couldn’t be found on a data spreadsheet no matter how hard I looked. A far better use of my time, and the best way to make me a better teacher faster, would have been actionable feedback that came from observing my practice, which, after all, is what most impacted my students’ test scores in the first place.
In summary, research shows that evaluation systems based solely on rigorous observations of teacher practice can impact the quality of the teacher workforce. Research also shows that improving teacher practice can be done through observations conducted by well-trained observers using high-quality frameworks and rubrics. Taken together, it seems that one way to improve Ohio’s teacher evaluation structure is to pilot a system that focuses solely on rigorous classroom observations. Stay tuned for a detailed explanation of what such a system could look like.
[1] The 20 percent comprises teachers whose scores are based entirely on value-added measures (6 percent) and teachers whose scores are based partially on value-added measures (14 percent).