Can classroom observations be used as the sole measure for identifying effective teachers? In a new study, Rachel Garrett (AIR) and former Fordham Emerging Education Policy Scholar Matthew P. Steinberg (University of Pennsylvania) attempt to answer this question by investigating the relationship between observation scores and student achievement.
They draw on data from the Measures of Effective Teaching (MET) study, using a sample of 1,559 teachers of grades 4–8 who were randomly assigned to students in six major school districts. The sample was separated according to content area to determine teacher effectiveness in math and reading, with both sub-samples exhibiting similar student, teacher, and classroom characteristics. Analysts compared student performance (measured by test scores on state-mandated exams during the 2009–2010 and 2010–2011 school years) to teacher observation scores based on the Framework for Teaching (FFT) instrument, a widely used observation protocol. They measured both the expected and observed effects of teacher performance on student achievement.
If students had been randomly assigned to teachers in both content areas, the researchers calculated, a student taught by a “proficient” teacher should gain between 1.2 and 1.5 months of extra learning in math per year compared to a student taught by a “basic” teacher. But in English language arts, a “proficient” teacher should produce no more growth than a “basic” one, suggesting that, by observation score, the better-rated teacher is actually no more effective than her lesser-rated colleague.
In practice, however, the researchers observed “fairly extensive, post hoc, nonrandom sorting of students to teachers”—despite the intended randomness of the study. This “assortative matching was likely more limited than under a natural context”; but still, they posit, it probably caused the “observed” effects to be greater than the “expected” effects. And indeed, students assigned to “proficient” teachers gained 3.6 months and 3.4 months of learning in reading and math, respectively, compared to students taught by “basic” teachers.
The researchers conclude that the true effect of a “proficient” teacher compared to a “basic” one falls somewhere between the expected and observed measures. Nevertheless, the findings lead them to “question the wisdom of solely relying on observational protocols like the FFT for the purposes of evaluating teachers in a formal system that will affect decisions relating to tenure, performance pay, and other key personnel decisions.” Instructional quality can be enhanced or hindered by the composition of students in a teacher’s classroom, such that a significant amount of the growth a teacher produces could be attributed to the group of students she teaches, and not necessarily to what she’s doing in the classroom compared to her lower-rated peers.
My own experience as a first-year teacher supports this hypothesis. And although my school administration took into consideration my inexperience and the performance levels of my students when completing my evaluation, this may not be the case across all schools and districts. Let’s pay more critical attention to the use of classroom observations for high-stakes purposes and ensure that teachers are receiving the most accurate and just evaluations. Teachers should not be deemed “high-performing” for riding the coattails of their already brilliant students.
SOURCE: Rachel Garrett and Matthew P. Steinberg. “Examining Teacher Effectiveness Using Classroom Observation Scores: Evidence From the Randomization of Teachers to Students,” Educational Evaluation and Policy Analysis (in press).