Many school districts use teacher rating scales to identify students for advanced (i.e., gifted) programming, such as supplementary instruction and separate classes or schools. On these scales, teachers estimate the likelihood that a child is advanced by assigning a value, such as a number from 1 to 9 on one prominent instrument. But how fair is this process? In a recent working paper, a team of analysts from the University of Connecticut and NWEA examined the variation in teacher ratings and the prevalence and potential consequences of “rater dependence”—that is, the degree to which ratings depend on teachers, as opposed to students.
To investigate this issue, the researchers used student-level data from four unidentified but geographically diverse school districts that were legally required to identify and serve advanced students. For each district, the data set included math and reading achievement test scores, cognitive ability test scores, demographics, and complete teacher ratings for at least one grade level. Each district used a different rating instrument: one used the Gifted Rating Scales, one used the HOPE Scale, and two used locally developed scales. Employing a random effects model, the researchers analyzed the variation in teachers’ ratings of students within each district while controlling for students’ cognitive ability, achievement, and demographics.
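The core quantity in such a model is the share of rating variance that sits at the teacher level rather than the student level. The sketch below is a simplified illustration of that idea, not the paper’s actual model: it simulates ratings in which each hypothetical teacher applies a personal “leniency” offset, then recovers the teacher-level share of variance with a one-way ANOVA estimate of the intraclass correlation. All numbers (20 teachers, 30 students each, the variance settings) are invented for the example.

```python
import random
import statistics

def icc_oneway(groups):
    """One-way ANOVA estimate of the intraclass correlation: the share
    of rating variance attributable to the rater (the group), assuming
    balanced groups."""
    k = len(groups)                 # number of teachers
    n = len(groups[0])              # ratings per teacher
    grand = statistics.mean(x for g in groups for x in g)
    group_means = [statistics.mean(g) for g in groups]
    # Mean square between teachers and mean square within teachers
    msb = n * sum((m - grand) ** 2 for m in group_means) / (k - 1)
    msw = sum((x - m) ** 2
              for g, m in zip(groups, group_means)
              for x in g) / (k * (n - 1))
    return (msb - msw) / (msb + (n - 1) * msw)

random.seed(42)
# Simulate 20 teachers rating 30 students each. Each teacher adds a
# personal offset (rater dependence) on top of student-level variation.
groups = []
for _ in range(20):
    leniency = random.gauss(0, 1.0)          # teacher-level effect
    groups.append([leniency + random.gauss(0, 2.0) for _ in range(30)])

print(f"Share of rating variance due to the teacher: {icc_oneway(groups):.2f}")
```

With these made-up settings, the teacher-level share comes out around one fifth of total variance, which is in the same ballpark as the 10 to 25 percent the researchers report; the actual paper’s estimates come from a richer model with student covariates.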
The findings are a mix of good and bad news. As for the former, the researchers found that teachers do assign higher ratings to students with higher scores on achievement and cognitive ability tests, as one would hope. They also did not find any consistent relationships between teachers’ ratings and students’ race and ethnicity once achievement and other variables were taken into account. In other words, teachers did not display any statistically significant bias based on race or ethnicity when identifying students.
On the bad-news side, the analysts found that 10 to 25 percent of the variance in teachers’ ratings is determined by the student’s individual teacher, not by anything inherent in the child. Picture two students in different classes taught by different teachers in the same school. They earn identical scores on their cognitive ability and achievement tests, yet receive very different teacher ratings. One student is identified as likely advanced and the other is not.
This variation between educators at individual schools calls into question the usefulness of teacher ratings as an element in identifying students for services. On the one hand, the problem may stem from a dearth of high-quality training and careful implementation, leaving teachers unclear about how, and whom, to identify as advanced. On the other hand, the ratings themselves may be inherently flawed and prone to rater error and bias. This suggests that, where administrators insist on using educator ratings for identification, they ought to be used exclusively as a means to include students who aren’t otherwise identified as advanced based on test scores; no student with sufficiently high scores should be denied advanced programming because of a low teacher rating. And educators should be effectively trained on how to use these systems. Another viable option is to not use them at all.
SOURCE: D. Betsy McCoach et al., “How Much Teacher Is in Teacher Rating Scales?,” Annenberg Institute (August 2023).