As school accountability systems reset following pandemic disruptions, an opportunity arises to improve their accuracy and to make sure that the responses they are intended to prompt are properly tuned. A new study from the U.S. Department of Education’s Institute of Education Sciences looks at how academic proficiency rate calculations for small schools and small subgroups of students might be sharpened.
School-level data for each year from 2015–16 through 2018–19 come from the Pennsylvania Department of Education (PDE) and comprise all of the elementary, middle, and high schools in the state that were included in ESSA accountability calculations in those years. That means we’re looking at schools examined for possible identification for Targeted Support and Improvement (TSI) and Additional Targeted Support and Improvement (ATSI) under state-adopted subgroup performance criteria. Many schools, however, had subgroup sizes below twenty, and those subgroups had traditionally been excluded from TSI and ATSI calculations due to concerns about the unreliability (termed “instability” by researchers) of data derived from such small samples.
The researchers theorized that the proficiency rate calculations for these schools’ smaller subgroups could be improved using Bayesian hierarchical modeling, a statistical method that pools information across many individual observations to sharpen the estimate for each one. (Lots of information on the model can be found here and here; it is used in fields as diverse as astronomy and marketing.) The eight subgroups examined in each school were Asian students, Black students, Hispanic students, White students, multiracial students, economically disadvantaged students, students with disabilities, and English learners. (Even though the research targets the measurement of very small subgroups, Native American/Alaska Native and Hawaiian/Pacific Islander student groups were simply too tiny to be included.)
For each school-subgroup combination in each year, analysts looked at the percentage of students scoring at or above the state’s threshold for academic proficiency and the number of tested students in English language arts and math, aiming to stabilize the proficiency rate for each. They found that their stabilized proficiency rates showed more consistent variation across subgroup sizes, indicating that they are more reliable than unstabilized rates. Thus, Bayesian stabilization could allow smaller subgroups to be included with statistical reliability similar to that of larger samples.
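To make that intuition concrete, here’s a rough sketch in Python of what “stabilization” does. It uses a simple beta-binomial shrinkage toward a statewide rate as a stand-in for the report’s full Bayesian hierarchical model, and every number, name, and parameter in it is invented for illustration.

```python
# A minimal sketch of the "stabilization" idea, using a simple beta-binomial
# shrinkage rather than the full Bayesian hierarchical model in the report.
# All numbers below are made up for illustration.

def stabilize(n_proficient: int, n_tested: int,
              state_rate: float, prior_strength: float = 20.0) -> float:
    """Shrink a subgroup's raw proficiency rate toward the statewide rate.

    The prior acts like `prior_strength` extra students who score at the
    statewide rate, so small subgroups are pulled strongly toward the state
    average while large subgroups are left nearly unchanged.
    """
    alpha = state_rate * prior_strength          # prior "successes"
    beta = (1.0 - state_rate) * prior_strength   # prior "failures"
    return (n_proficient + alpha) / (n_tested + alpha + beta)

state_rate = 0.55  # hypothetical statewide proficiency rate

# A subgroup of 8 students with 2 proficient: raw rate 25%, but that estimate
# is noisy, so the stabilized rate moves much closer to the state average.
print(round(stabilize(2, 8, state_rate), 3))     # ~0.464

# A subgroup of 200 students with 50 proficient: raw rate 25%, and with that
# much data the stabilized rate barely moves.
print(round(stabilize(50, 200, state_rate), 3))  # ~0.277
```

The point of the exercise is visible in the two printouts: the same raw rate is adjusted heavily when it rests on eight students and hardly at all when it rests on two hundred.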
They then reran PDE’s data using a minimum subgroup size of ten to produce a new list of schools with low-performing student subgroups as determined by state criteria. The new calculations moved one subgroup above the proficiency cutoff for ATSI designation in nine of the 193 schools originally identified: White students in six schools and economically disadvantaged students in three others. Their conclusion: those schools were misidentified, and funding to support students in those subgroups was not necessary there, meaning money was diverted from schools and students who needed it more. Small potatoes in one state, perhaps, but potentially much more important writ large.
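The re-screening step itself is simple in spirit, as this hedged sketch suggests: include any subgroup with at least ten tested students and flag it if its stabilized rate falls below the cutoff. The cutoff value, school records, and field names here are hypothetical, not Pennsylvania’s actual criteria.

```python
# Illustrative re-screening under the study's assumptions: include any
# subgroup with at least ten tested students and flag it if its stabilized
# proficiency rate falls below the ATSI cutoff. All values are hypothetical.

ATSI_CUTOFF = 0.40       # hypothetical proficiency cutoff
MIN_SUBGROUP_SIZE = 10   # the minimum subgroup size used in the reanalysis

subgroups = [
    # (school, subgroup, n_tested, stabilized_rate)
    ("School A", "White students", 14, 0.42),
    ("School B", "Economically disadvantaged", 11, 0.37),
    ("School C", "Students with disabilities", 7, 0.31),  # still too small
]

flagged = [
    (school, group)
    for school, group, n_tested, rate in subgroups
    if n_tested >= MIN_SUBGROUP_SIZE and rate < ATSI_CUTOFF
]
print(flagged)  # [('School B', 'Economically disadvantaged')]
```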
The researchers note two limitations of their study. The first is a lack of student-level data, access to which, they surmise, could allow for even sharper stabilization of proficiency rates. The second is that most of the subgroups whose status changed were very close to the ATSI proficiency cutoff. PDE has discretion over borderline cases like these, and while the researchers applied inflexible cutoffs, they assume that state officials would simply have kept the classifications as originally calculated, making their work moot in the real world.
Despite these limitations, the Pennsylvania Department of Education was encouraged by the results of this experiment and partnered with the researchers to incorporate data stabilization into the state’s actual ATSI calculations for 2022. No outcomes have been announced yet, but the researchers believe that this will not only correct for past measurement errors but also help smooth out the data collection gaps created in 2020 and 2021 by pandemic-induced testing disruptions. Surely many eyes will be trained on the Keystone State whenever results are released.
SOURCE: Lauren Forrow, Jennifer Starling, and Brian Gill, “Stabilizing Subgroup Proficiency Results to Improve the Identification of Low-Performing Schools,” Institute of Education Sciences, Regional Educational Laboratory Mid-Atlantic (February 2023).