Two years ago, students at a charter school in East Los Angeles were learning at 1.5 to 2 times the pace of their grade-level peers around the state, based on three years of standardized test scores. But the California Department of Education labeled the school a “low performer,” which put it at risk of closure. Why? Because California’s method of measuring academic growth was deeply flawed.
I have written before in these pages about the importance of accurate and balanced methods of measuring school quality. In the same spirit, I recommend a new book by Steve Rees and Jill Wynns, Mismeasuring Schools’ Vital Signs: How to Avoid Misunderstanding, Misinterpreting, and Distorting Data.
Wynns spent twenty-four years on the San Francisco school board, while Rees spent just as long running a company that helped school districts measure and report on the quality of their schools. Both have seen their share of mistakes, many of which lead to real pain: teachers reassigned and principals removed based on faulty data; English learners held back from entering the mainstream academic program even after they have become fluent; charter schools closed due to inadequate measurement of growth; even students denied graduation based on flawed interpretation of test results.
Rees and Wynns have now authored a highly readable guide that superintendents, principals, school board members, education reporters, teachers, and advocates can use to avoid these kinds of errors. They underline four flaws that are most common:
Growth versus proficiency
The first is using children’s current test scores—rather than a measure of their academic growth—to judge the quality of schools and teachers. In high-poverty schools, students often arrive several years behind grade level. Few of them are “proficient” in math or reading. But too often, states and districts give the greatest weight to students’ current test scores, not their rate of improvement.
Consider a middle school whose sixth-grade students arrived three years behind grade level. If they are only one year behind by the end of sixth grade, that is spectacular progress: three years of growth in a single year. But in California, to use but one example, the school’s academic score would still fall in one of the two lowest categories.
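To make the distinction concrete, here is a minimal sketch in Python, with invented grade-equivalent scores rather than data from any real school or from the book, showing the same classroom judged both ways:

```python
# Minimal sketch with invented numbers: one sixth-grade class, two verdicts.
# Scores are grade-level equivalents: a student on track scores 6.0 entering
# sixth grade and 7.0 leaving it.

fall   = [3.0, 3.2, 2.8, 3.1, 2.9]  # arrived about three years behind
spring = [6.0, 6.2, 5.8, 6.1, 5.9]  # left about one year behind

ON_GRADE_LEVEL = 7.0  # end-of-sixth-grade expectation

# Proficiency view: what share of students meet the bar right now?
proficiency_rate = sum(s >= ON_GRADE_LEVEL for s in spring) / len(spring)

# Growth view: how much did each student learn this year?
average_growth = sum(b - a for a, b in zip(fall, spring)) / len(fall)

print(f"Proficiency rate: {proficiency_rate:.0%}")                # 0%, looks dismal
print(f"Average growth: {average_growth:.1f} years in one year")  # 3.0, spectacular
```

Identical students, opposite verdicts, depending on which number the state chooses to publish.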
Apples versus oranges
The second major flaw Rees and Wynns point out is related: When trying to measure academic growth, some states and districts fail to measure the same students over time. Instead, they compare a school’s or grade level’s average from one year to the next. But in a middle school, a third of the students each year are new arrivals, and another third of last year’s students have departed. In four-year high schools, a quarter leave each year and another quarter arrive. So annual school or grade-level averages are measuring different kids.
The solution is obvious: Measure the same cohort of students over time, following them from one grade level to the next. Even better, remove from your measure students who have departed or recently arrived at the school.
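A minimal sketch of the difference, again with invented scores: six students tested last year, six this year, but only four of them are the same children.

```python
# Minimal sketch with invented data: year-over-year averages vs. matched cohorts.
# Scores keyed by student ID; as in a real school, some students turn over.

last_year = {"s1": 3.0, "s2": 3.4, "s3": 3.2, "s4": 3.1, "s5": 2.9, "s6": 3.3}
this_year = {"s3": 4.4, "s4": 4.2, "s5": 4.0, "s6": 4.5,  # still enrolled
             "s7": 2.6, "s8": 2.8}                         # new arrivals

def mean(xs):
    xs = list(xs)
    return sum(xs) / len(xs)

# Apples to oranges: compare this year's average to last year's, whoever the kids are.
naive_change = mean(this_year.values()) - mean(last_year.values())

# Apples to apples: follow only the students tested in both years.
cohort = this_year.keys() & last_year.keys()
matched_growth = mean(this_year[s] - last_year[s] for s in cohort)

print(f"Change in school average: {naive_change:+.2f}")    # +0.60, looks mediocre
print(f"Matched-cohort growth:    {matched_growth:+.2f}")  # +1.15, the real story
```

Here the low scores of incoming students drag the naive average down, masking strong growth among the children the school actually taught for two consecutive years.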
Ignoring the imprecision of test results
The third common flaw is failure to acknowledge the imprecision of test scores. “When we test kids, we’re trying to gather evidence of something that exists out of sight, somewhere between their ears,” Rees and Wynns write. “Whatever their test scores reveal, it can only be an estimate of what they know.”
Standardized tests are often used to sort children into categories—typically four, which might be summarized as advanced, proficient, needing improvement, and far behind grade level. But imprecision means some of these classifications are dubious. “The major test publishers include what they call classification error rates in their technical manuals,” the authors explain. “It is common to find a 25–30 percent classification error rate in the middle bands of a range of test scores—and that’s for a standardized assessment with forty-five to sixty-five questions.”
“In Texas, Illinois, Maryland, California, Ohio, Indiana, Florida, and many other states,” they add, “the parent reports make no mention of imprecision.” Yet these reports tell parents whether a child is on grade level. Some states use a standardized test called the Smarter Balanced Assessment. Its “technical manual reveals that the classification accuracy rate in these middle two bands (Levels 2 and 3) is about 70 percent. In other words, just seven out of every ten kids whose scores land in the middle two bands will be classified correctly as having either met the standard or scored below the standard.”
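The arithmetic behind a figure like that 70 percent is easy to simulate. Here is a minimal sketch in Python; the cut score, error band, and standard error are invented for illustration, not taken from any real test’s technical manual:

```python
# Minimal simulation with invented numbers (not the Smarter Balanced methodology):
# why students whose true ability sits near a cut score are often misclassified.
import random

random.seed(0)
CUT = 2500   # hypothetical scale-score cutoff for "met the standard"
SEM = 25     # hypothetical standard error of measurement
BAND = 30    # how far from the cut the "middle band" students sit

trials, correct = 100_000, 0
for _ in range(trials):
    true_score = random.uniform(CUT - BAND, CUT + BAND)  # true ability near the cut
    observed = true_score + random.gauss(0, SEM)         # the test adds noise
    if (observed >= CUT) == (true_score >= CUT):         # same side of the cut?
        correct += 1

print(f"Classification accuracy near the cut: {correct / trials:.0%}")  # roughly 70%
```

Even modest, perfectly ordinary measurement noise flips nearly a third of borderline students to the wrong side of the cut line.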
Lack of context
The fourth major flaw Rees and Wynns discuss is “disregarding context when analyzing gaps in achievement.” Often, a school is compared to the statewide average, when its students are anything but average. They might be affluent, or poor, or recent immigrants. If so, do we learn anything about the quality of their school by comparing them to a state average?
Rees and Wynns urge school and district leaders to compare their students to schools or districts with demographically similar children. “If you can identify other schools with kids very much like your own who are enjoying success where your students are lagging, you can call the site or district leaders and see how their approach to teaching reading differs from your own,” they suggest. “That last step, compare-and-contrast with colleagues who are teaching students very similar to your own, is where your analytic investment will pay off.”
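In data terms, this is a nearest-neighbor search over demographics before any comparison of outcomes. A minimal sketch, with invented schools and invented fields:

```python
# Minimal sketch with invented schools: find demographic peers first,
# then compare outcomes only against those peers.

schools = {
    # name: (share low-income, share English learners, mean reading score)
    "Yours":    (0.85, 0.40, 2410),
    "School A": (0.82, 0.38, 2455),  # similar students, stronger results: call them
    "School B": (0.15, 0.05, 2560),  # affluent school: not a meaningful comparison
    "School C": (0.88, 0.45, 2400),
}

def demographic_distance(a, b):
    """Euclidean distance over the demographic fields only (not the scores)."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5

mine = schools["Yours"]
peers = sorted((n for n in schools if n != "Yours"),
               key=lambda n: demographic_distance(schools[n], mine))

for name in peers[:2]:  # the two most demographically similar schools
    print(f"{name}: mean reading {schools[name][2]} (yours: {mine[2]})")
```

The affluent school drops out of the comparison entirely; what remains is a short list of sites worth a phone call.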
The authors point a finger of blame at schools of education, which rarely teach future teachers or administrators about data, assessment, or statistics. “Schools of education simply must stop sending data- and assessment-illiterate educators into the field,” they declare.
They also urge state departments of education to disclose the imprecision of test scores whenever they report results, to do more to communicate the meaning of those results, and to create help desks that district and school leaders can turn to with data and assessment questions.
Perhaps their most novel recommendation is that we begin measuring “opportunities to learn,” to draw attention to yawning gaps. Some districts assign students to the school closest to their home, for instance, while others offer significant choices—hence greater opportunity. Most districts give teachers with seniority more ability to choose their schools, leaving the schools in low-income neighborhoods to settle for rookie teachers or those no one else wants—creating a huge opportunity gap for low-income students. Some schools offer the opportunity to take more advanced courses or more career-oriented courses.
A few districts work hard to match their supply of courses and schools to what students and their families want, but most don’t. The result: yet another opportunity gap. “If 90 percent of your sections are dedicated to college-level course work, and 50 percent of your graduating seniors have chosen a path to the workforce or the military, then your master schedule constrains the opportunities to learn that your students care most about,” the book explains. “Work force prep courses and multiple pathways toward work-related professions would be a needed addition for that school. The question for those leading or governing districts is how actively you listen to students when they tell you what future they’re aiming for, and the extent to which you direct your budget and staff to meet their desires.”
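The supply-and-demand comparison in that passage is simple enough to sketch. The 90 percent and 50 percent figures come from the book’s example; the category labels and layout are invented for illustration:

```python
# Minimal sketch: the book's 90%-supply vs. 50%-demand example,
# with invented track labels.

supply = {"college-prep": 0.90, "career/workforce": 0.10}  # share of sections offered
demand = {"college-prep": 0.50, "career/workforce": 0.50}  # share of seniors' stated paths

for track in supply:
    gap = supply[track] - demand[track]
    print(f"{track:16s} supply {supply[track]:.0%}  demand {demand[track]:.0%}  gap {gap:+.0%}")
```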
A brief article cannot begin to suggest the depth and detail the authors plumb in this volume. In addition, every chapter of Mismeasuring Schools’ Vital Signs includes questions people can ask to uncover data and measurement problems—and methods to solve them—in their own districts and schools. There is even a companion website that extends the book with interactive data visualizations and resources, such as a glossary of statistical terms and a “visual glossary” showing the types of charts and graphs you can use to communicate data meaningfully.

There’s an old saying in the management world: What gets measured gets done. As Rees and Wynns demonstrate, in public education we too often measure the wrong things in the wrong ways. If we’re going to improve the lives of children, we have to learn how to measure what matters, measure it accurately, and then understand what it means. Mismeasuring Schools’ Vital Signs is a good place to start.
Editor’s note: This was first published in The 74.