The testing and accountability conundrum: Counting the cost of accountability
By Robert Pondiscio
No reasonable adult would suggest eliminating the foster care system, or making it impossible for responsible government authorities to remove abused or neglected children from unsafe homes. On the one hand, no one wants to tell parents how to raise their kids. But on the other, when the health and well-being of a child is at risk, we don’t shrug it off; we intervene. And not indiscriminately or capriciously: about 400,000 children are in foster care at any given moment—a tiny fraction of the nearly 75 million children under 18 in America. Removing children from their parents is a drastic measure and appropriately rare.
“Drastic and rare” is surely what my Fordham Institute colleague Mike Petrilli has in mind in wanting to ensure that no child gets stuck in a hopeless and godforsaken school. Mike and my friend Jason Bedrick, Director of Policy at EdChoice, have been exchanging friendly fire over school choice and accountability. Bedrick says choice is accountability: schools ought to be “directly answerable to the people most affected by their performance,” meaning parents, not regulators and bureaucrats. I made a similar argument in a U.S. News column in December. Mike agrees, but also wants schools to be “subject to societal expectations regarding results.” He’d probably agree that we should regulate schools about the same way we regulate parents—lightly and rarely. But in both instances we need to reserve the right to intervene to protect kids in the most extreme cases. That’s just common sense. No one would reasonably object.
And in principle, neither do I. But follow me for a moment on a thought experiment. Those who enter foster care are overwhelmingly from poor homes and disproportionately children of color. Surely there are unsafe upscale homes as well. Suppose we wished to be more equitable and fair in deciding which children were truly at risk. What if, instead of leaving it up to the judgment of social workers and others who might be acting on implicit or explicit bias, we decided to identify children potentially at risk based largely on measurable data, particularly emergency room visits? If your child ends up in urgent care more often than average, that’s a signal that something might be amiss. You should expect some extra official scrutiny. But rest assured no one’s looking to take your children away. That should worry only the “very worst” parents.
Are you reassured? Or would you make some adjustments to your family’s lifestyle and routines? Would you be more inclined or less inclined to let your children play sports? Would outdoor activities like hiking in the woods and playing in the waves at the beach seem a little risky and less appealing? Forget hunting, riding ATVs, using power tools in dad’s workshop, or that two-week wilderness trip with the Boy Scouts. Maybe it changes your feeling about unsupervised play? Hmm, maybe not; accidents happen. Come to think of it, you might even encourage your children to stay inside more. All it takes is a couple of trips to the emergency room and your family might be at risk of being broken up. Why take a chance? A small but non-trivial risk of being labeled “one of a handful of crappy parents” and having your child removed from your home might have a significant impact on your parenting.
And that’s what I think we’re overlooking when we use test scores to judge schools. It’s not the three percent of “truly dismal schools” that should concern us. It’s the other 97 percent, which alter their practices to avoid being perceived as one of those dismal schools. It’s curriculum narrowing, over-reliance on test prep, practice exams, and myriad other decisions large and small that cumulatively alter our children’s experience of school—and not always for the better.
No reasonable person objects to the foster care system because, broadly speaking, it seems appropriately focused. Very few of us worry about our children being removed from our homes. By marked contrast, school accountability regimes may be intended to weed out only the “truly dismal,” but they force all schools to demonstrate they aren’t one of those dismal schools. So they begin to behave in ways they otherwise wouldn’t—including adopting instructional practices and school culture habits we might not want.
I visited a school not long ago that is, by all accounts, strong and rapidly improving. My hosts were deeply proud of the school and not without cause. In every classroom I visited, I noticed the same poster encouraging kids to “Keep calm and score basic or above.” I observed a math class where the teacher repeatedly reminded students as they worked to “think about what they want to see.” Several times she asked, “How would you prove it to them?” The “they” and “them” referred to those scoring the state tests, even though such tests were still many months away. Please don’t blame the teachers or school administrators. They were responding appropriately and sensibly to “societal expectations regarding results.”
I’m morally inclined toward Bedrick’s “choice purist” argument for its simplicity and clarity. I chose my daughter’s (private) school without much official oversight, approval, or fear of sanction. I see no reason to think I’ve cornered the market on sound parental judgment. That said, Petrilli and others who favor stronger oversight are on solid ground when they note that, when taxpayers are paying for it, the public has a right, even an obligation, to make sure the money’s not squandered. But where I part company with them is that I’m increasingly open to exploring other forms of accountability, including letting parents vote with their feet.
The bottom line: I’m pro-testing. I’m pro-accountability. It’s test-driven accountability I’m not so sure about. No one wants crappy schools to get a pass. But let’s not overlook the testing tail wagging the schooling dog, or assume that identifying the worst schools is worth whatever it costs the rest. Maybe it’s time for some new ideas.
Editor’s note: This article was originally published, in a slightly different form, by Education Next.
My friend and colleague Robert Pondiscio has done his best—and that’s a high bar—to navigate and mediate between Jason Bedrick and Mike Petrilli in the latest chapter of America’s endless debate about whether school accountability is best done via the parent marketplace or state assessment regimes.
The argument is important—and Robert strives to find a middle ground: he’s for testing, he’s for choice, he’s for accountability—but he ends up closer to Bedrick in expressing reservations about “test-based accountability.” And he does this via a far-fetched analogy to foster care, asking whether parents would bar their kids from soccer and gymnastics if the big, bad state were to determine that foster placements would be based on the frequency of children’s visits to hospital emergency rooms. Robert frets that just as many parents would change their family behavior in undesirable ways to keep kids out of the ER for fear of losing them to the foster-care system, so schools alter their behavior in educationally undesirable ways when test scores determine their fate within the accountability system.
He’s got a point, of course. We’re all mindful of the downside of excessive reliance on test scores as criteria by which schools (and teachers) get evaluated, sanctioned, and rewarded. It inevitably affects what happens in the classroom. Robert has been visiting a lot of schools lately, so he’s more sensitive than ivory tower types to the alterations in classroom priorities and practices that follow from such reliance.
But just as there are downsides to test-driven school accountability, there are shortcomings in the parent marketplace. In my experience, it fails to secure the public interest in effective schools that adequately educate the next generation of the public’s children—and do so at public expense.
What else might we do? Robert says it’s time for those who want schools to be accountable but don’t want their fate to hinge on test scores to come up with “some new ideas.”
But where are those ideas to come from?
Let’s take a giant leap and assume that essentially all families do have school choices, i.e., that there’s a true education marketplace. That’s coming closer to reality every year and may arrive even faster if some version of a Trump plan to enhance school choice with federal help comes to pass.
Then let’s take another giant leap and assume that the summative assessments that states administer to their public-school pupils are good tests that truly probe a reasonable sample of what society wants its children to learn; tests that are, in that sense, worth teaching to. That’s still a dream, yes, but getting closer as more states embrace better assessments.
Then what else?
One approach is to add more factors to judgments about school performance, much as ESSA envisions when it admonishes states to employ graduation rates and “at least one indicator of school quality or student success” along with test scores. States are now weighing student and teacher attendance, parent surveys, the incidence of pupil suspensions and sundry other factors. And since the law says at least one, an imaginative state could deploy multiple factors. We do, however, need to keep in mind that whenever high stakes are attached to any metric, those affected will find ways to manipulate that metric to their advantage. Which may or may not also be to the advantage of children and taxpayers.
The other approach—the only other one that my brain can conjure—is to create some sort of “school inspectorate” that sends competent observers into schools for direct observation of their workings. Trained and experienced observers who bring carefully designed criteria and rubrics with them, and who are themselves subject to various checks on their consistency, reliability, and comparability.
I see many pluses in such an inspectorate. It’s akin to what a top-notch accreditation system does—though we have so few of those that many have never seen one in action—and it’s what a top-notch charter-school authorizer does for the schools in its portfolio. (We don’t have nearly enough of those, either!) But let’s admit that an inspectorate system would be expensive. If undertaken by states, it would be seen as a threat to local control. If done by the federal government, of course, there’d be hell to pay. Whoever runs it, it will consume time and resources on the part of schools being inspected, much as an accreditor visit does; and it’s ultimately subjective, meaning that when its outcome is used for high-stakes decisions there will inevitably be appeals, protests, and probably litigation.
In the end, an inspection system might actually prove more useful by way of formative, school-improvement feedback—and the shape-up benefit of having someone watch over your shoulder—than in passing judgments on school effectiveness for accountability purposes. We also need to recognize that any inspectorate equipped with criteria and rubrics is apt to have a standardizing, homogenizing effect on schools and may thereby limit the diversity of the education marketplace that was part of the rationale for school choice in the first place.
Sure, like Robert, I’d welcome more “new ideas.” But I don’t have any more. While waiting for cleverer folks to devise some, I still see more pluses than minuses in relying fairly heavily on test results—the results of good tests, that is—and the various analyses that they lend themselves to. School choice for everyone is important, too, but the public has a legitimate interest in ensuring that they’re sound choices.
On this week's podcast, Checker Finn, Alyssa Schwenk, and Brandon Wright discuss the ongoing debate about whether school accountability is best done via the parent marketplace or state assessments. During the Research Minute, Amber Northern examines whether struggling students are more likely to leave charter schools than traditional public schools.
Marcus A. Winters et al., “Are low-performing students more likely to exit charter schools? Evidence from New York City and Denver, Colorado,” Economics of Education Review (February 2017)
A new study in the journal AERA Open examines how to support low-level readers when they tackle complex texts.
Recall that the Common Core English Language Arts standards require that students read grade-level texts. Common Core’s Appendix A states: “The expectation that scaffolding will occur with particularly challenging texts is built into the Standards’ grade-by-grade text complexity expectations…” “Scaffolds” are instructional supports that help (often struggling) students move toward greater understanding and independence. Yet many teachers don’t know what it means to scaffold their instruction, much less how to deploy it well in their classrooms. This study examines the effectiveness of different types of scaffolding to support reading comprehension, in particular “interactional scaffolding”—essentially “in-the-moment” or “on-the-spot” responsive support based on a student’s immediate needs. (This contrasts with “planned scaffolding,” which is more static and often provided by tools and curriculum.)
Though the study addresses an important topic, it is limited in scope, based on just 215 mostly low-income fifth and sixth graders in four urban middle schools. Two-thirds of them scored at the basic or below basic level on their most recent state reading assessments. Scaffolding was provided during four thirty-minute guided reading lessons (i.e., two hours total intervention) by tutors working with small groups. Certified teachers served as tutors for most of the lessons and received three hours of training on how to identify and ultimately deliver the scaffolding techniques.
Five types of scaffolds were included to determine which, if any, supported reading comprehension gains: vocabulary (e.g., “What clues can help us with this word?”); fluency (e.g., “Put your finger under the text as you read”); comprehension (e.g., “What is happening in this part of the story?”); peer (e.g., “Ask two students to read together”); and motivational (e.g., “Give time limits in the form of races to accomplish tasks”). Students were randomly assigned to small groups for the intervention, and various statistical techniques were used to account for how the student data are “nested” within schools, classrooms, intervention groups, and tutors. Researchers measured effects via a silent reading comprehension measure, which gauges comprehension of increasingly difficult sentences.
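For readers curious what “controlling for nesting” looks like in practice, the sketch below shows a mixed-effects regression of the general kind such designs call for. It is an illustration only, not the authors’ actual model; the data file and every variable name are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: one row per student, with a posttest score, a pretest
# score, counts of each scaffold type received, and identifiers for the
# clusters the student belongs to.
df = pd.read_csv("scaffolding_study.csv")  # hypothetical file

# A random intercept at the school level, plus a variance component for
# intervention groups, approximates how analysts account for students being
# "nested" within schools and small groups.
model = smf.mixedlm(
    "posttest ~ pretest + vocabulary + fluency + comprehension + peer + motivational",
    data=df,
    groups=df["school"],                                 # top-level cluster
    vc_formula={"igroup": "0 + C(intervention_group)"},  # nested cluster
)
result = model.fit()
print(result.summary())  # the coefficient on `motivational` is the scaffold effect
```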
The key finding is that scaffolds of the motivational variety were the only significant predictor of score increases, producing a 0.73-point bump every time a student received one in any of the four sessions. They also explained about 2 percent of variation in the reading assessment scores. To which the analysts respond: “Although this number may be low, recall that the intervention was only two hours of instruction and the motivational scaffolding only a small part of that instruction.”
In short, this modest study makes a modest contribution to our understanding of “scaffolding”—a word that teachers have learned much more about since the Common Core came onto the scene. That said, motivational scaffolds seem to have less to do with supporting explicit reading comprehension and more to do with engaging students by tapping into their love of games and contests. Indeed, it reinforces what most teachers already know: Kids, like adults, love to compete.
SOURCE: Dan Reynolds and Amanda Goodwin, “Supporting Students Reading Complex Texts: Evidence for Motivational Scaffolding,” AERA Open (October–December 2016).
A recent working paper from the National Bureau of Economic Research, written by Joshua Goodman of the Harvard Kennedy School, explores the effect of increased high school math requirements on students’ educational and workforce outcomes.
The study examines state-level school reforms enacted in response to A Nation at Risk, which, inter alia, lamented the declining status of the U.S. scientific and technological workforce. The federal report, published in 1983, prompted thirty-nine states and D.C. to increase the number of math courses that they required for high school graduation. However, the reforms were not implemented simultaneously (the most responsive updated requirements by 1984, while the slowest took until 1990). The differential timing of reforms across the states helped Goodman clearly identify their effects.
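Goodman’s approach is, at heart, a difference-in-differences design: early-reforming states serve as the treatment group, late reformers as controls. The sketch below illustrates the two-way fixed-effects regression such a design implies; it is not Goodman’s actual specification, and the data file and variable names are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical panel: one row per state-by-cohort cell, with the average
# number of math courses completed and an indicator for whether the state's
# higher requirement was already in force for that graduating cohort.
df = pd.read_csv("state_cohorts.csv")  # hypothetical file

# State fixed effects absorb fixed differences across states; cohort fixed
# effects absorb nationwide trends. The coefficient on `reformed` is the
# difference-in-differences estimate of the requirement's effect.
model = smf.ols("math_courses ~ reformed + C(state) + C(cohort)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["state"]})
print(result.params["reformed"])
```

The same logic carries over to the earnings outcomes: changes across cohorts in reforming states are compared against the same changes in not-yet-reforming states.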
By looking at transcript data, Goodman found that when states boosted their math requirements, a jump followed in the average number of math courses completed. That’s not surprising. What’s more interesting—and sobering—is that this increase occurred because students took more basic math courses (e.g., algebra, geometry, vocational math), not more advanced math courses (e.g., algebra II, pre-calculus, statistics). Unfortunately, that means the reform did not achieve the Excellence Commission’s implicit objective of sharpening STEM’s cutting edge in the United States.
Still, Goodman contends that this reform did improve outcomes for some students. Specifically, it caused a statistically significant increase in math course completion for black high school graduates. For the class of 1987, in states that had already increased their requirements in response to A Nation at Risk, black students who graduated high school averaged 3.2 math courses, significantly more than the 2.8 courses averaged by their peers in states that had yet to enact reform (the trend was similarly upward for white students, but less clear). The difference in mathematical experience between these two groups is critical to Goodman’s subsequent economic argument.
When the author compared the earnings of these graduates (using 2000 Census data), he found that those who had taken the additional math earned more money ($635 per year, on average). Importantly, the extra math requirement did not change the dropout rate, nor did it change the rate of college matriculation. Rather, this 3 percent pay raise was related to a shift towards slightly more cognitively intensive jobs for the middle-of-the-pack students who were most affected by the reforms.
Note, however, that this was not a longitudinal study, so the students in the transcript data set are not necessarily the adults whose earnings were measured, although Goodman argues that the two data sets align closely. He admits that data are lacking for other racial groups that might be compared, so more research is needed. Furthermore, the study only examines the quantity, not the quality, of math courses taken.
Goodman’s findings are a testament to the fact that not all policies produce the intended outcomes. However, with good research, we can identify the outcomes they do produce and use that information to guide future reforms. Policymakers looking to close racial achievement and income gaps can find hope in this work, as Goodman clearly shows that requiring more math courses can help. Those interested in pushing high achievers to even greater heights, however, should look to reform more than just the minimum graduation requirements. Yet more research is needed in both areas. For now, though, if your lawmaker, fellow wonk, or teenage child ever questions the value of required math, point them here.
SOURCE: Joshua Goodman, “The Labor of Division: Returns to Compulsory High School Math Coursework,” National Bureau of Economic Research Working Paper (January 2017).
Do incentives nudge students to exert more effort in their schoolwork? A recent study by University of Chicago analysts suggests they do, though the structure of the incentive is also important.
The researchers conducted field experiments from 2009 to 2011 in three high-poverty areas, including Chicago Public Schools and two nearby districts, with nearly 6,000 student participants in grades two through ten. Based on improved performance relative to a baseline score on a low-stakes reading or math assessment (not the state exam), various incentives were offered to different groups of pupils, such as a $10 or $20 reward, or a trophy worth about $3 along with public recognition of their accomplishment. The analysts offered no reward to students in a control group. To test whether pupils responded differently to immediate versus delayed incentives, some of the students received their reward right after the test—results were computed on the spot—while others knew the reward would be withheld for one month.
Several interesting findings emerged. First, the larger cash reward ($20) led to positive effects on test performance, while the smaller reward ($10) had no impact. This suggests that, if offering a monetary reward, larger payouts will likely elicit more student effort. Second, the $3 trophy and public recognition also had a positive impact on achievement, though not as large an effect as the $20 incentive. In addition to being cost-effective, this finding is important because, in practice, non-cash incentives may be more acceptable in school environments. As the study authors note, “Schools tend to be more comfortable rewarding students with trophies, certificates, and prizes.” Third, incentives that were withheld from students for a month after the test did not improve performance. This suggests that immediate, rather than delayed, disbursement is an important feature of an effective incentive structure.
It is possible that external incentives could “crowd out” intrinsic motivation—students may be less likely to work hard once an incentive is removed. The authors find no evidence of this when examining treated students’ low-stakes test scores after the incentives ceased. Instead, they conclude that incentives, when structured well, can help motivate students to put in just a little more effort.
SOURCE: Steven D. Levitt, John A. List, Susanne Neckermann, and Sally Sadoff, “The Behavioralist Goes to School: Leveraging Behavioral Economics to Improve Educational Performance,” American Economic Journal: Economic Policy (November 2016).