1. Problems with models
Nobel physicist Richard Feynman wrote about models:
[Physicists have] learned to realize that whether they like a theory or they don’t like a theory is not the essential question.
Rather, it is whether or not the theory gives predictions that agree with experiment.
It is not a question of whether a theory is philosophically delightful, or easy to understand, or perfectly reasonable from the point of view of common sense.
But we’re human. We like simple: “Let’s raise standards or spend more on schools.” We like common sense: “Reduce class sizes.” We like delight: “Hooray for project-based learning.” So these human reactions persist even though such K–12 policies typically don’t perform well in experimental settings.
2. COVID-19 medical models have not performed well
Joseph Epstein writes in the Wall Street Journal:
Some specialize in epidemiology, in immunology, in virology, in public health. On television they flood us with information, more, it sometimes seems, than we can handle. They talk about models, curves, numbers, percentages.
They tell us everything except what we want to know: how the coronavirus began, where it is headed, and when it will end.
The reason they cannot help, I gather, is that they do not really know. Epidemiology, immunology, public health, one begins to sense, may be in the same state—they know everything but what is crucial.
Thus far little predicted by the various scientific experts has come to pass. Not the number of deaths nor the duration of the virus, nor the time of a return to normal life. Yet when the talk turns to reopening the economy, many people, governors and mayors among them, say they await the word of science.
...One major casualty of the coronavirus may turn out to be the prestige of science.
This failure of experts prompted Edward Dougherty, the chair of electrical engineering at Texas A&M, to explain that this is a common problem with models:
Today, scientists are grappling with the problem of model uncertainty, as seen in areas like climate and medicine. These questions are increasingly challenging the basis of modern scientific knowledge itself, which is defined by a combination of mathematics and observation.
Modern scientific knowledge, while rejecting commonsense conceptual models, has always depended upon mathematically expressed theories that could be validated by prediction and observation.
But this approach is now under pressure from multiple sides, suggesting a deep crisis of scientific epistemology that has not been fully confronted. At the same time, political leaders find themselves increasingly impotent when faced with scientific issues.
As we move further into the twenty-first century, humankind is presented with an existential paradox: Man’s destiny is irrevocably tied to science, and yet knowledge of nature increasingly lies not only outside ordinary language but also outside the foundational epistemology of science itself.
So, folks, we K–12 people are in good company! We tend to admire epidemiologists and climatologists. But they’re bad at modeling, just like us.
Epidemiology, for example, is perhaps one-fourth science and three-fourths psychology of crowds. K–12 is perhaps one-fourth the idea, policy, or intervention and three-fourths human execution, with all its variance. (Also see here, here, and here for other takes on the problems experts face in creating—and communicating about—their models.)
3. Does K–12 have many, or few, evidence-proven practices?
Many of my K–12 friends stipulate that there are lots of anti-data dummies in our sector. “Not us but other people,” they say. Dummies advocate interventions where most of the evidence is actually negative.
However, these smart friends believe there are many other interventions that are backed by evidence. Why do they believe this? Jon Baron at the Arnold Foundation nails it:
Someone reading academic journals or looking at web-based clearinghouses of evidence-based interventions might get the impression that there are actually a large number of social interventions shown effective through rigorous evaluations. As we have discussed in previous Straight Talk reports, however, most of these “evidence-based” interventions are backed by only preliminary or flawed findings, which, based on the long history of rigorous evaluations, are often reversed when a more definitive evaluation is subsequently carried out. (Emphasis added.)
Importantly, this problem is not unique to K–12. Baron says it appears in many disciplines:
Business: Of 13,000 RCTs conducted by Google and Microsoft to evaluate new products or strategies in recent years, 80 to 90 percent have reportedly found no significant effects.
Medicine: Reviews in different fields of medicine have found that 50 to 80 percent of positive results in initial clinical studies are overturned in subsequent, more definitive RCTs. Thus, even in cases where initial studies—such as comparison-group designs or small RCTs—show promise, the findings usually do not hold up in more rigorous testing.
Education: Of the 90 educational interventions (all with “good preliminary evidence”) evaluated in RCTs commissioned by the Institute of Education Sciences and reporting findings between 2002 and 2013, close to 90 percent were found to produce weak or no positive effects.
Employment/training: In Department of Labor-commissioned RCTs that reported findings between 1992 and 2013, about 75 percent of tested interventions were found to have weak or no positive effects.
“Aha!” you say. “Well, if 90 percent of K–12 interventions don’t work when tested in RCTs, that means 10 percent do! So let’s do that 10 percent.”
Sadly, no. Because education is complex, there are two problems. The first is that RCT results often don’t replicate in new settings—early childhood education, for example, or high-dosage tutoring. The second is timing: early gains on a proxy measure, like test scores, sometimes don’t translate into later gains on the things we care more about, like college graduation rates or escaping poverty through higher earnings; this may be the case with no-excuses charter schools, a topic I will explore in a future column.
I offer those three particular examples—early childhood education, high-dosage tutoring, and no-excuses charters—because Randi Weingarten likes the first, Betsy DeVos likes the third, and both are sort of lukewarm on the second. I’m trying to show that this evidence/model problem cuts across the preferences of our political tribes, whether blue, red, or purple, and across disciplines, not just education.
4. So then what do we do about models?
Texas A&M’s Dougherty offers this:
Confronting the problems of complexity, validation, and model uncertainty, I have previously identified four options for moving ahead:
(1) dispense with modeling complex systems that cannot be validated;
(2) model complex systems and pretend they are validated;
(3) model complex systems, admit that the models are not validated, use them pragmatically where possible, and be extremely cautious when interpreting them;
(4) strive to develop a new and perhaps weaker scientific epistemology.
I think it’s safe to say that our ed policy sector mostly does #2. We pretend! Pretending will continue.
Jon Baron encourages #3. I am in his tribe. But it does not appear to be a growing tribe, as best I can tell. Humility doesn’t scale well.
Feynman nailed what people really want: policies that are delightful, simple, or common sense. And politicians give people what they want.
So now what? I will examine that in Part 2. Stay tuned!