I’m in the middle of a series of posts looking at how we might usher in a “Golden Age of Educational Practice” now that big new policy initiatives appear to be on ice. Last week I claimed that all of the possibilities that might work at scale entail various investments in innovation and R&D. Such efforts will only be successful, though, with exponentially better insight into what’s actually happening in the classroom.
That’s because, right now, key decision-makers are flying blind. Consider just a few examples of questions that have been raised in recent weeks that we simply cannot answer:
- Is student achievement flat because teachers are implementing Common Core, and it’s not working? Or is it flat because teachers are mostly ignoring Common Core? Or is it neither of the above? We have no idea.
- Has “balanced literacy” served as a Trojan Horse, allowing whole-language reading instruction to continue unabated in our elementary schools rather than giving way to a scientifically based approach with a strong emphasis on phonics and phonemic awareness? Is this an issue in relatively few schools or in lots of them? We have no idea.
- Are most high schools teaching a Howard Zinn–inspired version of U.S. history, one with an overwhelming focus on our country’s injustices rather than its triumphs? Or is that happening only in deep-blue bubbles? We have no idea.
And it’s not just policy wonks and education scholars who lack information; leaders at the state and local levels have too little insight into classroom practice as well. Whereas the world outside of our schools has been transformed by information technology, the data we collect on classroom practice is somewhere between nonexistent and laughably rudimentary. In other words, we know almost nothing about almost everything that matters.
To be sure, education research improved dramatically starting in the early 2000s with the creation of the Institute of Education Sciences, the federal mandate for annual tests in grades three through eight, and the concurrent development of longitudinal data systems in most states. Scholars suddenly had the money and the data to examine a variety of educational interventions and their impact on student achievement, significantly increasing our understanding of what’s working to boost student outcomes.
Yet the vast majority of such studies rely on state “administrative data”—information that is collected to enable our systems to keep humming along, but that can also be happily recycled as markers of various inputs or programs whose effectiveness might be studied. Lots of this is related to teacher characteristics—their years of experience, race, training, and credentials. Other data captures bits of the student experience—their attendance patterns, course-taking habits, family background—and that of their peers.
This is all well and good, but it’s still very limited. We end up studying the shadow of educational practice rather than the real thing. What we don’t see is what’s actually going on in the classroom, the day-to-day work of teachers and their students: the curriculum, the assignments, the marks students receive, the quality of instruction itself. We simply don’t know what kids do all day: the books they read, the tasks they’re asked to perform, the textbooks teachers use (or whether those textbooks sit unopened in the closet), and whether programs are implemented with fidelity, haphazardly, or not at all.
Examining practice has always been a difficult and expensive proposition. The most respected approach involves putting lots of trained observers—often graduate students—in the back of classrooms. There, they typically watch closely and code various aspects of teaching and learning, or collect video and spend innumerable hours coding it by hand. This is incredibly labor-intensive and costs gobs of money, so it is relatively rare.
Alternatives to observational studies are much less satisfying. The most common is to survey teachers about their classroom practices or curricula, as is done with the background questionnaires given to teachers as part of the National Assessment of Educational Progress (NAEP). Though useful, these surveys have big limitations, as they rely on teachers to be accurate reporters of their own practice, which is hard even with the best intentions. It’s also difficult to know whom to survey for certain kinds of information; for example, Morgan Polikoff, associate professor of K–12 policy at the University of Southern California, has been trying to understand which textbooks schools are using, and is finding that, in many districts, nobody can give him a straight answer.
So that’s the challenge: We lack the systems to collect detailed information about classroom practice that might help us learn what’s working and what’s not, and inform changes in direction at all levels of governance.
Thankfully there are potential solutions. I see three:
- Take advantage of data already being collected by online learning providers and services, such as Google Classroom, to gain insights into our schools;
- Systematically collect a sample of student assignments, complete with teacher feedback, to learn more about the “enacted curriculum,” its level of challenge, and its variation; and
- Use video or audio recording technology in a small sample of schools to better understand instructional practice in America today.
The first possibility is a cousin of using administrative data to power research studies. Online learning platforms like Khan Academy and services like Google Classroom are already collecting reams of data about teaching and learning, but to my knowledge, these data remain largely proprietary and locked away. Surely it would be possible to protect student privacy and any trade secrets while allowing researchers to gain insights into what’s working in our schools.
Google Classroom seems particularly promising, given that, by some accounts, more than two-thirds of districts use it today. Imagine if scholars could view, anonymously, student essays and other assignments. With the help of machine learning, we could figure out how much variation there is in the level of challenge of the assignments and in the grading standards. And we could see whether schools with tougher assignments and higher grading standards were getting better results in terms of student learning, after controlling for background factors. We might also be able to tell which curriculum a given teacher or school was using and the degree of alignment between student assignments and grade-level standards.
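To make that concrete, here is a deliberately crude sketch of the kind of analysis researchers could run on an anonymized export of assignments and grades. Everything in it is an assumption for illustration: the column names, the sample rows, and the word-count proxy for assignment challenge are invented, and real work would rely on trained raters or a supervised model rather than prompt length.

```python
import pandas as pd

# Hypothetical anonymized export: one row per graded assignment.
# The column names are invented for illustration, not Google Classroom's actual schema.
df = pd.DataFrame({
    "school_id": ["A", "A", "B", "B"],
    "prompt_text": [
        "Summarize the chapter in three sentences.",
        "List five vocabulary words from this week's reading.",
        "Analyze how the author's use of irony shapes the novel's central argument.",
        "Compare the causes of the two revolutions, citing at least two primary sources.",
    ],
    "grade_pct": [95, 92, 81, 78],
})

# A crude stand-in for "level of challenge": length of the assignment prompt in words.
df["prompt_words"] = df["prompt_text"].str.split().str.len()

# Do schools that assign more demanding work also grade more stringently?
summary = df.groupby("school_id").agg(
    mean_prompt_words=("prompt_words", "mean"),
    mean_grade=("grade_pct", "mean"),
)
print(summary)
```

The interesting part, of course, is replacing that word-count proxy with something defensible, such as human-scored rubrics used to train a model; but the basic plumbing of the comparison would look roughly like this.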
This approach would be particularly useful for middle and high schools, given that many assignments are now completed online. But what about elementary schools, where paper-and-pencil worksheets still largely rule the roost? That brings us to our second big idea. Imagine if we could identify a nationally representative sample of elementary schools where researchers would collect a sample of student work on a regular basis: worksheets, quizzes, tests, and so on. The research initiative would develop an easy-to-use mechanism for digitizing these materials, adding value for teachers and schools. For example, a scanner could be provided that makes copies of marked-up worksheets and quizzes, automatically enters the grades into teachers’ electronic gradebooks, puts an electronic copy in each student’s online portfolio, and sends the image to parents’ email. Meanwhile, the information is sent to the researchers, connected securely to each student’s profile, and anonymized. (Xerox’s XEAMS initiative experimented with some aspects of this; Class Dojo has some of these functionalities, too, with students scanning their work with iPads.)
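The make-or-break detail in that scheme is the privacy step: scanned work has to be linked to a student’s records for research purposes without the researchers ever learning who the student is. Below is a minimal sketch of one common approach, a salted hash that turns real IDs into stable pseudonyms. The field names and salt handling are assumptions for illustration, not a description of any vendor’s actual system.

```python
import hashlib
import json

# The salt would stay with the district; researchers only ever see the pseudonyms.
SALT = "district-held-secret"  # illustrative placeholder

def anonymize_record(student_id: str, scan_path: str, grade: str) -> dict:
    """Replace the real student ID with a stable pseudonym before export."""
    pseudonym = hashlib.sha256((SALT + student_id).encode("utf-8")).hexdigest()[:16]
    return {
        "student_pseudonym": pseudonym,  # same student -> same pseudonym across submissions
        "scan_path": scan_path,
        "grade": grade,
    }

print(json.dumps(anonymize_record("student-0042", "scans/quiz_12.png", "B+"), indent=2))
```

Because the pseudonym is stable, researchers can still follow a student’s work over time and connect it to achievement data, while the district alone holds the key that could ever reverse the mapping.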
Just as with Google Classroom, we’d make a quantum leap in our understanding about the curriculum our schools are using, the level of rigor in student assignments, teachers’ grading standards, and much more.
The third big idea, and also the most controversial, is to record what’s happening in a sample of our classrooms. (Audio is less intrusive than video, and gives you just as much information.) Using a smart speaker like Amazon’s Echo or a Google Home Mini, researchers could capture the play-by-play of instructional practice and then train algorithms to make sense of what’s going on. As I explained in my Education Next column earlier this year, this is no longer the stuff of science fiction. Researchers are already doing this to glean insights into the kind of questions teachers are asking—and which approaches work best to drive student engagement and learning.
They start by capturing high-quality audio, and then run the audio files through several speech-recognition programs, producing a transcript. Then their algorithm goes to work, looking at both the transcript and the audio files (which have markers for intonation, tempo, and more) to match codes provided by human observers.
The computer program has gotten quite good at detecting different types of activities—lectures versus group discussion versus seatwork, for example—and is also starting to differentiate between good and bad questions. Humans are still more reliable coders, especially for ambiguous cases. But the computers are getting better and better, and they are already good enough that, with sufficient data, they can produce some very reliable findings at a fraction of the cost of a people-powered study.
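For readers curious what “training an algorithm to match human codes” looks like in miniature, here is a toy sketch: a handful of invented transcript snippets, each hand-labeled with an activity type, used to fit a simple text classifier. The snippets, labels, and model choice are all assumptions for illustration; the actual research systems rely on much larger hand-coded corpora plus prosodic features pulled from the audio itself.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented transcript snippets, each coded by a (hypothetical) human observer.
segments = [
    "today we're going to cover the causes of the civil war, so take notes",
    "turn to your partner and discuss what you think the author meant here",
    "quietly finish the worksheet on page twelve on your own",
    "listen carefully while I explain how to balance this equation",
    "in your groups, decide which piece of evidence best supports your claim",
    "work independently on problems one through ten",
]
labels = ["lecture", "discussion", "seatwork",
          "lecture", "discussion", "seatwork"]

# Bag-of-words features stand in for the transcript side; real systems also use
# intonation, tempo, and speaker-turn features extracted from the audio.
model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
model.fit(segments, labels)

print(model.predict(["open your books and listen while I walk through the example"]))
```

The same basic recipe, features from the transcript and audio mapped onto human-assigned codes, is what lets a trained model scale up what a small team of observers could only do for a few dozen classrooms.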
Connect all of this up to state administrative data and student achievement data and we would finally have an accurate picture of what’s actually going on in U.S. schools. (We’d know, for example, the degree to which schools are narrowing the curriculum and squeezing out science and social studies.) And we’d have vastly more information with which to study the effectiveness of various instructional and curricular approaches.
Big hurdles remain, to be sure. The biggest aren’t technological, but political: Such an effort must earn the trust of teachers and parents. We must be able to promise that none of the data will be used to evaluate or punish teachers; the data must also be protected with the highest level of security. None of that would be easy, but by allowing schools to opt in, and by starting with a small pilot, such an initiative might earn the trust of key stakeholders over time.
To be clear, this is a different idea than putting a camera or microphone in every classroom—akin to body cameras or dashboard cameras for police. I’ve written about that notion too, and see potential value in it, but it raises Orwellian questions of a whole different order of magnitude.
The goal here is to accelerate the R&D process by improving our ability to learn about instruction. For that purpose, a relatively small number of schools or classrooms—in communities that volunteer to participate—would do the trick.
***
To be clear, collecting better information about classroom practice is just one part of the puzzle. We also need to fund rigorous studies to turn the data into insights about what’s working, and then figure out how to get evidence-based practices into schools. I will tackle all of that in future posts. But without good data, any R&D strategy is destined to fail. It’s a critical foundation, and one that is not nearly sturdy enough today.