Traditional classroom observations are time- and labor-intensive because they must adequately capture the many nuances of student-teacher interactions and thereby inform future practice. A recent working paper, described by its authors as a proof-of-concept study, explores how audio recordings and automated analyses might supplement (or even replace) traditional in-person observation conducted by a principal or an outside evaluator. If successful, this innovation could increase the number of evaluations that are possible, reduce the time and effort involved, eliminate potential evaluator bias, and expand the scope of items reviewed.
The paper’s authors, Jing Liu and Julie Cohen, are alumni of Fordham’s Emerging Education Policy Scholars program and teach at the Universities of Maryland and Virginia, respectively. They utilize transcribed videos of fourth- and fifth-grade English language arts classrooms collected as part of the Measures of Effective Teaching (MET) project, to date the largest research project in the United States on K–12 teacher effectiveness. More than 2,500 fourth- through ninth-grade teachers in over 300 schools across six districts participated in the MET project over a two-year span (academic years 2009–2010 and 2010–2011). The MET project’s sample was composed mainly of high-poverty, urban schools. Liu and Cohen focused on the first thirty minutes of nearly 1,000 videos (four per teacher), amounting to 30,000 minutes of ELA teaching, from the first year of MET. A professional transcription company produced word-for-word transcriptions with time stamps attached to the beginning and end of each speaker’s turn. It also labeled individual student speakers, as far as the audio quality allowed. The analysts also had value-added scores from state achievement tests and the SAT-9, as well as observational data from three of the most popular observation instruments.
First, they generated a roster of teacher practices that could be coded automatically, often at a finer grain than in-person observational data: the allocation of talk time between teachers and students taking turns to speak, open-ended versus non-open-ended questions, and the use of personal pronouns like “I” or “you” to identify where attention is focused at any given point in the lesson. Descriptively, they find that teachers spent 85 percent of the observation time talking to their students, and that classrooms varied considerably in the prevalence of back-and-forth conversation, with an average of 4.5 turns per minute. Next, they analyze the psychometric properties of these practices and home in on three promising constructs: classroom management (time spent on noninstructional tasks, such as getting into groups, taking roll, and managing disruptions, versus instruction); interactive instruction (open-ended questions and back-and-forth discussion); and teacher-centered instruction (teachers essentially lecturing to a silent student body). After applying district and grade fixed effects and controlling for student characteristics (and, in some models, the teachers’ observation scores), they find small but meaningful and consistent correlations between these transcript-based factors and related domains in the classroom-observation data. Specifically, audio coded as teacher-centered instruction is a consistent negative predictor of value-added scores, while interactive instruction is a positive predictor (the classroom-management construct correlates negatively with value added, but the estimates are not statistically significant, possibly due to lack of power).
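To make the feature-coding step concrete, here is a minimal sketch (not the authors’ code) of how discourse features like teacher talk share, turns per minute, and open-ended questions might be computed from a timestamped transcript. The record format and the keyword-based question heuristic are illustrative assumptions, not details from the paper.

```python
# Sketch: deriving simple discourse features from a timestamped transcript.
# Each turn is (speaker, start_sec, end_sec, text); the open-ended-question
# heuristic below is a stand-in for whatever classifier the researchers used.

OPEN_ENDED_STARTERS = ("why", "how", "what do you think")

def transcript_features(turns):
    """Compute teacher talk share, turn frequency, and open-ended questions."""
    total_talk = sum(end - start for _, start, end, _ in turns)
    teacher_talk = sum(end - start for spk, start, end, _ in turns
                       if spk == "teacher")
    elapsed_min = (turns[-1][2] - turns[0][1]) / 60  # span of the segment
    open_questions = sum(
        1 for spk, _, _, text in turns
        if spk == "teacher" and text.lower().startswith(OPEN_ENDED_STARTERS)
    )
    return {
        "teacher_talk_share": teacher_talk / total_talk if total_talk else 0.0,
        "turns_per_minute": len(turns) / elapsed_min if elapsed_min else 0.0,
        "open_ended_questions": open_questions,
    }

demo = [
    ("teacher",   0,  40, "Why do you think the character left home?"),
    ("student_1", 40,  55, "Maybe she was scared."),
    ("teacher",   55, 110, "Good. Let's read the next page together."),
    ("student_2", 110, 120, "Okay."),
]
print(transcript_features(demo))
```

Even this toy version shows why transcripts enable finer-grained measures than a human observer could tally in real time: every turn boundary and question is countable once the audio is transcribed.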
Lastly, the analysts performed a back-of-the-envelope cost calculation indicating that the transcription approach costs at least 54 percent less than the traditional human-observer approach. Several obvious cost drivers, such as the upfront cost of installing audio and video recording equipment and the development of computer algorithms, are not included in the calculation. However, the pandemic pivot to large-scale remote teaching via video-conference software likely means that many classrooms now contain the requisite infrastructure, unlike the specially designed rigs MET utilized over a decade ago.
Liu and Cohen conclude that audio analysis of this type is not yet at a stage where it could completely replace in-person observation, especially since teachers need to trust and buy into it. That’s true, of course, but let’s not forget that AI voice technology is already out there, including a “pedagogical Fitbit” that analyzes classroom discourse patterns, in person or virtually, and sends the teacher feedback. What we need now is the willingness to try new technologies like these to improve teaching, not more reasons to stick with the status quo.
SOURCE: Jing Liu and Julie Cohen, “Measuring Teaching Practices at Scale: A Novel Application of Text-as-Data Methods,” Annenberg Institute (March 2021).