Assessing the evidence base for school discipline reform
By Matthew P. Steinberg and Johanna Lacoe
The federal Office for Civil Rights announced this spring that the number of suspensions and expulsions in the nation’s public schools had dropped 20 percent between 2012 and 2014.
The news was welcomed by those who oppose the frequent use of suspensions and expulsions, known as “exclusionary discipline.” In recent years, many policymakers and educators have called for the adoption of alternative strategies that allow students to stay in school and not miss valuable learning time. Advocates for discipline reform contend that suspensions are meted out in a biased way because minority students receive a disproportionate share of them. Some also assert that reducing suspensions would improve school climate for all students.
In a recent Education Next article, we describe the prevailing critiques of exclusionary discipline, then examine the research base on which policy reform rests. We also describe the alternative approaches that are gaining traction in America’s schools and present the evidence on their efficacy. Throughout, we consider what we know (and don’t yet know) about the effect of reducing suspensions on a variety of important outcomes, such as school safety, climate, and achievement.
In general, we find relatively thin evidence both for the critiques of exclusionary discipline and in support of alternative strategies. In part, this is because many reforms have only recently been implemented, so there’s not yet much evidence as to their efficacy and side effects. Disparities in school discipline by race and disability status have been well documented, but evidence is inconclusive as to whether these disparate practices involve bias and discrimination. As for alternative strategies, the evidence is mainly correlational, suggesting that more research is necessary to uncover how those approaches affect school safety and student outcomes.
Addressing such questions is vitally important because a safe school climate is essential for student success, and disorder and violence in classrooms and corridors have adverse effects on all pupils. For example, students who were exposed to Hurricane Katrina evacuees with significant behavior problems experienced short-term increases in school absences and discipline problems themselves. Recent evidence also shows that exposure to disruptive peers during elementary school worsens student achievement and later life outcomes, including high school performance, college enrollment, and earnings. These findings highlight the importance of closely monitoring the effects of discipline reform on all students, not just those being punished.
Critiques of Exclusionary Discipline
Disproportionate suspension rates. There is little doubt that students of color face exclusionary discipline much more often than their peers do. Furthermore, gaps in suspension rates between black students and white students have grown over time, doubling between 1989 and 2010.
What accounts for these disparities? Do they stem from discrimination and racial bias? The possibility of such bias is one justification for the Office for Civil Rights’ involvement in the issue of school discipline. However, it could be that minority students are disciplined more often because they commit more infractions than their peers. If that is so, the greater frequency of violations among minority students could be caused by factors outside of the school’s purview, such as more exposure to poverty, crime, and life trauma resulting from residential and economic inequality.
Some evidence does suggest that racial minorities tend to be punished more severely than their peers for the same offenses. In 2011, Russell Skiba and colleagues analyzed school-level data on disciplinary referrals in 364 schools and found that black and Hispanic students were more likely than white students to receive suspensions or expulsions for “minor misbehavior,” such as inappropriate verbal language, minor physical contact, disruption, and defiance. Unfortunately, the study was unable to control for students’ prior infractions in school, a factor that may influence the severity of the response to a given offense. In a separate study, Skiba and Natasha Williams further revealed that black students in the same schools or districts were not engaged in levels of disruptive behavior that would warrant higher rates of exclusionary discipline than white peers.
Recent evidence from Arkansas confirms that black students attending public schools in the state are punished more harshly than their white peers, but also suggests that most of the difference is attributable to the schools that students attend. Researchers found that, over the course of three school years, black students received, on average, 0.5 more days of punishment (including in-school and out-of-school suspension and expulsion days), even when controlling for special-education status and comparing students at the same grade level. However, they showed that cross-school differences explained most of this aggregate difference; that is, when the researchers looked only at students attending the same school, the racial differences became much more modest, with black students receiving only about 0.07 more days of punishment than whites. Within schools, the authors also found a statistically significant, though modest, difference in the length of punishment for special-education students, approximately 0.10 days more per suspension.
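To make the size of the Arkansas finding concrete, the two gaps reported above can be compared directly. The sketch below is only back-of-the-envelope arithmetic on the published figures (0.5 days overall, about 0.07 days within schools); it assumes the difference between the two can be read as the cross-school share, which is a simplification and not the researchers' own estimation method. The variable names are illustrative.

```python
# Figures reported in the text (days of punishment, black minus white students).
overall_gap_days = 0.5         # gap across all schools
within_school_gap_days = 0.07  # gap among students attending the same school

# Assumption: whatever is not observed within schools is attributed to
# differences between the schools that students attend.
between_school_gap_days = overall_gap_days - within_school_gap_days
between_school_share = between_school_gap_days / overall_gap_days

print(f"Gap attributed to cross-school differences: {between_school_gap_days:.2f} days")
print(f"Share of the overall gap: {between_school_share:.0%}")
```

Under that reading, roughly 86 percent of the overall gap is tied to which schools students attend, which is consistent with the authors' statement that cross-school differences explain most of the difference.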
Another recent study using nationally representative longitudinal survey data considered the role of prior problem behavior in disparate suspension rates. When the study authors controlled for whether these students exhibited prior behavioral problems (in kindergarten, first, and third grades), they found that the racial gap in eighth-grade suspension rates disappeared, leading them to conclude that the disproportionate use of suspensions was probably not the result of racial bias. This conclusion is subject to question, however, because the authors compared results from statistical models that relied on different underlying samples, owing to student attrition within the study. Further, the study was unable to address any biases implicit in the measure of prior behavioral problems.
Negative effects on school climate. Advocates of discipline reform contend that exclusionary discipline may have adverse consequences for school climate. While zero-tolerance policies aim to improve school climate and safety by removing disruptive students, research evidence finds that teachers and students in schools with high suspension rates report feeling less safe than their counterparts in schools serving similar students that have lower suspension rates. Schools with higher suspension rates also have greater teacher attrition and turnover. According to the American Psychological Association’s Zero Tolerance Task Force, there is no hard evidence that exclusionary policies reduce school violence.
While the evidence does suggest that school climate is worse when exclusionary discipline practices are more widespread, this evidence is not causal. We don’t know whether the use of exclusionary discipline causes school climates to deteriorate, or if administrators respond to unruly climates by clamping down on school discipline. Therefore, policymakers and practitioners must remain cautious about the potential effects that newly implemented reforms may have on school climate and student safety. And even if schools reduce their use of exclusionary practices, it doesn’t necessarily follow that they will cease to mete out these punishments disproportionately by race.
Negative effects on student outcomes. Critics also contend that exclusionary discipline can trigger a downward spiral in students’ lives inside and outside of school, leading to the so-called school-to-prison pipeline. Unfortunately, research on the causal effect of suspensions on academic achievement and other student outcomes is limited. Students who are removed from school do tend to have lower achievement on standardized exams; are less likely to pass state assessments; and are more likely to repeat a grade, drop out of school, and become involved in the juvenile justice system. A 2014 survey from the American Association of School Administrators found that 92 percent of superintendents believe that out-of-school suspensions are associated with negative student outcomes, including lost instructional time and increased disengagement, absenteeism, truancy, and dropout rates. These correlations, however, do not tell us whether suspended students would have experienced these adverse outcomes even if they hadn’t received suspensions.
Looking Ahead
Across the country, disciplinary programs and policies are trending away from exclusionary practices and toward a variety of alternatives, with the endorsement of federal and state governments. Yet with such thin evidence today regarding both the harm caused by suspensions and the potential benefits of other approaches, there’s a clear need for rigorous evaluation research, which should focus on the impact of school discipline reforms and their potential unintended consequences.
Children need a safe, secure learning environment if they are to thrive in school. Until we fully understand the benefits and costs of the various approaches to discipline, both exclusionary and alternative, we will fall short of providing that supportive climate.
Matthew P. Steinberg is assistant professor at the University of Pennsylvania’s Graduate School of Education. Johanna Lacoe is a researcher at Mathematica Policy Research.
This is an excerpt of a much longer article published by Education Next, which also explores the major critiques of exclusionary discipline and the thin evidence on its alternatives, such as restorative justice.
British prime minister Theresa May has set off a royal dust-up with her proposal to loosen England’s half-century-old shackles on grammar schools, the British term for selective-admission public secondary schools focused on preparation for university.
Back in the 1940s and ’50s, England had a “tripartite” system of secondary education (not including old-line, private-pay prep schools like Eton and Harrow). Besides grammar schools for high achievers seeking an academic education, there were technical schools and “secondary modern schools.” Children were pointed down a particular track after taking the “eleven-plus” exam (around fifth grade).
This was typical of the era in many places and characteristic of class-riddled England. And of course it tended to perpetuate class divisions, as better-off kids with better-educated parents were much more apt to make it into (and want to enter) the grammar schools.
This arrangement began to change under Labour governments in the mid-sixties, as they pushed communities to create “comprehensive” secondary schools—akin to what James B. Conant, using the same adjective, urged for the United States, and what Ted Sizer would eventually dub the “shopping mall high school.”
Many policy chapters followed as Tory and Labour governments took turns changing priorities and ground rules but, by 1974, the British government was actively pushing for comprehensiveness in secondary schooling while also discouraging student selection. Communities that really wanted to could hold onto extant grammar schools, but the number of such schools gradually dwindled to today’s paltry 163 in a country with 3,000 publicly financed secondary schools. No new ones may open, and those that exist must get special permission from London if they want to expand.
All the other state-supported secondary schools are supposed to accept anyone who applies, though there are a handful of weird exceptions—and nowadays the majority of English secondary schools are quasi-charters (called “academies” or “free schools”), out from under the local education authority, answerable directly to the central government, and naturally subject to specializing and self-selection.
Like so much in British education, with its many policy layers, shifts, and special situations, the secondary-school picture is messy and complicated. But there’s always been a distinctive strand of public opinion and political support—nearly all of the latter being Tory—for bringing back selective public schooling, at least in places that want it. And there’s no doubt that the remaining grammar schools produce very solid academic results—though it’s hard to separate that from their admission practices.
Within weeks of becoming Prime Minister, Theresa May made clear that she wants more of them. Her plan is, to put it very mildly indeed, highly controversial—and the PM has already signaled that (a) the new policy will take steps to ensure that poor kids will have a fair shot at a grammar school education and (b) she doesn’t intend to impose selective schooling on every community.
Meanwhile, a London-based think tank (chaired by former Liberal Democrat MP and schools minister David Laws) has produced a report showing that, while attending grammar schools is beneficial for disadvantaged students (similar to “no-excuses charter schools” research in the U.S.), not many poor kids get to attend them today. That doesn’t lead the analysts to concur with the PM that there should be more such opportunities. Rather, they declare that a sizable expansion of selective-admission schools would be bad policy, both because (obviously) the more kids who attend them, the smaller the measurable gains are apt to be, and because, they say, as more kids attend them, there will be a learning loss among those who don’t, most of whom are still apt to be poor.
All this is, of course, redolent of American debates about peer effects and school selectivity, whether the intentional kind (“exam schools” for gifted students) or the more indirect kind that may arise from choice policies more utilized by better-off and better-organized families and may give rise to schools that, while nominally open admission, opt not to keep everyone who turns up on day one.
Where I come out: homogenized education isn’t good education. There are many ways to differentiate it, including academically oriented schools for high-ability and/or high-achieving students. I also understand, however, that the more it is diversified and tailored and the more choices that are made available to families and communities, the more debates will inevitably arise over issues of fairness.
Like the United States, England has a long way to go on both excellence and equity. They’ve been struggling with it for as long as we have been. And there’s no end in sight. But I salute Theresa May for her brave effort to get it right—and her gumption in proposing a change that she surely knew would ignite a bonfire.
On this week’s podcast, Mike Petrilli, Alyssa Schwenk, and David Griffith discuss how teachers ought to handle this year’s particularly polarizing and cringeworthy presidential election. During the research minute, Amber Northern explains how charter school boards affect school quality.
Juliet Squire and Allison Crean Davis, "Charter School Boards in the Nation's Capital," Thomas B. Fordham Institute (September 2016).
In this study, the authors examine the long-term impacts of publicly subsidized preschool and nurse home visitation in Denmark, using administrative data from preschools that began receiving public funding between 1930 and 1957.
Overall, they find that low-income Danish kids benefited from preschool access in several ways, and that some of those benefits were passed on to the next generation. However, for kids who also had nurse home visitation at birth, the positive effects of preschool were reduced by 80–90 percent. For example, Danish kids who had access to preschool by age three were about 10 percent less likely to have only a compulsory education at age twenty-five, unless they also had nurse home visitation, in which case the impact of preschool was only a fifth as large. Similarly, male students with access to preschool earned about 2 percent more as adults. However, those with nurse home visitation saw an earnings boost only a tenth as large.
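The "80–90 percent" range and the "a fifth" and "a tenth" figures describe the same thing from two directions, which a little arithmetic makes explicit. The sketch below is illustrative only: it uses the rounded effect sizes quoted above, and the function names are made up for this example, not drawn from the study.

```python
def remaining_effect(base_effect_pct: float, fraction_remaining: float) -> float:
    """Effect size left over when only a fraction of the base effect remains."""
    return base_effect_pct * fraction_remaining


def reduction_pct(fraction_remaining: float) -> float:
    """Percent reduction implied by keeping only a fraction of the effect."""
    return (1 - fraction_remaining) * 100


# Rounded figures quoted in the text (assumed here to be point estimates).
schooling_effect = remaining_effect(10.0, 1 / 5)  # compulsory-education effect
earnings_effect = remaining_effect(2.0, 1 / 10)   # adult-earnings effect

print(f"Schooling effect with home visitation: about {schooling_effect:.1f} percent "
      f"(a {reduction_pct(1 / 5):.0f} percent reduction)")
print(f"Earnings effect with home visitation: about {earnings_effect:.1f} percent "
      f"(a {reduction_pct(1 / 10):.0f} percent reduction)")
```

In other words, an effect "a fifth as large" corresponds to the 80 percent end of the range, and one "a tenth as large" to the 90 percent end.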
This pattern suggests that preschool and nurse home visitation may have been substitutes rather than complements—at least in pre-World War II Denmark. However, the program’s design makes it difficult to know what to make of this finding.
To receive public funding, Danish preschools had to have staff with expertise on children, be open for at least four hours each working day, and provide services exclusively or predominantly to children from poor families. However, the Danish government also regulated and monitored preschools’ hygienic conditions and encouraged them to work with local physicians and dentists to monitor children’s health while reimbursing them for expenses related to these health check-ups. Thus, Danish preschools performed some of the same functions as the nurse home visitation program, in which trained nurses were assigned to visit newborns and their mothers approximately ten times during the first year of life so they could communicate the basics of infant care (“calmness, orderliness, and cleanliness”), monitor the babies’ development, and refer them to doctors if necessary.
Because of this overlap—that is, because Danish preschool had a modest public-health dimension—the meaning of the negative interaction between preschool and nurse home visitation is not so clear. Perhaps the benefits of early Danish preschools are primarily attributable to their public health dimension rather than their educational content. Or perhaps health and education interventions function as substitutes in the early years (at least when it comes to their long-term benefits).
Regardless of where the truth lies, the study raises important and under-examined questions about how preschool and other programs (like nurse home visitation) do or don’t interact, and whether the enrollment approach taken by current programs is as efficient as it might be.
As the authors observe, “In a world with limited public resources, it may be efficient to design programs that specifically target populations without prior exposure to other interventions. For instance, while many over-subscribed programs for low-income children allocate slots at random or on a ‘first-come, first-serve’ basis, our evidence suggests that an allocation mechanism that considers (the lack of) participation in earlier programs as potentially leading to greater program benefits.”
Given the billions of dollars the U.S. already spends on (potentially overlapping) programs like Head Start and Early Head Start, the continuing debate over whether (and for whom) they are effective, and the ongoing push for universal pre-K, this study’s hypothesis is worth testing. Unfortunately, it will probably be a while before it is rejected or confirmed.
SOURCE: Maya Rossin-Slater and Miriam Wüst, "What is the Added Value of Preschool? Long-term Impacts and Interactions with a Health Intervention," NBER (September 2016).
A multitude of research has shown that quality teaching is necessary for student achievement and positive labor market outcomes. Rigorous evaluations have been hailed as a way to improve the teacher workforce by recognizing and rewarding excellence, providing detailed and ongoing feedback to improve practice, and identifying low-performers who should be let go. While plenty of time has been devoted to how best to provide teachers with feedback, less time has been spent examining how evaluation systems contribute to the removal of underperforming teachers and the resulting changes in the teacher workforce.
This study examines the Excellence in Teaching Project (EITP), a teacher evaluation system piloted in Chicago Public Schools (CPS) in 2008. The program focuses solely on classroom observations and uses Charlotte Danielson’s Framework for Teaching (FFT) as the basis for evaluation (unlike many current systems, which rely on multiple measures including student test scores). Roughly 9 percent of all CPS elementary teachers participated in the first year of the pilot, which was considered a “low-stakes intervention” since scores on the FFT rubric were not officially included on teachers’ summative evaluation ratings.
Prior to the use of the FFT, teachers in Chicago were evaluated against a rudimentary checklist of classroom practices. This overly generous model led to nearly all CPS teachers (approximately 93 percent) receiving one of the top two ratings in a four-tiered rating system. EITP, on the other hand, utilized the detailed, research-based set of components of the FFT and required teachers to be evaluated multiple times a year. Principals were trained extensively on how to effectively use the framework, and were required to have conferences with teachers before and after observations. Because FFT provided teachers and principals with far more detailed information about instructional performance than the previous system, the framework produced more variation in teacher ratings.
The pilot started with forty-four randomly selected elementary schools in 2008–09; the following year, forty-nine schools were added. CPS worked with the University of Chicago Consortium on School Research to craft an experimental design for implementation, and the University of Chicago randomized schools to take part in the first and second cohorts. Treatment and control schools were statistically indistinguishable with regard to prior test scores (reading and math) and student composition.
Despite the fact that the experimental design was only maintained for one year, researchers were able to determine how the pilot impacted teacher turnover. While there was no average effect on teacher exits, the researchers did find that teachers who had low prior evaluation ratings were more likely to leave the district due to the evaluation pilot. In fact, by the end of the first year of implementation, 23.4 percent of low-rated teachers in schools using the EITP pilot left the district, compared to 13 percent of low-rated teachers in control schools. The researchers note that although “the leave rate of low-rated treatment school teachers is imprecisely estimated because very few teachers received low ratings, it is remarkably stable and large in magnitude.”
Non-tenured teachers were also “significantly more likely” to leave. Overall, the first year of the pilot saw an 80 percent increase in the exit rate of the lowest performing teachers and a 46 percent increase in the turnover of non-tenured teachers. (In CPS, teachers who are in their first, second, and third year of teaching are non-tenured.) The loss of teachers who were both low-performing and non-tenured suggests that “contract protections enjoyed by tenured teachers provided meaningful job security for those who were low-performing,” as there was no difference in the exit of low-rated tenured teachers. Also worth noting is that teachers who remained in EITP schools were higher performing than those who exited, as were the teachers who replaced exiting educators.
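The 80 percent figure for low-rated teachers lines up with the exit rates quoted earlier (23.4 percent in pilot schools versus 13 percent in control schools). The sketch below simply reproduces that arithmetic from the rounded numbers; it is an assumption that the reported increase is a ratio of these two rates, since the study's own estimate may come from a regression model. The function name is illustrative.

```python
def relative_increase(treated_rate: float, control_rate: float) -> float:
    """Percent change in the treated rate relative to the control rate."""
    return (treated_rate - control_rate) / control_rate * 100


# Exit rates of low-rated teachers after the pilot's first year, as quoted in the text.
treatment_exit_pct = 23.4  # schools using the EITP pilot
control_exit_pct = 13.0    # control schools

gap_points = treatment_exit_pct - control_exit_pct
increase_pct = relative_increase(treatment_exit_pct, control_exit_pct)

print(f"Gap: {gap_points:.1f} percentage points")
print(f"Relative increase: {increase_pct:.0f} percent")
```

The distinction matters when reading the result: the gap is about 10 percentage points, while the 80 percent figure is that gap expressed relative to the control-school exit rate.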
These findings suggest two important conclusions. First, teacher evaluation reforms like the EITP pilot can indeed impact the quality of the teacher workforce by inducing the exit of low performers. In turn, by replacing low-performing teachers with higher-performing ones, achievement should in theory rise, though the researchers did not specifically test this hypothesis. Second, given that low-rated non-tenured teachers were significantly more likely to leave than low-rated tenured teachers, the researchers were able to surmise that “tenure reform may be necessary to induce low-performing tenured teachers to leave the profession.”
SOURCE: Lauren Sartain and Matthew P. Steinberg, “Teachers' Labor Market Responses to Performance Evaluation Reform: Experimental Evidence from Chicago Public Schools,” The Journal of Human Resources (August 2016).