School policies have gotten smarter in the decade after No Child Left Behind
By Michael J. Petrilli and Chester E. Finn, Jr.
A decade ago, U.S. education policies were a mess. It was the classic problem of good intentions gone awry.
At the core of the good idea was the commonsense insight that if we want better and more equitable results from our education system, we should set clear expectations for student learning, measure whether our kids are meeting those expectations, and hold schools accountable for their outcomes (mainly gauged in terms of academic achievement).
And sure enough, under the No Child Left Behind law, every state in the land mustered academic standards in (at least) reading and math, annual tests in grades 3–8, and some sort of accountability system for their public schools.
Unfortunately, those standards were mostly vague, shoddy, or misguided; the tests were simplistic and their “proficiency” bar set too low. The accountability systems encouraged all manner of dubious practices, such as focusing teacher effort on a small subset of students at risk of failing the exams rather than advancing every child’s learning.
What a difference a decade makes. To be sure, some rooms in the education policy edifice remain in disarray. But thanks to the hard work and political courage of the states, finally abetted by some implacable leaders in Washington, the core elements of standards-based reform have seen a reasonably thorough cleansing and dramatic upgrade.
Take the academic standards themselves. We and our colleagues at the Thomas B. Fordham Institute have been fans of the Common Core standards. By our lights, they’re dramatically clearer and stronger than most of the state standards they replaced and on par with the rest. They do a good job of incorporating the evidence on what it takes for students to be “college- and career-ready,” and they get most of the big issues right. What’s more, despite all of the political sturm und drang around the Common Core, these ambitious standards are still in place (sometimes with different labels) in more than forty states.
But that’s not all. Part of the promise of the Common Core initiative was that the new standards would be joined by “next-generation” assessments—tests that match the intellectual demands of the Common Core, are harder to game, and actually deserve to guide classroom instruction rather than encourage mindless test preparation. Now we know that this promise has also been kept. A new Fordham study, “Evaluating the Content and Quality of Next Generation Assessments,” found that the two most widely used new tests (PARCC and Smarter Balanced) are well matched to the Common Core and plenty challenging. (Two other assessments that we examined are strong too, though not quite as good a fit for the standards.)
It would be better if half the states hadn’t decided to go their own way on testing, dropping out of the PARCC or Smarter Balanced consortia (or never joining in the first place). It may turn out that their tests—most of them new—are also sound, but we won’t know until somebody gets under their hoods to see.
What we do know is that even these go-it-alone states have made it more challenging to pass their tests, by setting their “cut scores” at dramatically higher levels than before. This provides a more honest report to parents, teachers, and principals about whether their kids are on track for success. As Harvard University’s Paul Peterson recently wrote in Education Next, “the Common Core consortium has achieved one of its key policy objectives: the raising of state proficiency standards throughout much of the United States.”
Stronger standards, better tests, higher cut scores—so far, so good. But that leaves one last element: the accountability systems themselves (a.k.a. the calculations and labels that states use to grade schools and decide which are doing well and which are candidates for intervention). Here states still have some distance to travel. But thanks to the Every Student Succeeds Act (the replacement for No Child Left Behind that President Obama signed late last year), they have more latitude to design systems that accurately distinguish between strong and weak schools.
States can now focus most of their analysis on individual student progress over time—the fairest way to assess the value that schools add to student learning and the best way to disentangle school grades from demographics over which they have scant control. The new law encourages them also to look beyond test scores at “other indicators of student success or school quality”—a smart idea if done right. And they can focus on all their students, not just those on the edge of proficiency, thus correcting our education system’s longstanding neglect of those who have already cleared the bar.
Importantly, the new law also removes the federal mandate—pushed by former Education Secretary Arne Duncan—that states deploy test-based teacher evaluations. That move proved politically poisonous, putting too much weight on the bad old tests while sapping teacher support for the new ones. (Student results will remain available, however, for states and districts that want to incorporate them in teacher evaluations.)
We’ve been known for ages as education gadflies, and we still find plenty to fault when it comes to policy and practice in the United States. But let us be clear: Despite what you might hear from opt-outers and other critics, U.S. standards, tests, and accountability systems are all dramatically stronger, fairer, and more honest than they were a decade ago. You might even call it progress.
Editor’s note: This article originally appeared in a slightly different form in the Washington Post.
Editor’s note: This is the second in a series of blog posts that will take a closer look at the findings and implications of Evaluating the Content and Quality of Next Generation Assessments, Fordham’s new first-of-its-kind report. The first post can be read here.
Few policy issues over the past several years have been as contentious as the rollout of new assessments aligned to the Common Core State Standards (CCSS). What began with more than forty states working together to develop the next generation of assessments has devolved into a political mess. Fewer than thirty states remain in one of the two federally funded consortia (PARCC and Smarter Balanced), and that number continues to dwindle. Nevertheless, millions of children have begun taking new tests—either those developed by the consortia, ACT (Aspire), or state-specific assessments constructed to measure student performance against the CCSS or other college- and career-ready standards.
A key hope for these new tests was that they would overcome the weaknesses of the previous generation of state assessments. Among those weaknesses were poor alignment with the standards they were designed to assess and low overall levels of cognitive demand (i.e., most items required simple recall or procedures, rather than deeper skills such as demonstrating understanding). There was widespread belief that these features of NCLB-era state tests sent teachers conflicting messages about what to teach, undermining the standards and leading to undesirable instructional responses.
While many hoped that the new tests were better than those they replaced, no one had gotten under their hoods to see whether that was true—until now.[1] Over the past year, working for the Thomas B. Fordham Institute with independent assessment expert Nancy Doorey, I have led just such a study. Ours is the first to examine the content of the PARCC, Smarter Balanced, and ACT Aspire tests. We compared these three with the Massachusetts Comprehensive Assessment System (MCAS), which many believe to be among the best of the previous generation of state tests.
To evaluate these assessments, we needed a way to gauge the extent to which each test truly embodied the new college- and career-ready standards (and the CCSS in particular). To that end, we worked with the National Center for the Improvement of Educational Assessment to develop a new methodology that focuses on the content of state tests. That methodology is based, in turn, on the Council of Chief State School Officers’ (CCSSO) Criteria for Procuring and Evaluating High Quality Assessments, a document that defines what high-quality assessment of the CCSS and other college- and career-ready standards should look like.
With that methodology in hand (you can read its specifics in the full report), we brought together more than thirty experts in K–12 teaching (math and ELA), the content areas, and assessment to evaluate each of the four tests, item by item. The process involved multiple days of training and extensive, detailed rating of every item (using actual test forms) in grades five and eight for math and English language arts (ELA).
Our analysis found some modest differences in mathematics. For example, we found that the consortium tests are better focused on the “major work of the grade” than either MCAS or (especially) ACT. We also found that the cognitive demand of the consortia and ACT assessments generally exceeded that of prior state tests—in fact, reviewers thought the cognitive demand of the ACT items was too high relative to the standards. Finally, reviewers found that item quality was generally excellent, though there were a few items on Smarter Balanced that they thought had more serious mathematical or editorial issues.[2]
The differences in the ELA assessments were much larger. The two consortium tests turned out to be well matched to the CCSSO criteria for the key content of the standards. For example, PARCC and Smarter Balanced generally required students to write open-ended responses drawing on an analysis of one or more text passages, whereas the MCAS writing tasks did not require any sort of textual analysis (and writing was assessed only in a few grades). The consortium tests also better matched the criteria in their coverage of research and vocabulary/language. Finally, they included far more cognitively demanding tasks that met or exceeded the expectations in the standards.
Overall, reviewers were confident that each of these tests was a high-quality assessment that could successfully gauge student mastery of the CCSS or other college- and career-ready standards. However, the consortium tests stood out in some ways, mainly in ELA. I encourage you to read the report for more details on both results and methods.
Going forward, the new tests—and states deploying them—would benefit from additional analyses. For instance, researchers need to carefully investigate the tests’ validity and reliability evidence, and there are plans for someone (other than me, thank goodness) to lead that work. Examples of that kind of work include the Mathematica predictive validity study of PARCC and MCAS, as well as recent work investigating mode (paper versus online) differences in PARCC scores. We need more evidence about the quality of these new tests, whether focused on their content (as in our study) or their technical properties. It is my hope that, over time, the market for state tests will reward the programs that have done the best job of aligning with the new standards. Our study provides one piece of evidence to help states make those important decisions.
[1] One previous study examined PARCC and Smarter Balanced, focusing on depth of knowledge and overall quality judgments; it found that the new consortium tests were an improvement over previous state tests.
[2] Because Smarter Balanced is a computer adaptive assessment, each student might receive a different form. We used two actual test forms at each grade level in each subject—one form for a student at the fortieth percentile of achievement and one at the sixtieth percentile—as recommended in the methodology. For PARCC and ACT, we also used two forms per grade, and for MCAS we used one (because only one was administered).
Morgan Polikoff co-authored Evaluating the Content and Quality of Next Generation Assessments and is an assistant professor of education at the University of Southern California's Rossier School of Education.
Editor’s note: This article originally appeared in a slightly different form on the Brookings Institution’s Brown Center Chalkboard.
Over the years, students have resorted to all kinds of chicanery as a means of concealing bad grades from their parents. Intercepting report cards in the mail has long been a reliable standby, along with the artful application of X-Acto knives, whiteout, and copy machines. But major publishers are soon going to have to unearth some new methods to screen their own poor performance from concerned eyes: EdReports, which tests the putative alignment of instructional materials to the Common Core standards, released a new round of textbook assessments last week, and the results are too putrid to hide. The organization found that four textbook series released by McGraw-Hill, the Center for Mathematics and Teaching, and the College Board only intermittently met its expectations for alignment with the standards. It’s hardly a surprising revelation, given the abysmal record of industry leaders when it comes to producing materials of rigor and coherence. The only question now is how soon presidential candidates will start blaming Common Core itself for the mess.
As the Republican field narrowed, we bade a fond “Don’t let the door hit ya where the good Lord split ya” to former Louisiana Governor Bobby Jindal. You may remember him from such films as The State of the Union Response is a Thankless Nightmare and Don’t Run for President When Your Approval Rating is Below Your Mortgage APR. But reformers will always be grateful to Governor Kenneth for the Marie Antoinette-level frivolity of his anti-Common Core lawsuit. As of this month, however, the Pelican State is officially down one unwinnable court case, as new Governor John Bel Edwards and Attorney General Jeff Landry raced one another to drop Jindal’s folly. (But take comfort, trial lawyers: The executive counsel working on the lawsuit still pocketed nearly half a million dollars in taxpayer funds for his trouble.) Just one more reason to be grateful that elections have consequences.
The American people were reminded of that very same axiom this week by the death of Justice Antonin Scalia, the conservative court’s master visionary. His untimely departure from the national scene raises any number of political questions that will have to be resolved over the coming months by President Obama and his tormentors in the Senate’s Republican majority—or perhaps by his successor (and theirs). In the meantime, the 4-4 liberal/conservative split occasioned by Scalia’s demise will produce a raft of tie votes on issues ranging from the environment to abortion to congressional representation. The most noteworthy effect for education observers will likely be on Friedrichs v. California Teachers Association, a case that had the potential to prohibit “agency fees” and effectively cripple public employee unions. In the absence of a ruling from the Roberts court (which had looked ready to decide in favor of the plaintiffs during oral arguments), an earlier pro-union ruling from the Ninth Circuit Court of Appeals will stand. Barring an extremely surprising development, that means we should probably get comfortable with the status quo, at least for a while.
In case you’re curious about what America would look like if a certain real estate mogul qua jester tyrant were to win the 2016 presidential election, give some attention to Maine Governor Paul LePage, who has drawn nationwide jeers as a sort of mini-Trump (Trumplet?). The race-baitin’, refugee-hatin’, impeachment-facin’ LePage would make for great television if his bumbling weren’t impacting the lives of actual people (two-thirds of whom never even voted for him). Now the governor is applying his great talents to education, proposing to name himself Maine’s next education commissioner rather than allow Democrats in the state legislature to torpedo the candidate he’d already nominated for the role. This isn’t the first time LePage has meddled in Vacationland schools; in 2015, he threatened to cut state funding from a charter school if it didn’t rescind an employment offer to one of his political opponents. If he makes good on his latest threat, the nation’s worst governor will also become its worst chief state school officer overnight.
In this week's podcast, Robert Pondiscio and Brandon Wright laud the progress of education policies since NCLB, weigh gentrification’s role in D.C.’s achievement gains, and discuss the controversy surrounding a Success Academies video. In the Research Minute, Amber Northern examines educators’ perspectives on Common Core implementation.
SOURCE: Thomas J. Kane et al., "Teaching Higher: Educators’ Perspectives on Common Core Implementation," Center for Education Policy Research, Harvard University (February 2016).
Way back in the days of NCLB, testing often existed in a vacuum. Lengthy administration windows created long delays between taking a test and receiving its results; many assessments were poorly aligned with state standards and local curricula; communication with parents and teachers was insufficient; and too much test preparation heightened anxiety for teachers and students alike. These issues largely prevented assessments from being used to support and drive effective teaching and learning. And that’s true not just of state tests, but of the full range of assessments given throughout the year and across subjects.
But the new federal education law creates a chance for a fresh start. While ESSA retains yearly assessment in grades 3–8 and once in high school, the role of testing has changed. States are now empowered to use additional factors besides test scores in their school accountability systems, states may cap the amount of instructional time devoted to testing, funding exists to streamline testing, and teacher evaluations need no longer be linked to student scores. These changes may mean less anxiety, but that won’t equate to better outcomes unless significant reforms occur when states design their new assessment systems.
A new report from the Center for American Progress (CAP) focuses on how to implement such systems. Its authors use parent and teacher focus groups, online parent surveys, and interviews with assessment experts and others to pinpoint current problems with testing. Their findings show, for example, that parents recognize the value of testing but want it to provide better individualized feedback about their children; teachers crave more time and support (think sample tests, high-quality instructional materials, and opportunities to observe excellent teachers); and stakeholders need better communication. The lack of alignment among standards, curricula, and tests is an especially serious problem.
To address these issues, the CAP team envisions an ambitious, multifaceted system that routinely evaluates students’ knowledge and skills. It would do so using formative and interim assessments to provide timely and actionable feedback to teachers and parents, culminating in a summative test determining whether students have met grade-level expectations and made satisfactory progress. They supply recommendations to guide federal, state, and local leaders as they implement ESSA. For states, these recommendations include conducting alignment studies to ensure that students are tested on what they learn (and that what they learn matches state standards); developing better communication tools such as clear score reports; and demanding that test results be delivered in a timely way. For districts, CAP recommends that leaders eliminate redundant tests, support teachers’ understanding of assessment design and administration, communicate more effectively with parents about the purpose and use of tests, and streamline assessment logistics. Schools, meanwhile, can improve assessment systems by working with teachers to communicate with parents, stopping unnecessary test preparation, and making test taking less stressful for kids.
All in all, CAP’s recommendations are solid and insightful, and they deserve attention from state and district leaders as they begin to implement the new federal law.
SOURCE: Catherine Brown, Ulrich Boser, Scott Sargrad, and Max Marchitello, “Implementing the Every Student Succeeds Act: Toward a Coherent, Aligned Assessment System,” Center for American Progress (January 2016).
A new study from the University of Arkansas examines the relationship between Milwaukee’s citywide school voucher program and students’ criminal behavior.
Controlling for factors such as family income, parental education, and the presence of two parents in the home, the authors used data from Wisconsin court records to compare the criminal behavior of voucher students with that of non-voucher students. The two groups, comprising some two thousand students, were enrolled in eighth or ninth grade in 2006 in either Milwaukee’s Parental Choice Program (MPCP) or the Milwaukee Public Schools (MPS).
The study first analyzed only pupils who were enrolled in MPCP or MPS in 2006, regardless of how long they stayed in the voucher program, and found no statistically significant differences. Next, the researchers measured the effects of a “full dose” of voucher program treatment (i.e., students who were enrolled in 2006 and stayed through the twelfth grade). These students were found to be 5–7 percent less likely to commit a misdemeanor, 2–3 percent less likely to commit a felony, and 5–12 percent less likely to be accused of any crime as young adults. (Participants were between twenty-two and twenty-five years old at the time the data were analyzed.) In other words, the longer a student stayed in a voucher program, the less likely he or she was to participate in criminal activity, suggesting that “sustained exposure” to a voucher program helps decrease criminality.
The authors do offer a few caveats: First, assignment of vouchers was not random; most of the grades in voucher schools were not oversubscribed and thus did not require lotteries to admit students. Second, they believe that it would be helpful to understand more about non-cognitive factors like grit and conscientiousness, which might affect a student’s propensity to break the law. Finally, these results are specific to Milwaukee’s voucher program and are therefore difficult to generalize nationally.
Though there is an extant literature on the correlation between higher levels of education and a person’s likelihood of committing crime, further research into the potential link between school choice and criminal activity (and into the characteristics or factors that may be driving those interactions) would be helpful.
SOURCE: Corey DeAngelis and Patrick J. Wolf, “The School Choice Voucher: A ‘Get Out of Jail’ Card?,” Department of Education Reform, University of Arkansas (January 2016).
It’s well known that students of color are underrepresented in gifted programs compared to white and Asian students. Attempting to understand why, a new study from Vanderbilt University investigates how student, teacher, and school characteristics affect pupil assignment to gifted programs in reading and math.
Researchers drew a sample of approximately 10,640 pupils from the NCES Early Childhood Longitudinal Study, Kindergarten Class of 1998–1999 (ECLS-K). The ECLS-K tracks pupils nationwide from kindergarten through eighth grade, collecting descriptive information on student, family, school, and community factors through questionnaires administered to parents, teachers, and school administrators. From these data, the authors extracted information on student demographics and achievement, as well as school environment, classroom environment, and teacher qualifications and demographics, during the first, third, and fifth grades—the points at which most gifted students are identified in elementary school. They then estimated the probability of gifted assignment associated with each characteristic.
Overall, the odds of black and Hispanic children being referred to gifted programs are 66 percent and 47 percent lower, respectively, than those of white students. Moreover, when student, teacher, and school characteristics were held at their averages, white students had a predicted probability of gifted assignment of 6.2 percent, whereas black students had a probability of only 2.8 percent. Although student socioeconomic status, test scores, teachers, and school setting are all associated with gifted assignment in reading and math, none of these characteristics fully accounts for the racial disparities. Tellingly, teacher race was the factor most strongly correlated with gifted assignment for black students (and black students only)—so much so that black students were three times more likely to be assigned to gifted reading programs when previously taught by black teachers than when taught by non-black teachers. (There was no evidence of a significant correlation between teacher-pupil race congruence and gifted reading assignment for white, Hispanic, or Asian students, or gifted math assignment for any subgroup.)
Yet again we are reminded of the deepening disparities pervading public education. But these findings offer a new perspective that warrants further investigation: Perhaps the scarcity of black teachers contributes more to underrepresentation in gifted programs than previously thought. The authors suggest that their findings are consistent with the theory of representative bureaucracy: Citizens are more likely to receive public services and positive treatment when government officials share their racial or ethnic background.
My own experience as a former teacher and student at predominantly black and Hispanic schools supports this explanation. As an educator, I often found my advocacy for high-performing minority students met with reluctance—and sometimes flat-out disapproval—from my white and Asian colleagues. And I vividly remember how, when I was a first grader, students of color were denied acceptance into our school’s gifted program while a select group of classmates—mostly white and Asian pupils—were pulled out of class for enrichment. We wondered what they did during their time away and questioned our own competence. Perhaps policies that better identified talented students of color might have spared us these thoughts.
A century ago, W. E. B. Du Bois birthed the “Talented Tenth” concept and expressed the need to invest in the best and brightest from the black community to better promote upward mobility. This is just as necessary today. As much as we tend to focus on pushing our “bubble kids” and those at the bottom toward proficiency, we must also challenge and develop our highest-achieving students, especially those of color. More and better gifted programs for minority students can help narrow the excellence gap, so we should heed the study’s recommendations and look further into how screening processes may be racially exclusive and unfair to those kids who possess diverse forms of intelligence. As Du Bois argued, our talented tenth ought not to be forgotten.
SOURCE: Jason A. Grissom and Christopher Redding, “Discretion and Disproportionality: Explaining the Underrepresentation of High-Achieving Students of Color in Gifted Programs,” AERA Open (January 2016).