Editor’s note: This is the second in a series of blog posts that will take a closer look at the findings and implications of Evaluating the Content and Quality of Next Generation Assessments, Fordham’s new first-of-its-kind report. The first post can be read here.
Few policy issues over the past several years have been as contentious as the rollout of new assessments aligned to the Common Core State Standards (CCSS). What began with more than forty states working together to develop the next generation of assessments has devolved into a political mess. Fewer than thirty states remain in one of the two federally funded consortia (PARCC and Smarter Balanced), and that number continues to dwindle. Nevertheless, millions of children have begun taking new tests: those developed by the consortia, ACT's Aspire, or state-specific assessments built to measure student performance against the CCSS or other college- and career-ready standards.
A key hope for these new tests was that they would overcome the weaknesses of the previous generation of state assessments. Among those weaknesses were poor alignment with the standards they were designed to assess and low overall levels of cognitive demand (i.e., most items required simple recall or procedures, rather than deeper skills such as demonstrating understanding). There was widespread belief that these features of NCLB-era state tests sent teachers conflicting messages about what to teach, undermining the standards and leading to undesirable instructional responses.
While many hoped that the new tests were better than those they replaced, no one had gotten under their hoods to see whether that was true—until now.[1] Over the past year, working for the Thomas B. Fordham Institute with independent assessment expert Nancy Doorey, I have led just such a study. Ours is the first to examine the content of the PARCC, Smarter Balanced, and ACT Aspire tests. We compared these three with the Massachusetts Comprehensive Assessment System (MCAS), which many believe to be among the best of the previous generation of state tests.
To evaluate these assessments, we needed a way to gauge the extent to which each test truly embodied the new college- and career-ready standards (and the CCSS in particular). To that end, we worked with the National Center for the Improvement of Educational Assessment to develop a new methodology focused on the content of state tests. That methodology is based, in turn, on the Council of Chief State School Officers’ (CCSSO) Criteria for Procuring and Evaluating High Quality Assessments, a document that defines high-quality assessment of the CCSS and other college- and career-ready standards.
With that methodology in hand (you can read its specifics in the full report), we brought together more than thirty experts in K–12 teaching (math and English language arts [ELA]), the content areas, and assessment to evaluate each of the four tests, item by item. The process involved multiple days of training and extensive, detailed rating of every item (using actual test forms) in grades five and eight for math and ELA.
Our analysis found some modest differences in mathematics. For example, we found that the consortium tests are better focused on the “major work of the grade” than either MCAS or (especially) ACT Aspire. We also found that the cognitive demand of the consortium and ACT Aspire assessments generally exceeded that of prior state tests; in fact, reviewers thought the cognitive demand of the ACT Aspire items was too high relative to the standards. Finally, reviewers found that item quality was generally excellent, though there were a few items on Smarter Balanced that they thought had more serious mathematical or editorial issues.[2]
The differences in the ELA assessments were much larger. The two consortium tests turned out to be well matched to the CCSSO criteria on the key content of the standards. For example, PARCC and Smarter Balanced generally required students to write open-ended responses drawing on an analysis of one or more text passages, whereas the MCAS writing tasks did not require any sort of textual analysis (and writing was assessed in only a few grades). The consortium tests also had a superior match to the criteria in their coverage of research and vocabulary/language. Finally, the consortium tests had much more cognitively demanding tasks that met or exceeded the expectations in the standards.
Overall, reviewers were confident that each of these tests was a high-quality assessment that could successfully gauge student mastery of the CCSS or other college- and career-ready standards. However, the consortium tests stood out in some ways, mainly in ELA. I encourage you to read the report for more details on both results and methods.
Going forward, the new tests—and states deploying them—would benefit from additional analyses. For instance, researchers need to carefully investigate the tests’ validity and reliability evidence, and there are plans for someone (other than me, thank goodness) to lead that work. Examples of that kind of work include the Mathematica predictive validity study of PARCC and MCAS, as well as recent work investigating mode (paper versus online) differences in PARCC scores. We need more evidence about the quality of these new tests, whether focused on their content (as in our study) or their technical properties. It is my hope that, over time, the market for state tests will reward the programs that have done the best job of aligning with the new standards. Our study provides one piece of evidence to help states make those important decisions.
[1] One previous study examined PARCC and Smarter Balanced, focusing on depth of knowledge and overall quality judgments; it found that the new consortium tests were an improvement over previous state tests.
[2] Because Smarter Balanced is a computer-adaptive assessment, each student might receive a different form. As recommended in the methodology, we used two actual test forms at each grade level in each subject: one for a student at the fortieth percentile of achievement and one for a student at the sixtieth percentile. For PARCC and ACT Aspire, we also used two forms per grade, and for MCAS we used one (because only one was administered).
Morgan Polikoff co-authored Evaluating the Content and Quality of Next Generation Assessments and is an assistant professor of education at the University of Southern California's Rossier School of Education.
Editor’s note: This article originally appeared in a slightly different form on the Brookings Institution’s Brown Center Chalkboard.