With schools shuttered nationwide by the COVID-19 pandemic, states had no choice but to cancel standardized testing for the 2019–20 school year. Although certainly less pressing than many other COVID-related issues, the test stoppage is a long-run concern for states and school districts that monitor student performance using annual tests. It will almost surely “freeze” test-based accountability for the current school year, and looking forward to 2020–21, accountability and other programs that rely on test-score growth will need to be revisited. Our research team at the University of Missouri, in partnership with the Fordham Institute, will be conducting a deeper dive on what that “revisit” should look like. We plan to release guidance for states and school districts based on our findings in early 2021.
In the meantime, we know that some advocates are already discussing the larger question of whether we should reduce testing requirements as a permanent policy. For example, should Congress amend the Every Student Succeeds Act to allow for testing in every other grade, or something similar, rather than annually in grades three through eight plus once in high school, as is now the law of the land?
We answer this question within the context of what we see as the three primary uses of test data in K–12 education: (1) to assess student progress and inequities, (2) to monitor school performance, and (3) to provide clear goals for educators and students.
If testing's value lay only in the first item, assessing student progress and inequities, testing requirements could be reduced significantly with very little loss of information. For example, suppose we replaced the current testing regime with universal testing in core subjects in grades four, eight, and ten. This would cut the number of universal testing grades in half (from six to three), plus remove high school end-of-course tests in states that use them, while still allowing us to track academic progress and inequities across a large portion of K–12 schooling.
When considering the need to monitor school performance, however, the value of additional testing becomes more apparent. To measure test-score growth, we need testing at regular intervals. Maybe not every year, but regularly. (We focus our discussion on test growth because growth is a more accurate indicator of school performance than the average achievement level.) In order to link achievement growth to schools, we also need the testing intervals to be self-contained within schooling levels. To illustrate with a counterexample, consider a hypothetical regime with testing in grades four, six, eight, and ten. If a school district is structured to have K through five, six through eight, and nine through twelve schools, there is no two-year testing interval that allows for a reliable measure of test-score growth at the K–5 level. The closest we can get is with the four-through-six testing window, but this growth period includes one year at K–5 schools and one year at six through eight schools, making it difficult to determine how much of the overall growth between grades four and six is attributable to K–5 schools.
There are potential solutions to this problem, such as adding second-grade testing to the front end of the regime, or testing in grades three, four, seven, eight, and ten. But even if such solutions allowed schools with most grade configurations to contain at least one testing start- and end-point, there would still be information loss relative to the current testing regime because there would be less growth data available per school. The empirical consequences of the information loss could be substantial or minor (this is a question that merits attention in research), but there would be consequences. To summarize, our sense is that there is the potential for modest reductions in annual testing that would still permit the use of test-score growth for monitoring school performance, but significant reductions in testing would compromise the value of testing data for this purpose.
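The grade-configuration logic above can be illustrated with a toy sketch. The check below is our own simplification, not a real growth model: it treats a school's grade span as having a usable growth window only if at least two tested grades fall inside it (a start- and end-point for measuring growth). Kindergarten is coded as grade 0, and the school spans are the hypothetical K–5 / 6–8 / 9–12 district from the example.

```python
def has_growth_window(tested_grades, span):
    """True if the grade span contains at least two tested grades,
    i.e., a self-contained start- and end-point for growth."""
    low, high = span
    inside = [g for g in tested_grades if low <= g <= high]
    return len(inside) >= 2

# Hypothetical district: K-5 (grades 0-5), 6-8, and 9-12 schools.
schools = {"K-5": (0, 5), "6-8": (6, 8), "9-12": (9, 12)}

# Compare the 4/6/8/10 regime with the 3/4/7/8/10 alternative.
for regime in [{4, 6, 8, 10}, {3, 4, 7, 8, 10}]:
    coverage = {name: has_growth_window(regime, span)
                for name, span in schools.items()}
    print(sorted(regime), coverage)
```

Under the 4/6/8/10 regime, only the 6–8 span contains two tested grades, so K–5 schools have no self-contained growth window; adding grade three (the 3/4/7/8/10 alternative) fixes the K–5 span. This is only a coverage check, of course, and says nothing about how much growth data per school is lost, which is the empirical question flagged above.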
Finally, we turn to the idea that testing may confer learning benefits by providing clear goals for educators and students. We are not aware of strong research studies on the value of testing per se that would allow us to confirm or refute this possibility with confidence, so we won't belabor the point. However, it is worth noting that research shows the increase in testing initiated by the No Child Left Behind Act (NCLB) resulted in achievement gains, with the caveat that the increase in testing was bundled with other policies, most notably (and perhaps obviously) accountability based on the tests.
So where do we land on the policy question of whether testing requirements could be reduced? There seems to be some scope for modest testing reductions without meaningfully harming our ability to use tests in the ways that we would like to use them. However, large-scale reductions in testing could cause substantial information loss. Our biggest concern is that we would lose the ability to track the performance of schools in a credible way. Given the general absence of external performance pressure in public education, weakening our ability to monitor whether schools are raising student achievement is problematic.
COVID-19 may have forced us to freeze testing in the short term, but let’s not use it as an excuse to reduce testing requirements to the point where we undermine our ability to capture data on how well schools are serving students. That’s something we’ll need to know long after the virus is gone.