How much more "international benchmarking" does American education actually need? Gary W. Phillips's inspired new study of how U.S. states and (some) districts are doing vis-à-vis the rest of the world suggests that we already have a heckuva lot of performance information available right under our noses. We just needed Phillips, a veteran top official at the National Center for Education Statistics, now at the American Institutes for Research (AIR), to show us how to analyze it.
What he's produced (under AIR's aegis) is a set of metrics that enables readers to see how the math performance of students in countries that participated in TIMSS in 2007 compared to the performance of U.S. students on NAEP (in grades 4 and 8) that same year. What makes this possible is that the underlying TIMSS and NAEP "frameworks," assessments, scoring schemes, and sampling arrangements are sufficiently similar in this subject; math is also a subject that every state must test via NAEP and that a handful of big-city districts test via NAEP's Trial Urban District Assessment (TUDA).
Phillips superimposed an American-style grading system on TIMSS countries' academic performance--and then used statistical linking to project U.S. states' and cities' NAEP results onto the TIMSS scale, so that equivalent grades could be assigned to them.
In 4th grade math, for example, country grades ranged from B+ (Hong Kong, Singapore) down to D (Iran) and "below D" (Colombia, Kuwait, etc.). The international average was a C; the OECD (wealthy, mostly Western countries) average was C+; the U.S. grade was also C+.
Eighth grade results were similar but a bit lower: fewer B's and C's, many more D's, and averages at C rather than C+.
Then Phillips devised similar marks for U.S. states based on their NAEP results. These range (in 4th grade) from B (MA, MN, NJ, NH, KS) through lots of C's to D+ for the District of Columbia. We can see that every state (except D.C.) surpassed the international average in 4th grade math but only half of them did better than the OECD average.
The eighth grade results are roughly similar, except that in eighth grade Massachusetts was the only state whose students scored in the "B" range (B-, actually).
Suppose you are most interested in Ohio. Its estimated TIMSS score (for 8th grade) was 516 (C+). You can "map" that onto the actual TIMSS results and see that Ohio lands between England and Hungary. It does better than a bunch of countries but worse than the Asian "tigers" with grades in the B range. (The whole study would be more striking if there weren't so much grade compression in B-C territory.)
Now suppose you are most interested in California. Its estimated TIMSS score (also for 8th grade math) was 485--a flat C. That places its students between those of Italy and Serbia--again better than a lot of countries but worse than a dozen, including Russia, Australia, Sweden, and Scotland (as well as the Asian tigers).
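For readers who want to see the mechanics of that last step, here is a minimal sketch (in Python) of the kind of lookup involved once a jurisdiction's NAEP result has been projected onto the TIMSS scale. The cut scores below are illustrative assumptions, not Phillips's actual grade boundaries; the two scores are the 8th-grade estimates cited above.

```python
# Toy illustration: assign a letter grade to a score expressed on the
# TIMSS scale. The cut scores are HYPOTHETICAL, chosen only to
# reproduce the grades quoted in this article; they are not the
# boundaries used in Phillips's AIR study.

GRADE_CUTS = [
    (575, "B+"),
    (550, "B"),
    (525, "B-"),
    (500, "C+"),
    (475, "C"),
    (440, "D+"),
    (400, "D"),
]

def letter_grade(timss_score: float) -> str:
    """Return the highest grade whose (hypothetical) cut score is met."""
    for cut, grade in GRADE_CUTS:
        if timss_score >= cut:
            return grade
    return "below D"

# Estimated 8th-grade TIMSS-scale scores cited above.
for name, score in [("Ohio", 516), ("California", 485)]:
    print(f"{name}: {score} -> {letter_grade(score)}")
```

With these assumed cut points, Ohio's 516 lands at C+ and California's 485 at C, matching the grades reported in the study; the real work, of course, lies in the statistical linking that puts NAEP results on the TIMSS scale in the first place.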
Phillips went on to apply this analysis to the eleven big-city districts that took part in the TUDA-NAEP math assessment in 2007. Their best grades are C+ (in 4th grade) and C (in 8th) but they also have a lot of D+ marks, particularly in 8th grade. Los Angeles, for example, came in at 457 in 8th grade (versus the California score of 485 noted above), which places it around the international average (adjacent to Bosnia-Herzegovina), but far below the OECD average.
There are obviously limits to such analyses. Math is the only subject in which every U.S. state participates in NAEP and that is tested by TIMSS. (TIMSS covers science, too, but states aren't obligated to participate in NAEP science assessments and those don't occur as frequently.) I don't know whether the "PIRLS" international literacy assessment is similar enough to NAEP's reading assessments to yield such comparisons. Maybe Phillips can figure that out.
Demographics are limiting factors, too, and can't be sorted out with these samples at the macro level. We might want to know, for example, how low-income American students compared with low-income Australians or Scots, and it could be more illuminating to compare Los Angeles with, say, Sydney or Birmingham than with entire states and nations.
The biggest limitation, of course, is that, while this sort of analysis is intensely revealing at the state-federal policy level, it tells us nothing about our own child or his/her school. Except for TUDA districts, it doesn't even tell us anything about our own school district. For that kind of information--actionable information at the community and family level--we must currently depend on state testing regimens, and we have a huge pile of sad evidence that those are discrepant, uneven, and even misleading.
That's why I remain a fan of national testing. To get there, however, we must first develop some sort of national standards. The folks busily doing that today--the NGA-CCSSO "common standards" project--haven't even embarked on the lower grades as yet, much less on the assessment part. In the end, though, assessments are needed if the standards are to have traction in the real world. (Now that Secretary Duncan has pledged hundreds of federal millions to underwrite test development, one assumes it will eventually happen.)

If we want those new assessments to lend themselves to international comparisons, à la Phillips--not needed at the "policy level," we can now see, but important if we seek to compare individual districts, schools, or kids with the world beyond our borders--then the underlying frameworks, i.e., the standards themselves, now being drafted in sealed rooms somewhere in Washington, must be similar to those of TIMSS and/or other international tests. (We don't have space here to examine the pros and cons of TIMSS vs. PISA, etc.; suffice it to say that TIMSS makes more sense for a bunch of reasons.) This also means that, if the standards-developers go off on tangents that cannot be measured--the biggest risk here is an overdose of "21st century skills"--we will not, in the end, be able to determine how our students and schools compare with anybody at all.
Because students rarely acquire skills and knowledge that aren't actually taught, however, a properly aligned education system must also address curriculum and instruction. We can see that Massachusetts and other relatively high-performing states have focused on such alignment. So must the users of whatever emerges from national standard-setting. Lessons from abroad may prove helpful here, too.
But one thing Gary Phillips has shown is that U.S. states and cities don't need to sign up for TIMSS (or PISA) themselves. Those sorts of comparisons can already be done--and Phillips deserves kudos for showing the way.