NAEP 2008 Trends in Academic Progress
Bobby Rampey, Gloria Dion, and Patricia Donahue
National Center for Education Statistics, Institute of Education Sciences
April 2009
As is typical of the Nation's Report Card, the latest results from the long-term trend (LTT) assessment are a mixed bag. Recall that the LTT is not the same as the main NAEP assessment. It measures essentially the same knowledge and skills as when it was first administered in the early 1970s, which means we can observe changes in student performance over time, while the main NAEP assessment responds more readily to curricular fashions. (Read more about the difference here.) This iteration presents data from 2007-08, comparing them to the 2004 administration in particular, as well as over the longer term. Key findings: average reading scores are up since 2004 for 9-, 13-, and 17-year-olds, but average math scores are up only for 9- and 13-year-olds. In fact, math scores for 17-year-olds have not budged in 35 years. Still, we need to keep shifting demographics in mind. The country has seen an influx of Hispanic students, who generally score well below their white peers. So even though all racial subgroups have made gains over the long term, the national average remains flat. Achievement gaps, however, appear to be widening in recent years: white 9-year-olds improved their math achievement since 2004, while other groups stagnated. Bottom line, the LTT and the main NAEP assessment agree: we continue to do better in reading than in math overall. There's been plenty of press as to whether these results should be a feather in NCLB's weathered hat (as former Secretary Spellings argues) or another damning indictment. The real question is whether we can connect 2008's LTT results to NCLB in the first place. You can find the report here.
Sean F. Reardon, Allison Atteberry, Nicole Arshan, and Michal Kurlaender
Institute for Research on Education Policy and Practice at Stanford University
April 2009
The authors of this study employ longitudinal student data to gauge the effect of California's exit exam (the CAHSEE, taken in 10th grade) on student persistence (measured as the percentage of students remaining in school in their original district at the end of 11th and 12th grade), graduation rates, and academic achievement. The report compares the class of 2005, which was not subject to the exit exam requirement, with the classes of 2006 and 2007, which were. The authors were able to isolate the effect of the CAHSEE because the class of 2005 actually did take the exam (in the spring of 10th grade, 2003), thinking it would count as a graduation requirement; the California State Board of Education changed the policy shortly thereafter. The findings: low-achieving students subject to the exit exam requirement (i.e., the classes of 2006 and 2007) displayed marginally lower rates of persistence and significantly lower graduation rates than those who were not (i.e., the class of 2005). Those in the lowest achievement quartile who faced the graduation requirement posted a graduation rate 15 percentage points lower than low achievers free from it. Furthermore, these effects were disproportionately strong among minority and female students, which the authors attribute in part to stereotype threat: "the phenomenon whereby the fear that if one performs poorly on a high-stakes test it will confirm a negative societal stereotype about one's group leads to increased test anxiety." As for achievement, the CAHSEE has no discernible effect; the authors found that students learned no more between the tenth- and eleventh-grade administrations of the state accountability test. It's not immediately clear why they would, though: the CAHSEE is given in 10th grade (though failing students have numerous opportunities to retake it), so the year before the exam, rather than the year after, would seem the more relevant one to examine.
On top of this, the authors' graduation-rate findings have a reasonable explanation: exit exams are supposed not only to standardize but also to raise the bar, so an initial dip in graduation rates is to be expected. Finally, with regard to stereotype threat, we're still wondering why the authors consider the class of 2005 a good baseline, since those students thought the test counted when they took it. If you want to take a look for yourself, you can find the report here.
Some high-school senior pranks leave lasting damage and result in criminal charges for the perpetrators. Other antics live on in memory long after the cow has been coaxed off the school roof. And some pranks are simply impressive, like the one pulled off by two students in Fruita, Colorado. Seniors Alex Almy and Jesse Poe planned a midnight run to weld a po-mo-style, spray-paint-decorated Eagle hatchback (that's a car) around the Fruita Monument High School flagpole. Yes, around. The two were punctilious in more ways than one. They covered the car with a tarp as they made their dead-of-night drive and had two friends look out for police as they completed the deed--but they also ran the idea past their parents and were thoughtful about how the mischief would be received. "We thought a lot about if people would think we're disrespecting the flag," Poe explains. "I'd feel really bad if a veteran or someone took it that way. That's why I wrote 'God Bless America' on the side of [the car]." Darling, aren't they? All in all, school administrators thought the prank showed "a lot of Wildcat pride," and as long as the boys help remove the affixed vehicle from the school's patriotic mast, they won't get in trouble. Gadfly sees a promising future for these two in modern art.
"Fruita Monument seniors pull off quite the prank," by Richie Ann Ashcraft, Grand Junction Daily Sentinel, May 1, 2009
I'm reminded again and again of America's need for independent education-achievement testing-and-audit bureaus to track and report student performance and school achievement and to sort out the claims and counterclaims regarding when these indicators have risen and when not--and perhaps also to explain why.
The National Assessment Governing Board and National Center for Education Statistics perform some of this function for the country as a whole, though they don't blow whistles when someone makes dubious claims or suggests impossible causal relations--even (perhaps especially) when that someone is the former Education Secretary who appointed all of NAGB's current members.
Writing in the Washington Post on Monday, Margaret Spellings, relentlessly defending the No Child Left Behind Act, whose implementation she long presided over, tried to attribute NAEP gains since 1999 to the impact of NCLB. She neglected to remind readers that NCLB was proposed in January 2001 and signed into law in January 2002, and that the first school year on which it could conceivably have had any influence would be 2002-03. The most recent (long-term) NAEP results come from spring 2008, meaning that five years is the longest period over which any student gains could even be associated with NCLB, much less attributed to that statute. (Because this was no random experiment, any gains or declines could equally have been caused by global warming, Taliban infiltration, or whatever.) Unfortunately, the long-term-trend NAEP wasn't administered in 2003 (or 2002, for that matter), so one faces a challenge in deciding what year to use as a baseline. The assessment was given in 1999, then again in 2004 and in 2008. Spellings opted for 1999--because doing so strengthened her claim. However, of the gains recorded between 1999 and 2008 (for 9- and 13-year-olds)--and there were modest gains in both math and reading--the lion's share occurred between 1999 and 2004, not between 2004 and 2008. One could even suggest that NCLB slowed the rate of gain. I wouldn't say that. But Spellings shouldn't suggest that NCLB caused the gains, either, since most of them were recorded before the law could have had much effect.
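The baseline arithmetic here can be made concrete with a toy sketch. The scores below are hypothetical, invented purely for illustration (they are not actual NAEP scale scores); the point is only that the same 2008 result looks very different depending on whether one starts counting from 1999 or from 2004:

```python
# Toy illustration (hypothetical scores, NOT actual NAEP data) of how the
# choice of baseline year changes the gain a policy can claim credit for.
# The LTT NAEP was given in 1999, 2004, and 2008; NCLB's first plausible
# year of influence was 2002-03, so 2004 is the earliest post-NCLB data point.

scores = {1999: 212, 2004: 219, 2008: 220}  # hypothetical average scale scores

gain_total = scores[2008] - scores[1999]  # what a 1999 baseline lets one claim
gain_early = scores[2004] - scores[1999]  # window largely predating NCLB's effect
gain_late = scores[2008] - scores[2004]   # the only window NCLB clearly covers

print(f"1999-2008: +{gain_total}; 1999-2004: +{gain_early}; 2004-2008: +{gain_late}")
```

With these made-up numbers, a 1999 baseline lets an advocate claim an eight-point gain, even though seven of those points accrued before 2004.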
Regrettably, not a peep was to be heard from NAGB or NCES about her dubious use of NAEP results.
By contrast, there's been a lively and continuing exchange in New York City between Diane Ravitch, functioning as a sort of one-woman truth squad, and Chancellor Joel Klein's throng, about what gains by Gotham's schools and children can legitimately be associated with the changes wrought in that city's education system by Messrs. Klein and Bloomberg (now running for re-election, of course). As Ravitch has repeatedly shown (in this New York Times op-ed, for example, and this response to Jennifer Bell-Ellwanger, who works for Klein), Joel's team, much like Spellings, has chosen a serves-their-own-purposes baseline against which to claim credit for achievement gains even though their reforms hadn't kicked in at the time when the greatest gains were recorded (on New York State tests, in this case).
That exchange has indeed been lively, but it lacks any arbiter to resolve it. That's because America has a long and sorry tradition, particularly at the state and local level, of entrusting testing and the analysis and reporting of test results (and other performance indicators) to the very system whose performance is, in effect, being appraised. That's an inherent conflict of interest, a commingling of the company treasurer's function with that of an outside auditor.
Advocates must certainly be expected to reach for whichever data they think make the most convincing case for their accomplishments, exertions, and assertions (and, of course, they then suggest causal relationships that no reputable scientist would accept). Critics, similarly, choose the evidence that bolsters their arguments. This will continue. But the advocates usually prevail because they generally look "official" and their critics can be made to look like cranks. The underlying problem, however, is that the advocates, official or not, typically have their own axes to grind, their own records to defend, their own interests to advance.
The Oklahoma legislature tried during its current session to address this problem, but Governor Brad Henry vetoed the bill. Supported by a highly unusual fraternity of business and education groups--even the teacher unions--lawmakers sought to transfer control of testing and academic standards from the state education department to a new, independent, nonpartisan "Education Quality and Accountability Office." The logic was that the Education Department should implement programs and policies but somebody else should measure, report, and judge the outcomes. Makes sense to me. But apparently not to the governor. As a result, Oklahoma's education chickens will continue to be weighed and measured by the foxes.
That's the norm across America but it's one that needs disrupting.
Am I dreaming? After I wrote something similar on Flypaper, a reader said I might as well hunt for unicorns as for objective, independent audits of education performance. But I'm undeterred. The NAGB/NCES model, even if muted at times when it might do well to make noise, is basically sound. States and districts would be better off with their own versions of this than continuing to expect the carnivores to report an honest count of the poultry.
Over the past fifteen years, the Los Angeles Unified School District has logged a total of 159 review cases for firing tenured teachers--159 in fifteen years. (Apparently, there were a few more, but the records have all been destroyed.) At fault is a combination of over-the-top tenure protections, labyrinthine legal procedures that can push dismissal costs into six figures, and review panels that seem to give just about anyone a free pass. These teachers are mocking student suicide attempts, keeping marijuana in their desks, and sleeping with their coworkers in the metalworking shop--and are still in the classroom! Those who have been removed exist in a limbo similar to New York City's "rubber rooms," all on the taxpayer's dime. You might think that United Teachers Los Angeles, the local teachers union, would be a bit embarrassed by these revelations, disclosed in a series of Los Angeles Times investigative reports this week. Alas, no; it's pushing its luck with an (illegal) strike planned for Advanced Placement test day to protest looming budget cuts and teacher layoffs. "We expect parents to understand that the loss of one day to stop the chaos that would occur with larger class sizes and the laying off of teachers is well worth it," explains UTLA president A.J. Duffy. What parents should understand is that if Mr. Duffy allowed all of the bad teachers to be fired, the district wouldn't have to be laying off many good ones right now.
"Firing tenured teachers can be a costly and tortuous task," by Jason Song, Los Angeles Times, May 3, 2009
"School officials call for legislation easing firing of teachers," by Jason Song, Los Angeles Times, May 4, 2009
"L.A. teachers union plans 1-day strike," by Howard Blume, Los Angeles Times, May 2, 2009
"L.A. Unified pays teachers not to teach," by Jason Song, Los Angeles Times, May 6, 2009
"Workday filled by TV, exercise, reading," Los Angeles Times, May 6, 2009
The 1997 release of results from the Third International Mathematics and Science Study (TIMSS, since renamed the Trends in International Mathematics and Science Study) was a wake-up call for the United States--and for Germany. What's notable about this event is not that both countries were outperformed by some 20 other nations, or that the disappointing results spurred prolific and apocalyptic pontification on both sides of the Atlantic about their dire implications. What's notable is that Berlin and Washington responded in drastically different ways.
Germany serves as a particularly informative example because of the structural similarities it shares with the U.S.: neither country's central government has constitutional authority over education, and control of the sector rests, for the most part, at the state level. In addition, both have similar state-level leadership bodies (Germany's conference of state education ministers, the Ständige Konferenz der Kultusminister der Länder or KMK, is somewhat akin to America's Council of Chief State School Officers or CCSSO).
But there the similarities end. Today, Germany boasts a set of national standards and tests that grew out of the 1997 TIMSS results, while the U.S. still struggles with a patchwork of standards and assessments that vary widely across states. The No Child Left Behind Act only exacerbated this incongruity by embracing standards-based reform while rejecting national standards; the law, in fact, pushes the system in the opposite direction by requiring states to get virtually all of their students to "proficiency" while explicitly allowing each state to define "proficient" as it sees fit. Unfortunately, the latest international test scores remind us that the U.S. still lags behind.
Twelve years later, another opportunity is at hand for Washington to follow in Berlin's footsteps. President Obama and Secretary Duncan find themselves with a rare chance to invest in the development of national standards and tests, thanks to the American Recovery and Reinvestment Act. And the governors and state superintendents are talking seriously about making this happen from the bottom up, with states taking the lead. At this critical moment, can we avoid reinventing the national-standards wheel as if it had never been designed anywhere before? What can we learn from our international neighbors? Our investigation of ten countries that have addressed this issue (Russia, France, Brazil, Canada, China, India, Germany, South Korea, Singapore, and the Netherlands) reveals six lessons.
1. It's not true that national standards portend loss of local control. The international evidence here is clear: national standards are not--at least, need not be--developed in isolation by a distant central government that runs the education system and quashes local control. In most cases, in fact, the national set of standards is a floor and not a ceiling; in other words, the baseline is set by the central authority, but states and municipal governments retain the flexibility to add to the standards, choose textbooks, and handle the day-to-day operation of schools. We believe an approach that recognizes the authority of the nation, state, and local district is the best path for the U.S. And based on our research, we recommend that national content standards comprise 75 to 90 percent of the total, with states and local districts (and, as appropriate, individual schools) crafting the last 10 to 25 percent.
2. Create an independent, quasi-governmental institution to oversee the development of national standards and assessments and produce reports to the nation. This lesson has two components--the creation of an independent national center and what that center is charged to do. On the first count, we think the need for such a center on these shores is clear, especially because the U.S. suffers from geographic and socioeconomic fragmentation of opportunity and curriculum. Some assert that common standards could develop if states simply shared their standards with one another. But Germany tried this method for 60 years, and the country retained its disparities of opportunity. Other countries found the same shortcomings. Thus, we conclude that it's not possible to create focused, coherent, and rigorous standards for all children without a national institution. We'd advocate an independent, quasi-governmental institution--more like the National Assessment Governing Board or the National Academy of Sciences than part of the U.S. Department of Education--created by the states. It would include an apolitical board of academics, educators, officials, and representatives of the public; appointments could be made through the states. Besides developing standards, it would also update them periodically and set policies for the development and administration of an accompanying national assessment.
3. Position the federal government to encourage and provide resources for the standards-setting process. Germany again sets an apt example. While the initial standards development work was supported financially by both the federal government and the states, the KMK, which led the effort, is not part of the federal government. In fact, the KMK's ability to convince states to play along was rooted in this separation: it could assure states that the standards and the assessment results would not be used to "punish" schools. We recommend that the U.S. government take a similar tack--encourager and resource provider, not itself the setter of standards.
4. Develop coherent, focused, rigorous standards, beginning with English, math, and science. This lesson takes its cues from the 30-plus countries included in TIMSS. While curricular standards in top-achieving countries were focused, rigorous, and coherent, U.S. state standards generally were not. The reason is simple: U.S. standards cover too many topics to go into any of them in depth--and then repeat the same topics grade after grade. Furthermore, instead of reflecting the inherent logic of the discipline from which curricular topics are drawn, standards are thrown together arbitrarily in a process governed more by politics than by content. So, for example, while most high-performing countries focus on algebra and geometry during middle school, the U.S. defers those topics to later grades and uses the middle years to repeat arithmetic topics already covered in grades one through five. To remedy these problems, the development of American national standards should include the voices of subject-matter scholars and of leaders from business, professional, and vocational fields with relevant subject-matter knowledge, and should examine other countries' standards and international benchmarks, insofar as these are available.
5. Administer national assessments (including open-ended questions) at grades 4, 8, and 12 every two years. Since the National Assessment of Educational Progress (NAEP) already tests at these grade levels, we should follow in its footsteps. And taking a lesson from our international brethren, who mostly do not test annually, we suggest that the U.S. administer its national assessment every other year. (The details, of course, are still to be worked out, especially in light of lesson six, below.) Taking another cue from international practice, these tests should include a variety of question formats--open-ended, multiple choice, and, budget permitting, even a few reliable performance-based items.
6. Hold students, teachers, and schools accountable for performance. Setting standards and administering assessments that go with them amount to little if they do not inform future decision making. When properly aligned with national standards, assessment results in many countries help determine whether students progress from one level of schooling to the next, how administrators and teachers are rewarded, and how resources are allocated among schools. To make the best use of test results, accountability should span multiple levels (student, classroom, school, regional, and national), and assessment results should be made public. Over time, twelfth-grade assessment results should also serve as an indicator of college and workplace readiness and inform postsecondary decisions.
It's time for the United States to have its epiphany as Germany did 12 years ago. The results of our scattered approach are obvious: low standards by international comparisons, mediocre student performance (especially in eighth and twelfth grades), huge inequalities in curricular opportunities, and the resulting drag on our economy. The good news is that we know what the standards of top-achieving nations look like: focused, coherent, and rigorous. Not only that, but they are part of focused and coherent systems, and informed by scholars who understand the disciplines from which the content is drawn, by educators and teachers who know how children learn and how content is best taught, and by lay people who work in various fields and know how the content is applied in the workplace and society.
The process of establishing national standards will surely require time, patience, and a great deal of compromise. But we postpone the inevitable at our own peril.
By William H. Schmidt, Richard T. Houang, and Sharif M. Shakrani
William H. Schmidt is a University Distinguished Professor of Statistics and Education at Michigan State University and co-director of the MSU Education Policy Center. Richard T. Houang is Adjunct Professor of Statistics and Education and Director of Research for the Center for Research on Mathematics and Science Education at MSU. Sharif M. Shakrani is Professor of Statistics and Education and co-director of the Education Policy Center, also at MSU. This editorial is drawn from a policy brief presented at Tuesday's Fordham-sponsored conference, International Lessons about National Standards. Their full study will be available later this summer.