Some weeks back, I used this space to describe ways that a state's academic standards may be lowered, including several that occur out of public view. (See "A field guide to low standards," May 16, 2002.) I explained how a state might simply set low standards, focus its tests on the easier skills covered by the standards, create deliberately easy test questions and generous rubrics, or establish low cut scores for passing the tests. That editorial prompted a number of reader comments, which revealed additional holes in existing state systems of standards-based accountability and further illuminated why it is so hard to do right by this education reform strategy, notwithstanding the new oomph supplied by the No Child Left Behind Act.
In some states, there's simply no strong commitment to the idea of clear standards that spell out what children should know or to accurate measures of what they DO know. As one writer put it, "I believe my home state of Vermont uses several of your suggested methods to defeat standards. In English, Vermont's 'portfolio' system makes almost any kind of scoring completely subjective. Even with portfolio work in other subjects, the student polishes each submission, running it past his teacher several times before a highly unrepresentative sample of his work gets placed into the holy folder.
"My wife and I have attempted to receive documentation of ANY curriculum for our children's schooling at several grade levels beginning with first grade and have only had success in getting any details from the system after our son reached high school. Four times we were given the generalized state standards document that you say can
usually be found on the web. This is so incredibly generalized in Vermont that the
best one can say of it is that it mentions various fields of learning."
To this disgruntled dad, I say that Vermont is famous for not subscribing to the view that education standards should be concrete, assessments should be straightforward, and information about school expectations and student performance should be transparent. Indeed, Governor Howard Dean openly flirted with the possibility of rejecting all federal Title I funding rather than altering the Green Mountain State's own standards-and-assessment system to comply with NCLB requirements. (He has since backed away from that threat and indicated that Vermont will accept at least the first year of No Child Left Behind moneys, then re-evaluate once it sees how much difficulty compliance causes. He's also now openly running for President!)
Secrets of a test-scorer
In some states, tests are unreliable because those in charge of assessment don't really trust the more straightforward kinds and instead insist on using the formats that are hardest to score consistently and fairly. As one writer put it:
"Here's some inside information on the scoring of statewide exams. For the past several years, I have worked for one of the firms which has responsibility for writing and scoring statewide exams in such subjects as math, reading, writing, and social studies. The questions are 'essay' type. We don't score multiple-choice questions-presumably because scanners and computers handle them. Sometimes the scoring is done by hand (i.e., you are given the student's exam and you grade it on one of those SAT type fill in sheets); sometimes it's done by computer (you sit at a computer and see an image of the students' answers and grade the answer on the exam). Most of the individuals doing the scoring are college graduates, including people with graduate degrees; a few are college students and, rarely, some have no college education.
"The scorers seem to be dedicated. The scoring can be at times monotonous.
(How would you like to have to grade, say, 17,000 answers to a question such as: describe your favorite character in a novel?) Sometimes, as you might expect, the answers are hilarious (though not intentionally). Sometimes the kids say things like: 'we never studied this' or 'we studied this last year.'
"The main problem I see is that the states set standards which are, at times, not consistent. For instance, on a Social Studies exam: some questions are easy to get a good grade and some questions are very hard. In the latter case, it is often because the state will require a student to use an exact word or phrase but not indicate in the question that some exact word or phrase is required. Sometimes the state will require the student to give, say, three reasons for something in order to get a high score but on the question only indicates that he must give "reasons." This obviously penalizes a student who gives two reasons and assumes that that is sufficient, though he might be completely capable of giving three or more reasons if that were called for. By the way, the questions on which it is hard to get a good score are not necessarily the 'hardest' questions on the exam. That is, they are not necessarily the questions that an outside observer would select as the most difficult to answer.
"Although I understand the argument that multiple choice questions can't probe a student's complete understanding, after grading essay questions I am not certain that a student is fairly judged by using the latter type, either. I have discovered that in some states where the results have been terrible, somebody (either at the state, district or school level) has clamped down and the students have done better in the next year. This doesn't occur all the time or most of the time or even as often as I would like, but I have seen instances of it. Sometimes the essay questions ask facts about very specific events which probably take up no more than half a page in a 500 page reader."
This test-scorer opened a grimy window and gave us a peek inside the process. It isn't pretty. Though he seems to remain bullish (as do I) about the gains that can come from expecting strong performance from students, his examples of the uneven and tricky (or maybe just inept) nature of state test items are troubling. Certainly, his experience gives us further reason to argue with those who insist that multiple-choice items are useless and that only the human-scored "extended response" items, famously difficult to judge reliably, deserve respect.
Vermont was once notorious for the unreliability (in the eyes of such testing experts as Daniel Koretz) of its portfolio and "open response" assessments. More recently, North Carolina has had to junk the results of its fourth-grade writing test because last year's scores were so uneven and inexplicable. Maryland was so daunted by the technical and political difficulties of its much-praised MSPAP assessment that it is now jettisoning that format entirely in favor of more "objective" tests.
Now the College Board is plunging into this swamp with its revised SAT, which boasts a universal writing requirement. This will pose a whole new level of scoring challenge, both because of the huge numbers taking the SAT and because of the stakes associated (perhaps especially for middle and upper-middle-class families) with those scores. Picture a system that tries to assign reliable numerical scores to millions of hand-written essays, year in and year out. I'm all for making kids show that they can write, and the SAT's proctored setting will make it harder to submit someone else's composition and call it one's own, so the underlying impulse here is sound. But the test-scoring burden will be truly immense. (Watch for litigation. And watch SAT fees soar.)
Perspective of a high school principal
My previous column focused on the problem of setting up 50 state accountability systems without a common yardstick by which to compare them. One writer, a high school principal, had an idea for fixing this, an idea that, as it happens, relies on the SAT and its major competitor:
"This account of the potential pitfalls in accountability is the most cogent set of arguments I have read about the ways politicians and educrats manipulate testing to serve their own ends. Probably the best way to establish checks and balances is by comparing a school's or state's own data to SAT or ACT scores, which fairly reliably predict success or failure in further learning. This means testing the entire high school population, rather than just the consciously college-bound, and it would involve some expense, but nobody ever said accountability would come cheap. At least we would know whether our schools are doing as well as they claim if we got feedback from testing organizations that have higher ed, rather than K-12, as their primary constituency."
I agree that it would be interesting to see what we could learn by comparing state (or school-level) test results with those from a universal administration of the SAT or ACT. Congress followed similar reasoning when it opted, in No Child Left Behind, to require state participation in the National Assessment of Educational Progress. The idea is that a state's NAEP results will serve as an external audit of its own standards and test results. I believe that can work so long as NAEP retains its integrity, but state-level NAEP covers only grades 4 and 8 and therefore cannot function as an audit of a state's high school standards or exit tests. Perhaps something should!
Other readers with views on this matter, pertinent experiences, or whistles to blow should please get in touch. This is a conversation worth continuing.