Using ESSA to fix reading: Implications for state policy
By Lisa Hansel and Robert Pondiscio
Last week, we encouraged state policy makers and educators to rethink what it takes to develop strong readers and the signals sent to schools by accountability measures. The bottom line: reading comprehension is a slow-growing plant, and the demand for rapid results on annual tests may be encouraging poor classroom practice—giving kids a sugar rush of test preparation, skills, and strategies when a well-rounded diet of knowledge and vocabulary is what’s really needed to grow good readers. Assessment and evaluation policy must ensure that these long-term investments in the building blocks of language growth are rewarded, not punished. Under the Every Student Succeeds Act (ESSA), states have the opportunity to do exactly that.
States also have the freedom to rethink teacher accountability. Because broad, general knowledge builds broad, general reading comprehension ability, school-wide accountability for reading makes far more sense than individual teacher accountability. Every school subject builds the knowledge base that contributes to a child’s reading comprehension ability (you need to know some science to make sense of a science text; history to make sense of a history text, etc.).
Take the comparatively simple task of teaching students to decode. At a minimum, it requires K–2 teachers. For students who struggle, special education teachers, speech pathologists, and others are often involved. Now consider building knowledge: Individual teacher accountability on a fourth-grade reading comprehension test, for instance, is unfair because children’s comprehension depends on what they’ve learned every year, in school and out (a reading test is a de facto test of background knowledge); it’s also unproductive because it lets the early-grade teachers off the hook if they don’t contribute by teaching the knowledge-building subjects. School-wide accountability for reading fosters teamwork.
Yet some teachers do not pull their weight, even when in a supportive school. Elliot Regenstein of the Ounce of Prevention Fund offers a sensible solution: an external inspectorate of teaching, particularly in the untested early grades. We fully agree with Regenstein that “great teaching in the early years is both rigorous in its content and fun for the kids in its delivery. It requires far more skill than many education leaders understand.” But that lack of understanding makes creating such an effective inspectorate very challenging. Let’s not end up like England, where, according to Daisy Christodoulou, the inspectorate system reinforces ineffective practices. States will have to be vigilant to create and sustain productive inspectorates—but the reward is likely to be well worth the effort.
Strong decoding instruction remains absolutely essential; states should ensure that students are mastering basic reading skills in the early grades. Here are three plans to complement important skills instruction by focusing on patiently investing in building knowledge and vocabulary across the curriculum and grade levels. Our first two suggestions work with existing reading comprehension assessments. The third takes advantage of ESSA’s “innovative assessment” pilot.
Option 1: Incentivize adoption of a knowledge-rich curriculum
ELA standards assume that schools have strong curricula in place across subjects, or encourage the adoption of them. It shouldn’t be left to chance. Every school—particularly those serving disadvantaged learners—should be encouraged to have a knowledge-rich curriculum that results in virtually all students scoring proficient in reading comprehension by the eighth grade. The nature of language growth is such that in earlier grades, scores will likely fluctuate (especially in high-poverty schools) as academic domains that have been taught may or may not appear on any particular reading test. By eighth grade, a well-rounded and well-implemented curriculum should result in all children having the broad knowledge they need to be proficient readers—just like most privileged kids do today.
Schools in which at least 85 percent of students in each subgroup are proficient should continue to do what’s working for their students. Schools that don’t meet that high bar might be required to:
In districts with high student mobility rates, states should strongly encourage the adoption of a district-wide scope and sequence—a list of all the ideas, concepts, and topics taught in each subject and grade—developed with the participation of educators from schools in each district. This document would provide more guidance to teachers than is offered by standards alone, but it would be less fleshed out than a full curriculum—allowing each school to customize its lesson plans, student projects, etc. Students who change schools would not end up with gaps and repetitions in their learning, which function as roadblocks to reading comprehension.
For their lowest-performing schools, states should take even stronger action, such as requiring the curriculum to be submitted to the state for review, sending teams to observe instruction and provide coaching (per Regenstein’s inspectorate), developing a model curriculum (or placing online the curricula of high-performing schools), and/or offering professional development and courses to increase teachers’ knowledge of the domains they should be teaching. States should also examine how these low-performing elementary schools are teaching and assessing decoding. When remedial decoding instruction is needed, states should help schools devise interventions that avoid the common practice of pulling students out of science, social studies, and art classes.
Option 2: Create a state-wide model sequence
States that wish to strongly support building knowledge should convene educators to collaboratively develop a model grade-by-grade sequence of academic domains to teach in each grade. This model might be put online as a scaffold for schools as they develop their curricula, but it should not be mandatory. If policy makers need to be convinced there’s a demand for this, they should check out the number of downloads in their own states of materials developed for EngageNY. It's been utilized as much by teachers outside the state as by the New York instructors it was built to serve.
The sequence should specify academic domains (like ancient Egypt or gravity) for every subject in each grade. Above all it should be coherent and cumulative, ensuring that all children have broad knowledge—including in art and music—by the end of eighth grade. Such a sequence would have two major benefits. First, teacher preparation and professional development could guarantee that all teachers have deep knowledge of the domains they are responsible for teaching. Second, children who change schools would have far less interruption in their education. Moving to a new neighborhood would no longer result in learning about the American Revolution twice while missing out on World War I (at present, there’s no guarantee that kids learn either). Even better, if a consortium of states created a model sequence, publishers could create slimmer, more focused textbooks that covered the domains in the sequence—not eight hundred pages on every topic a teacher might want to cover.
Option 3: Using the ESSA pilot provision, create a state-wide sequence and sequence-based reading tests
States interested in using ESSA to increase reading ability while also creating more coherent educational systems could follow this idea to its logical conclusion: creating state-wide sequences—and sequence-based reading comprehension assessments—for grades 3–8.
ESSA’s innovative assessment pilot encourages up to seven states to completely rethink the role of testing in teaching and learning. Sequence-based reading assessments would make the subject matter of the passages predictable (more like assessments in other subjects), reassuring teachers that if they teach the specified domains, their students will be optimally prepared to comprehend the passages they are to be tested on. There would be nothing to be gained from preparatory drills that don’t contribute to students’ knowledge base (and hence their comprehension ability).
The importance of this to the teaching profession cannot be overstated. With the sole exception of reading/ELA, every teacher—from third-grade math to AP U.S. History—knows the subject matter students must learn in order to prepare for a test. The “black box” nature of reading tests is actively undermining reading achievement, particularly among disadvantaged kids.
Ideally, sequence-based assessments would be cumulative. Instead of tests with reading passages that sample some topics only from the domains for that grade, they would sample from all of the domains in the current and prior grades. This mirrors the cumulative nature of building knowledge and places appropriate responsibility on K–2 teachers. Most importantly, it rewards consistent investment in knowledge and vocabulary—precisely what is missing from current practice (and dis-incentivized in current policy).
Cumulative, sequence-based reading comprehension assessments would incentivize all teachers to teach everything in the sequence. They would lead to broad knowledge, reduce the extent to which scores are a reflection of what has been learned at home, eliminate the temptation to spend time on test-prep drills, and provide a more accurate picture of the schools’ contribution to children’s performance.
***
These are but three ideas among many. But the overarching principle is what wise policy makers must keep in mind: Reading comprehension is not a skill that schools teach; it’s a condition they create. Accountability plans must ensure that every student gets the broad knowledge and vocabulary that remain the unacknowledged drivers of language proficiency. Higher standards simply cannot be met without them.
One of my favorite pieces of writing is four sentences long. It’s the statement General Dwight Eisenhower drafted in the event D-Day ended in defeat:
Our landings in the Cherbourg-Havre area have failed to gain a satisfactory foothold and I have withdrawn the troops. My decision to attack at this time and place was based upon the best information available. The troops, the air, and the Navy did all that bravery and devotion to duty could do. If any blame or fault attaches to the attempt it is mine alone.
This noble declaration came to mind as I studied for the exams to become an elementary teacher in Massachusetts. I wondered how I should explain if I did not pass. And I still do, because I won’t learn the results until March 18.
You may be wondering how delusional one must be to compare failing a test to failing to liberate Western Europe. I think what you’re really asking, though, is an estimation question, and the answer would be best expressed using scientific notation. Hitting the books has hugely improved my math skills, you see.
Having been credentialed as an elementary teacher already—years ago, in a neighboring state I will identify only by indicating that it has the most loathsome baseball team imaginable—I figured that doing so here would be an annoyance, but not especially challenging. Truth be told, I felt kind of like one of those drivers obliged, through some bureaucratic misfortune, to retake the road test.
My stars, was I wrong. And you know what else I was wrong about? About half the answers on my first practice math test. I was expecting “Pick out the rhombus!” and “Do the times tables up to thirteen!” Nope. See for yourself. Questions 20 and 36, for example, gave me a bit of a workout.
Of course, I shouldn’t presume that my ignorance is also yours; maybe you solved them straightaway. But that was just the math. How about this: Why would some cells have more mitochondria than others? As someone who last studied biology when Michael Dukakis was running for president, I basically had to relearn the subject. And a whole lot of other stuff as well.
With the help of knowledgeable friends, hours of study, and a kind and patient man named Sal Khan, I got better at the sample tests. But it was humbling. If you factor in my two graduate degrees, my fifteen-year teaching career, and the fact that I’ve written disparagingly about teacher preparation standards in these very pages, my humbling was squared (or cubed, or something).
And now follows perhaps the strangest sentence I have ever penned: I am grateful to the Massachusetts Tests for Educator Licensure, for I see the world anew.
Before, I just noticed clouds; now, I can tell a cirrus from a nimbostratus. Last week, I used the term “Brownian motion” to describe my daughter’s itinerant slumber in her toy-strewn crib. I didn't use to talk like that. I mean, until fairly recently I wouldn’t have been able to explain, apart from the spelling, any important distinction between the eukaryote and the ukulele. These days I regard cells with something approaching holy awe.
When I’d mention to people that I was studying for the elementary certification tests, and that the math one in particular was pretty tough, I’d get a certain kind of look. It was invariably sympathetic, but often perceptibly patronizing as well. It was probably like the look my friend Nick would get when he’d say he was studying for his first driver's license. Nick, a fellow teacher, was in his thirties at the time. (He was living in New York City and half-French, so I’ll leave you to estimate how much of the weirdness to discount. Or, who knows, cube.)
Nick and I once traveled to Normandy to see the beaches where bravery and devotion to duty proved triumphant. It’s something I’ll never forget. Nor have I ever forgotten something Nick told me about education in France, where he did his schooling: “In America, you can be considered smart even if you’re bad at math,” he said. “It’s not like that in France.”
I’m hoping that my results permit me to avoid being affirmed stupide by our oldest ally. Either way, studying for these exams has retrieved forgotten knowledge, illuminated horizons of ignorance, and propelled my own education. I know I’m a teacher and all, so I’d probably say something like this about learning anyway, but let me again declare my gratitude.
And if I fail? Then I wish to declare—with my humility cubed, or worse—that any blame or fault attaches to poorly worded questions.
Peter Sipe has spent most of his career teaching middle school English. He’s thinking about returning to elementary school. See his previous Flypaper editorials, “At ed schools, a low degree of difficulty,” “How to challenge voracious young readers,” and “Finding life lessons for students on the obituary page.”
In this week’s podcast, Joel Rose of New Classrooms joins Mike Petrilli to discuss how technology makes “differentiation” doable, non-cognitive skills under ESSA, and the future of Success Academies in New York City. In the Research Minute, Amber Northern examines how teacher reforms have affected teacher effectiveness.
Matthew A. Kraft and Allison F. Gilmour, "Revisiting the Widget Effect: Teacher Evaluation Reforms and the Distribution of Teacher Effectiveness," Brown University (February 2016).
A new set of four studies conducted by Pat Wolf and colleagues evaluates various aspects of the Louisiana Scholarship Program. The program, it’s important to note, prohibits participating schools from using their normal selective admissions process for their voucher kids and also mandates that they administer the state test, among other requirements.
The first study examines how the scholarships affect student achievement. It focuses on the 2012–13 applicant cohort, including those who took state tests in grades 3–6 in school year 2011–12. This provides student baseline scores for kids before entering the program. Students who applied to oversubscribed schools were randomly chosen to receive scholarships. The study found that the voucher program had a negative impact on participating students’ achievement in the first two years of operations, most clearly in math. Specifically, a voucher user who was performing at the fiftieth percentile at baseline fell twenty-four percentile points below their control group peers in math after one year. By year two, however, they were thirteen percentile points below, so at least they were on the upswing. (The results for reading impact can’t be presented with confidence.)
The second study measured the impact of the voucher program on non-cognitive skills like self-control and conscientiousness. It found no differences between kids awarded scholarships and those who were not. For several reasons, including unreliable measures and low response rates, the scholars say that these results aren’t conclusive.
The third study examines how the program impacts racial isolation and finds that, overall, the program improves the integration of Louisiana schools. That’s because many black voucher-receiving students leave schools where they are overrepresented and enter private schools where they are underrepresented.
The final study attempts to measure the competitive pressures facing public schools as a result of the voucher program. In other words, how does the program affect students who remain in public schools? The authors admit that this is hard to measure and end up looking for a “consistent story” relative to things like the proximity of private schools to public schools, their density, and how evenly distributed they are in their respective areas. They find neutral-to-positive impacts that are small in magnitude. Effects are largest (but still modest) for students attending those public schools with a private school competitor in close proximity.
To summarize, the program results in negative effects for voucher kids in math; no harm for kids in public schools; no impact on non-cognitive measures; and improvement in racial integration in the schools. That’s a mixed bag if there ever was one.
SOURCE: Jonathan N. Mills, Anna J. Egalite, and Patrick J. Wolf, "How Has the Louisiana Voucher System Affected Students?," Education Research Alliance for New Orleans (February 2016).
America’s schools are staffed disproportionally by white (and mostly female) teachers. Increasing attention has been paid to the underrepresentation of teachers of color in American classrooms, with research examining its impact on expectations for students, referral rates for gifted programs, and even student achievement. This paper by American University’s Stephen Holt and Seth Gershenson adds valuable evidence to the discussion by measuring the impact of “student-teacher demographic mismatch”—being taught by a teacher of a different race—on student absences and suspensions.
The study uses student-level longitudinal data for over one million North Carolina students from kindergarten through fifth grade between the years 2006 and 2010. The researchers simultaneously controlled for student characteristics (e.g., gender, prior achievement) and classroom variables (e.g., teacher’s experience, class size, enrollment, etc.), noting that certain types of regression analysis are “very likely biased by unobserved factors that jointly determine assignment to an other-race teacher.” For example, parental motivation probably influences both student attendance and classroom assignments. The researchers conducted a variety of statistical sorting tests and concluded that there was no evidence of sorting on the variables they could observe, and likely none occurring on unobservable dimensions either. All of which is to say that students’ assignment to teachers was likely truly random, thus enabling researchers to apply a causal interpretation to the findings.
The results for suspensions were stark: Having a teacher of a different race upped the likelihood of suspension by 20 percent, with the results for non-white male students assigned to white teachers accounting for much of the disparity. Having an “other-race” teacher increased the likelihood of chronic absenteeism (eighteen or more days missed) by 3 percent, with non-white males assigned to white teachers most likely to be chronically absent. (The study did not break out teacher data by gender. Nor did it control for student income levels.) Interestingly, there was no relationship between student-teacher racial mismatch and excused absences, but a small and statistically significant impact on unexcused absences, lending evidence to researchers’ hypothesis that absenteeism is caused (at least in part) by “parental and student discomfort with other-race teachers through symbolic effects of demographic representation.”
Suspensions, which reflect the quality of student-teacher relationships as well as the discretion exercised by teachers when determining consequences, might understandably be influenced by subconscious teacher bias. However, given that teachers generally don’t set school discipline policies, it seems worth examining the impact of racial mismatch between students and school leaders (or the staff most closely involved with discipline matters). Finally, the fact that being taught by an other-race teacher also impacts absenteeism is somewhat surprising, lending credibility to the paper’s assertion that “absences reflect parental assessments of their child’s school, classroom, and teacher.”
The study reminds us that teacher race matters, especially for male, non-white students who are disproportionately suspended (and, to a lesser extent, chronically absent). While it’s clear from this study and others that students might be inadvertently harmed by lack of diversity in the teacher workforce, it’s less clear what schools should do about it—especially our poorest and most low-achieving schools that may struggle to recruit and retain high-quality teachers of any race. Even so, for anyone concerned about the achievement and wellbeing of minority students, it’s a topic that demands our attention—and our action.
SOURCE: Stephen Holt and Seth Gershenson, “The Impact of Teacher Demographic Representation on Student Attendance and Suspensions,” Institute for the Study of Labor (Germany) (December 2015).
A new joint study from the University of Kentucky and Indiana University examines the potential effects of exclusionary school discipline policies, particularly school suspensions, on racial differences in reading and math achievement.
The authors use a hierarchical and longitudinal dataset drawn from the Kentucky School Discipline Study (KSDS) to create a final sample of 16,248 students drawn from grades 6–10 across seventeen different public schools with a demographic roughly representative of the southeastern United States. Data was collected over a three-year period between August 2008 and June 2011. The authors controlled for school-level variables (within-school variation on percent minority, percent free and reduced-price lunch, percent in special education, expenditures per student, school size, and total number of disciplinary offenses committed in a school in a given year), as well as student and non-school factors (race, socioeconomic status, the neighborhood in which the school is located, and a student’s likelihood of suspension).
They found that black students were nearly six times more likely to be suspended than whites, while other ethnic and racial minorities were over two times more likely. Schools with larger concentrations of black students had significantly higher rates of out-of-school suspension, and students who had been suspended in a given year scored 15 percent lower in literacy and 16 percent lower in math. Additionally, students who have a higher risk for low performance are at even greater risk of academic decline after suspension.
The authors’ final analysis estimated the effect of group differences in exclusionary discipline policies, such as detention, suspension, or expulsion, on the racial disparity in achievement. Their findings suggest that 20 percent of the black disadvantage on reading achievement and 17 percent on math achievement could potentially be explained through students’ disproportionate exposure to these kinds of policies in school.
One thing that’s missing from this analysis, however (and the authors admit this), is the effect that removing a student from class might have on the academic achievement of other non-disciplined students over time. For example, unmeasured factors such as poor discipline in the classroom could contribute to lower achievement across the board.
In the end, the researchers found no direct causal link between suspension and lower academic achievement (there could be many factors influencing suspensions and student achievement) but did find an association with lower growth over time. They call for more research into whether it is punishment in itself, or the loss of classroom instruction time, that underlies this relationship.
SOURCE: Edward W. Morris and Brea L. Perry, “The Punishment Gap: School Suspension and Racial Disparities in Achievement,” Social Problems (January 2016).