To improve education, focus on excellence—not failure
By Michael J. Petrilli
School failure is no longer the United States’ most pressing educational problem—mediocrity is. Both Trump and Clinton could do a lot of good by changing the tone of the education reform debate—and backing it up with a few discrete changes in policy. Specifically, they could shift the conversation from “failure” and focus it instead on “excellence.”
This is particularly the case for Trump, who found himself in hot water recently for saying to African Americans, “You live in your poverty, your schools are no good, you have no jobs, 58 percent of your youth is unemployed.” Understandably, much of the black community took offense to his inaccurate assertions on poverty and employment. But his claim about schools is problematic too.
For sure, we’re used to hearing that, and some of us are used to saying it. Indeed, many schools serving African Americans (and Latinos and low-income students) haven’t been very good. Some are still failing. But the truth is that they have gotten better over the past two decades—a lot better. The typical African American fourth grader is reading and doing math two grade levels ahead of where the previous generation was back in the 1990s. That’s enormous progress.
That’s the good news. The bad news is that “better” still isn’t enough. On average, African Americans, Latinos, and low-income students are still years behind white, Asian, and affluent peers. They are graduating high school in higher numbers than before, but they aren’t making much progress in college completion, mostly because too many aren’t ready for college in the first place. They need excellent schools, not just “not bad” ones.
So what might the next president do to promote excellence? First, as states develop their new accountability systems under the Every Student Succeeds Act, he or she could encourage them to focus as much on recognizing their high-performing schools as on sanctioning their worst. Ohio’s Momentum Awards (for schools showing super-fast student-level growth) and All-A Awards (for schools earning straight As on their report cards) are great examples. Note that Ohio gives those awards to any public school that qualifies—whether it is a traditional district school or a charter school. That’s the right approach to emulate.
The next president could also encourage states to pay more attention to students who are doing work not just at grade level, but above it. A new analysis from Fordham found that most state accountability systems maintain one of No Child Left Behind’s fatal flaws—a primary focus on getting students to the “proficient” level of achievement. This encourages schools to ignore their high flyers. This is particularly problematic for low-income, high-achieving students, who tend to lack access to “gifted and talented” programs and similar initiatives. The next president should make it clear that our advanced students deserve our attention too, and states should send clear signals that they matter by holding schools accountable for their progress.
Whether it’s Trump or Clinton, the next occupant of the Oval Office is unlikely to do much on elementary and secondary education in his or her first term. That’s mostly because Congress just finished its work on K–12 education nine months ago in the form of the Every Student Succeeds Act.
Still, a presidential focus on excellence would enliven the education discussion and serve as a sign of true leadership. Candidates: How ’bout it?
Editor’s note: A slightly different version of this essay originally appeared in the Washington Post.
The Every Student Succeeds Act requires states to use “another indicator of student success or school quality,” in addition to test scores and graduation rates, when determining school grades. This is in line with the commonsensical notion that achievement in reading, writing, and math, while an important measure, surely doesn’t encapsulate the whole of what we want schools to accomplish for our young people. Reformers and traditional education groups alike have enthusiastically sought to encourage schools to focus more on “non-cognitive” attributes like grit or perseverance, or social and emotional learning, or long-term outcomes like college completion.
We at Fordham wondered whether charter schools might have something to teach the states about finding well-rounded indicators of school quality. After all, when charter schools first entered the scene in the pre-No Child Left Behind era, the notion was that their “charters” would identify student outcomes to be achieved that would match the mission and character of each individual school. Test scores might play a role, but they surely wouldn’t be the only measure.
As the head of Fordham’s authorizing shop in Dayton, I set out to determine which indicators the best charter school authorizers in the nation were using—measures that transcended test scores. Surely, I reasoned, a quarter-century of chartering must have turned up promising approaches.
Well, there’s good news and bad news.
The good news is that it’s common for authorizers to use parent or student satisfaction survey data as one of many pieces of information in school accountability plans. (In fact, we include family survey results in the accountability plans we have with our schools as well.) Some authorizers also look at student retention from year to year as a proxy for family satisfaction.
The bad news is that I couldn’t find a single authorizer using measures of non-cognitive skills or social and emotional learning for its entire portfolio of schools. And that should be instructive.
The reason is that authorizers use accountability plans to make high-stakes decisions—such as school corrective action, non-renewal, revocation, and closure—that directly impact the hundreds or thousands of families whose children are enrolled in their charter schools. Consequently, it is imperative that those decisions be defensible and grounded in the most objective outcomes possible. And to date, measures of grit et al. aren’t ready for prime time. There’s simply not enough evidence that they are valid and reliable.
I did find a few authorizers that allow schools to develop school-specific accountability criteria. One authorizer’s schools developed metrics regarding student connectivity, character self-assessments, and the degree to which students made positive contributions to their schools and communities. Student surveys are administered to gather this data.
Another authorizer has some of its schools develop program-specific indicators related to environmental education. These include awareness, knowledge, attitudes, skills, and actions. Tools used to evaluate these indicators vary by school and may include student written work, hands-on experiences with natural systems and processes, completion of student questionnaires, and scores achieved during a Socratic seminar.
At present, non-cognitive measures are certainly helpful and informative—a piece of the overall picture. However, they simply are not far enough along to be major factors in accountability decisions. Perhaps ESSA will help drive efforts to more fully develop these types of indicators. For now, though, we should acknowledge that there’s a reason we use test scores and graduation rates as the primary measures of school quality: They are the best we’ve got.
Editor’s note: This article is part of the series The Right Tool for the Job: Improving Reading and Writing in the Classroom, which provides in-depth reviews of several promising digital tools for English language arts classrooms.
Many years after the adoption of new academic standards in most states, frustrated teachers and administrators across the country still decry the dearth of Common Core-aligned curricular materials. One survey conducted by the Center on Education Policy (CEP) in 2014 found that 90 percent of surveyed districts reported having major or minor problems finding such resources. More recent studies conducted by Morgan Polikoff and Bill Schmidt also conclude that the majority of textbooks marketed as being aligned with Common Core actually have “substantial alignment problems.”
In response to this persistent lack of high-quality, standards-aligned materials, organizations such as EdReports and agencies like the Louisiana Department of Education have begun providing educators with free, independent reviews of curricular resources. Other groups have developed rubrics and evaluation tools intended to help state, district, and school leaders vet the quality and alignment of textbooks, units, and lesson plans (including EQuIP, IMET, and Student Achievement Partners’ “Publishers’ Criteria”). Even Amazon has entered the curricular stage, recently announcing the launch of a new platform for educators that will feature free curricular resources and teacher ratings and reviews.
To date, however, very little information exists on the quality and content of digital learning tools intended to supplement a full curriculum. What does exist isn’t as user-friendly as it could be. One site, EdSurge, aims to provide educators with information on digital curricula and tools for teaching and learning. The site includes hundreds of overviews of various resources (including basic pricing and usage information), which are accompanied by individual educator ratings and feedback on each tool. While the voluminous site includes resources for a wide range of subjects—from language arts to social studies and even engineering—the program and product overviews themselves are fairly brief, and it’s difficult to sort through the sometimes hundreds of educator reflections on each tool. Learning List is a similar site that provides brief reviews of publishers’ instructional materials (both comprehensive and supplemental), including standards alignment.
We thought we might be able to do a little better, at least in terms of providing in-depth reviews of several promising digital tools. To this end, we recruited a team of all-star teachers to evaluate the alignment, quality, and usefulness of nine K–12 English language arts (ELA)/literacy instructional tools: Curriculet, ICivics Drafting Board, Lexia Reading Core5, Newsela, Quill, ReadWorks, Student Achievement Partners’ (SAP) “Text Sets,” ThinkCERCA, and WriteLike. (We focused on ELA resources specifically, as educators stressed to us that those are especially difficult to come by; online math resources are often easier to identify because teachers can search for tools or lessons around a specific math standard.) We also intentionally evaluated a range of reading, writing, and content-building tools, as well as those recommended to us by practitioners and other curriculum experts.
Collectively, these digital tools focus on reading and/or writing instruction across all grades. Most are free or low-cost, and while some resources (such as Newsela) are already in demand, we also aimed to highlight newer and lesser-known resources for the field. We also intentionally included several interactive, student-facing tools; these remain relatively rare in the ELA curricular landscape, which tends to include tools designed mostly for the use of teachers rather than students.
For each resource, we asked reviewers to assess:
For reading, we also evaluated whether the tools included high-quality texts that are grade-appropriate and sequenced to build content knowledge and vocabulary, and whether they included a balance of text types and text-dependent questions and tasks (as called for by Common Core).
For writing, we assessed whether the tools included instruction on specific writing skills and a balance of writing text types, as called for by Common Core.
The reviews were conducted by four top-notch, experienced educators:
Our plan is to release the reviews on a rolling basis over the next 6–8 weeks.
***
Six years into Common Core implementation, procuring CCSS-aligned instructional materials continues to be a time-consuming challenge for many educators. We hope that this series provides ELA teachers with information on nine particularly promising low- or no-cost reading and writing tools that are accessible online.
Stay tuned for the release of our first review next week!
On this week’s podcast, Robert Pondiscio, Brandon Wright, and David Griffith discuss Donald Trump’s school choice proposal and the national debate over school discipline. During the research minute, Amber Northern examines the effect of the charter school authorization process on school quality.
Whitney Bross and Douglas N. Harris, "The Ultimate Choice: How Charter Authorizers Approve and Renew Schools in Post-Katrina New Orleans," Education Research Alliance (September 2016).
A new Mathematica study revisits the effects of pay-for-performance on educators. It evaluates the Teacher Incentive Fund (TIF), which was established by Congress in 2006 and provides grants to support performance-based compensation for teachers and principals in high-need schools.
The TIF program has four components: measuring teacher and principal effectiveness using both student growth and classroom observations; offering bonuses based on effectiveness; enhancing pay for taking on additional roles or responsibilities; and providing professional development to help educators understand the pay-for-performance system.
From 2006 to 2012, the United States Department of Education awarded $1.8 billion to support 131 TIF grants. Mathematica’s study examines implementation by all sixty-two 2010 TIF grantees during the 2013–14 school year (for most grantees, this was their third year of implementation).
It also separately reports impacts for a ten-district subset of 2010 grantees that participated in a random assignment study. Treatment schools were meant to implement all four TIF program components; they also received guidance on how to structure the bonuses, including admonitions that the bonuses should be substantial, differentiated, and challenging to earn. Control schools didn’t receive this guidance and were instead meant to implement every component except for the performance bonuses (their educators did receive an automatic bonus of approximately 1 percent of annual salary as a benefit of participating).
There were five key findings:
There’s been much discussion of how pay-for-performance should be structured, but if it is watered down and treated like a small bump for everyone instead of a big jump for the most effective teachers, it is likely to end up an ineffective intervention. A more effective strategy would be to differentiate base pay, something few districts are willing to do.
SOURCE: Alison Wellington, Hanley Chiang, Kristin Hallgren, Cecilia Speroni, Mariesa Herrmann, and Paul Burkander, "Evaluation of the Teacher Incentive Fund: Implementation and Impacts of Pay-for-Performance After Three Years," Mathematica (August 2016).
This new study, the product of a partnership between District of Columbia Public Schools (DCPS) and researchers at New York University and the University of Maryland (including Dr. June Ahn, author of our recent report Enrollment and Achievement in Ohio's Virtual Charter Schools), examines how students’ use of educational software affects their achievement.
In 2012, DCPS began to implement a web-based mathematics program called “First in Math” (FIM) for students in grades K–8. The initiative consisted of games centered on basic computational skills and concepts like fractions or decimals. The authors examine student-level usage data, including how much time students spent on the FIM system, which modules they completed, and what achievements they earned at various points in the school year (such as points, “badges,” or unlocked bonus games). That information was combined with student-level data, such as gender, English language learner status, special education status, race, grade level, and achievement on the mathematics component of the DC Comprehensive Assessment System (DC-CAS). The final sample included approximately 9,200 students in grades 4–8 during the 2012–13 school year.
The analysis reveals some intriguing findings. Time spent using FIM had a small but significant positive relationship with performance on standardized mathematics assessments, even controlling for factors such as prior academic achievement, English language learner status, and special education designation.
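To make the reported relationship concrete, here is a minimal sketch, in Python, of the kind of student-level regression the description implies: math scores regressed on FIM usage time with the listed controls. The file name and column names are hypothetical illustrations, not the authors’ actual data or model.

```python
# Minimal sketch of a usage-vs-achievement regression with controls.
# All file and column names are hypothetical; this is not the authors' code.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("fim_students.csv")  # one row per student (hypothetical file)

# DC-CAS math score regressed on minutes of FIM use, controlling for
# prior achievement, ELL status, special education status, race, and grade.
model = smf.ols(
    "dc_cas_math ~ fim_minutes + prior_math"
    " + C(ell) + C(sped) + C(race) + C(grade)",
    data=students,
).fit()

print(model.summary())  # a positive fim_minutes coefficient would mirror the reported association
```

A correlational model of this kind cannot establish causation, which is exactly the caveat the authors raise below.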
Students in lower-achieving schools used FIM approximately twenty-five minutes more per year than students in high-achieving schools, suggesting that they were better positioned to benefit from the program. Yet the lower a student’s own achievement, the less time he or she spent using FIM.
Usage was related to other student characteristics, too. Students of color spent more time using FIM than white students—approximately three hours more per year for black and Hispanic students, and four hours more for Asian students. And compared to their peers, female and special education students used the program about thirty-five minutes less, and English language learners about thirty-five minutes more.
The authors do, however, admit limitations in both their methodology and the program itself. FIM consists largely of rote practice tasks and does not teach deeper conceptual understanding. The analysis may also mask the effects of teachers or peers on achievement, or the impacts of other programs students may have been participating in alongside FIM (the authors were limited in what they could control for at the school level). They concede that the relationship between program usage and student achievement is merely correlational, not causal. And while they show a linear relationship between time-on-task and the number of stickers and badges earned (an indicator of progress), detailed information about how students actually spend their time-on-task is not available.
Nevertheless, the results suggest that even 10–20 hours of use over the course of the school year (15–30 minutes per week for forty weeks) may have a modest but significant impact on math achievement. The study is also a testament to the benefits of researcher-practitioner partnerships that share data and evaluate programs. Education software developers and teachers must continue to think of innovative ways to make their offerings appealing to the reluctant students who could most benefit from these supplemental tools.
SOURCE: June Ahn, Austin Beck, John Rice, and Michelle Foster, “Exploring Issues of Implementation, Equity, and Student Achievement With Educational Software in the DC Public Schools,” AERA Open (October – December 2016).
A report recently released by the Economic Studies program at the Brookings Institution delves into the complex process behind designing and scoring cognitive assessments. Author Brian Jacob illuminates the difficult choices developers face when creating tests—and how those choices impact test results.
Understanding exam scores should be a simple enough task. A student is given a test, he answers a percentage of questions correctly, and he receives a score based on that percentage. Yet for modern cognitive assessments (think SAT, SBAC, and PARCC), the design and scoring processes are much more complicated.
Instead of simple fractions, these tests use complex statistical models to measure and score student achievement. These models—and other elements, such as test length—alter the distribution (or the spread) of reported test scores. Therefore, when creating a test, designers are responsible for making decisions regarding test length and scoring models that impact exam results and consequently affect future education policy.
Test designers can choose from a variety of statistical models to create a scoring system for a cognitive assessment. Each model distributes test scores in a different way, but the purpose behind each is the same: to reduce the margin of error and provide a more accurate representation of student performance that can be analyzed at the individual and aggregate levels.
With certain models, one element that designers must consider is how many parameters (or conditions) to include when measuring student performance. Parameters account for the varying difficulty of specific test items, how well items differentiate between students of different ability, and the possibility that a student answered an item correctly by guessing. A test that accounts for only one parameter will score student performance differently than a test that accounts for all three.
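Those three parameters are the standard ingredients of item response theory. The report does not reproduce the math, but as a reference point, the widely used three-parameter logistic (3PL) model gives the probability that a student of ability $\theta$ answers item $i$ correctly as

$$P_i(\theta) = c_i + (1 - c_i)\,\frac{1}{1 + e^{-a_i(\theta - b_i)}}$$

where $b_i$ is the item’s difficulty, $a_i$ its discrimination (how sharply it separates students of different ability), and $c_i$ the guessing parameter. A one-parameter model, by contrast, estimates only $b_i$, fixing discrimination to a constant and guessing to zero, which is why the same set of answers can produce different score distributions under different models.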
Test length, one of the simpler elements a test designer considers when creating an assessment, also significantly affects score distributions. Teachers and students tend to prefer shorter tests, but longer tests better measure student ability. Like statistical models, longer tests decrease the margin of error by shifting the scores of the lowest- and highest-performing students closer to the mean. This, in turn, produces more accurate test results.
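A rough way to see why length helps, borrowed from classical test theory rather than from the report itself: with $n$ roughly comparable items, the measurement error in a student’s score shrinks roughly as

$$\mathrm{SEM} \propto \frac{1}{\sqrt{n}}$$

so doubling the number of items cuts the error by about 30 percent, at the cost of additional testing time.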
Once designers have determined a test’s length and how to measure student performance, they decide how to report the test’s results. Modern cognitive tests assign scores using numerical scales that provide an ordinal ranking of student performance within a range of possible scores.
The report provides a clear and focused explanation of how modern assessments are designed and scored. Its summary of this complex process is an excellent resource for those who make decisions based on test results. Schools, administrators, and policy makers should all make it a priority to educate themselves on the intricacies of cognitive testing—and this report provides an easy-to-navigate introduction to that process. Better understanding will lead to better decisions that will benefit schools, teachers, and students.
SOURCE: Brian A. Jacob, “Student test scores: How the sausage is made and why you should care,” Brookings Institution (August 2016).