Frederick M. Hess sat down recently with EdNews.org to discuss the state of education reform efforts.
Q: Rick, you recently published an article in Educational Leadership arguing that the ways in which we rely on data to drive decisions in schools have changed over time. Yet, you note that we have unfortunately only succeeded in moving from the "old stupid" to the "new stupid." What do you mean by this?
A: A decade ago, it was only too easy to find education leaders who dismissed student achievement data and systematic research as having only limited utility when it came to improving schools. Today, we've come full circle. You can't spend a day at an education gathering without hearing excited claims about "data-based decision making" and "research-based practice." Yet these phrases can too readily serve as convenient buzzwords that obscure more than they clarify and that stand in for careful thought. There is too often an unfortunate tendency to simply embrace glib solutions if they're packaged as "data-driven." Today's enthusiastic embrace of data has waltzed us directly from a petulant resistance to performance measures to a reflexive reliance on a few simple metrics--namely, graduation rates, expenditures, and grades three through eight reading and math scores. The result has been a race from one troubling mindset to another--from the "old stupid" to the "new stupid."
Q: Can you give us an example of the "new stupid"?
A: Sure, here's one. I was giving a presentation to a group of aspiring superintendents. They were eager to make data-driven decisions and employ research to serve kids. There wasn't a shred of the old stupid in sight. I started to grow concerned, however, when our conversation turned to value-added assessment and teacher assignments. The group had recently read a research brief highlighting the effect of teachers on achievement and the inequitable distribution of teachers within districts. They were fired up and ready to put this knowledge to use. One declared to me, to widespread agreement, "Day one, we're going to start identifying those high value-added teachers and moving them to the schools that aren't making AYP."
Now, I sympathize with the premise, but the certainty worried me. I started to ask questions: Can we be confident that teachers who are effective in their current classrooms would be equally effective elsewhere? What effect would shifting teachers to different schools have on the likelihood that teachers would remain in the district? Are the measures in question good proxies for teacher quality? My concern was not that they lacked firm answers to these questions--that's natural enough even for veteran superintendents--it was that they seemingly regarded such questions as distractions.
Q: What's a concrete example of where educators and advocates overenthusiastically used data to tout a policy, but where the results didn't pan out? What went wrong?
A: Take the case of class-size reduction. For two decades, advocates of smaller classes have referenced the findings from the Student Teacher Achievement Ratio (STAR) project, a class-size experiment conducted in Tennessee in the late 1980s. Researchers found significant achievement gains for students in small kindergarten classes and additional gains in first grade. The results were famously embraced in California, which in 1996 adopted a program to reduce class sizes that cost nearly $800 million in its first year. But the dollars ultimately yielded disappointing results, with the only major evaluation--by AIR and RAND--finding no effect on achievement.
What happened? Policymakers ignored nuance and context. California encouraged districts to place students in classes of no more than 20--but that class size was substantially larger than those for which STAR found benefits. Moreover, STAR was a pilot program serving a limited population, which minimized the need for new teachers. California's statewide effort created a voracious appetite for new educators, diluting teacher quality and encouraging well-off districts to strip-mine teachers from less affluent communities. The moral is that even policies or practices informed by rigorous research can prove ineffective if the translation is clumsy or ill-considered.
Q: You and I both know that experimental studies rely on rigid, standardized approaches with experimental controls. What is wrong with taking these studies and trying to apply them to the "real world"?
A: The class-size example points to one enormous challenge--generalizing findings across place and time. Details about policies and the contexts in which they are implemented vary across locales. Just as an employee pay system that has been shown to work for Google will not necessarily work for Citigroup, so too a pay system that a study finds effective in one school or district shouldn't be casually presumed to translate everywhere.
Why is this? Employees in two different organizations may have been attracted by different incentives, have varying levels of trust in management, be organized differently, and so forth. Meanwhile, policies may take time to mature, or early success may be due to the skill of early adopters, enthusiasm associated with hot ideas, and foundation support. None of this will necessarily translate. The lesson is that research findings must be interpreted thoughtfully and with an eye toward what may change from the experimental site to other locales.
Q: In medical research, there are specific parameters when it comes to devising and conducting experiments. Do these same parameters apply to educational research?
A: Efforts to adopt the medical model in schooling have been plagued by a flawed understanding of just how the model works in medicine and how it translates to education. The randomized field trial, in which drugs or therapies are administered to individual patients under explicit protocols, is enormously helpful when recommending interventions for particular medical conditions. But, for example, it is far less useful when determining how much to pay nurses or how to hold hospitals accountable.
In education, curricular and pedagogical interventions can certainly be investigated through randomized field trials. Even here, however, there is a tendency for educators to be cavalier about research-based practice. When medical research finds a certain drug regimen to be effective, doctors do not casually tinker with the formula. Yet, in areas like reading instruction, schools routinely alter the sequencing and elements of a curriculum, while still touting their practices as research based. When it comes to policy, officials must make tough decisions about questions like management and compensation that cannot be examined under controlled conditions. Although research can provide valuable insights, studies of particular school choice plans or accountability systems (for the reasons I discussed a moment ago) are unlikely to answer whether such policies "work."
Q: In your mind, what are some of the main limitations of research as they apply to schooling?
A: First, let me be clear: Good research has an enormous contribution to make--but, when it comes to policy, this contribution is more tentative than we might prefer. Scholarship's greatest value lies not in its ability to end policy disputes but in its capacity to encourage more thoughtful and disciplined debate.
In particular, rigorous research can establish parameters as to how big an effect a policy or program might have, even if it fails to conclusively answer whether it "works." For instance, quality research has quieted assertions that national-board-certified teachers are likely to have heroic impacts on student achievement or that Teach For America recruits might adversely affect their students.
Especially when crafting policy, we should not expect research to dictate outcomes but should instead ensure that decisions are informed by the facts and insights that science can provide. Education leaders should not expect research to ultimately resolve thorny policy disputes over school choice or teacher pay any more than medical research has ended contentious debates over health insurance or tort reform.
Q: Let me take a kind of liberal stance for a minute. You have two schools. One has good test scores but a lot of dropouts, juvenile delinquents, and teenage pregnancies. The second school has some academic problems, but the kids are involved in sports and extracurricular activities, they are good citizens (dare I say "churchgoers"), and drug and alcohol abuse is minimal. What do the data tell us about these two schools?
A: Ultimately, it depends on whether we are collecting the right data and how we want to read those data. When judging schools, do we think tests gauging student achievement should hold pride of place, or do we regard those results as one part of a broader body of information? Personally, I think that determination depends on how much confidence we have in those achievement tests and on the nature of the school in question. Many of our state assessments are so lacking that I would be skeptical of results that ran counter to a number of other data points. At the same time, our inability to reliably measure other kinds of outcomes forces us to rely more heavily on simple achievement metrics than we might like.
If the low-achieving school in your example is serving low-performing students or an at-risk population, we should be careful to weigh positive trends and other student outcomes when considering its level of achievement.
Similarly, if the seemingly high-achieving school is attracting advantaged students, then the good test scores should provide little comfort in the face of the other evidence. The real question for both schools would be how much confidence we have that students are learning and growing in the course of their studies--and that requires finding ways to gauge the "value-added" of these classrooms (whether in terms of achievement, cognition, behavior, etc.).
Q: Why do we seem to be giving "short shrift" to management data and what does that imply about how school leaders make decisions on a daily basis?
A: While embracing student achievement data, policymakers and practitioners have paid scant attention to collecting or using data that are more relevant to improving schooling. State tests provide results that are too coarse to offer more than a snapshot of student and school performance, and few district data systems link student achievement metrics to teachers, practices, or programs in a way that helps determine what is working. Ultimately, student achievement measures are largely irrelevant to judging the performance of many district employees. It simply does not make sense to evaluate the performance of a payroll processor or human resources recruiter--or a foreign language instructor--primarily on the basis of reading and math test scores for grades 3 through 8.
Student achievement data alone really only tell us what comes out of the "black box" of schooling--they don't tell us what is happening inside that box. They illustrate how students are faring but do not enable an organization to diagnose problems or manage improvement. It is as if a CEO's management dashboard consisted of only one item--the company's stock price. Helping schools and school systems improve operations and teaching and learning requires tracking an array of indicators, such as how long it takes books and materials to be shipped to classrooms, whether schools provide students with accurate and appropriate schedules in a timely fashion, how quickly assessment data are returned to schools, and how often the data are used. A system in which leaders possess that kind of data is far better equipped to boost school performance than one in which leaders merely have a palette of achievement data.
Q: How should we "steer clear," as you put it, of the "new stupid"?
A: It requires at least three key things. First, educators should be wary of allowing data or research to substitute for good judgment. When presented with persuasive findings or promising new programs, they must ask the simple questions: What are the benefits of adopting this program or reform? What are the costs? How confident are we that the promised results are replicable? What might complicate projections? Data-driven decision making does not simply require good data; it also requires good decisions.
Second, schools must seek out the kind of data they need as well as the achievement data external stakeholders need. Despite leaps in state assessment systems and continuing investment in longitudinal data systems, school and district leaders are a long way from having the management data they require to support high-performing schools and systems. In practice, there is a rarely acknowledged tension between collecting data with an eye toward external accountability (measurement of performance) and doing so for internal management (measurement for performance).
Third, school systems should reward education leaders and administrators for pursuing more efficient ways to deliver services. Today, however, superintendents who use data to eliminate personnel or programs--even if these superintendents are successful and vindicated by the results--are often more likely to ignite political conflict than to reap professional rewards. So long as leaders are revered only for their success at consensus building and gathering stakeholder input, moving from the rhetorical embrace of data to truly data-driven decision making will remain an elusive goal in many communities.
Q: What do you see as the main motivation behind the "new stupid"? Is it simply an example of good intentions gone awry?
A: In a word: yes. It's a strategy pursued with the best of intentions. But the problem is threefold. First, as we've discussed, too frequently those of us in K-12 are unsophisticated about what a particular study or a particular data set can tell us. Second, the very passion that infuses the K-12 sector creates a sense of urgency. People want to fix problems now, using whatever tools are at hand--and don't always stop to realize when they're trying to fix a Swiss watch with a sledgehammer. Third, the reality is that we still don't have the kinds of data and research that we need. So, too often, the choice is to misapply extant data or simply go data-free. Everyone involved means well; the trick is to provide the right training and the right data, and for practitioners, policymakers, and reformers to ensure that compassion doesn't swamp common sense.
Frederick M. Hess is a resident scholar and the director of education policy studies at AEI.