Evaluating the Evaluations

March 10, 2015

Who likes the current teacher evaluation system in New York?

Not the governor, who called it “baloney” in his State of the State speech.

Not Tim Kremer of the New York State School Boards Association, who called it “overly complex, bureaucratic and too easily manipulated.”

Not Derrell Bradford of the pro-charter group NYCAN, who wrote, “The current framework is being deliberately broken at the local level.”

Not lawmakers, many of whom lauded the evaluations just a few years ago when they were implemented.

While almost everyone agrees the system is broken, efforts to fix it are muddied by differing views of what ails it.

One of Cuomo’s answers is to double down on testing.

“Thirty-eight percent of high school students are college ready, 38 percent,” Cuomo emphasized during his State of the State address. “Ninety-eight-point-seven percent of high school teachers are rated effective. How can that be? How can 38 percent of the students be ready, but 98 percent of the teachers effective?”

While the numbers make for a powerful sound bite, the math isn’t sound.

According to the American Statistical Association, the relationship between student test scores and teacher effectiveness is not causative—meaning a bad grade isn’t necessarily caused by a bad teacher.

In a statement released last April, the ASA explained that value-added models (VAMs) for evaluating teachers are “generally based on standardized test scores and do not directly measure potential teacher contributions toward other student outcomes.” The statement went on to say, “Effects, positive or negative, attributed to a teacher may actually be caused by other factors that are not captured in the model.”

And possibly most damning: “Ranking teachers by their VAM scores can have unintended consequences that reduce quality.”

At least one member of the Board of Regents understands that this kind of testing has its limitations.

On The Capitol Pressroom radio show on Dec. 19, Regent Jim Tallon was asked if there is a direct connection between the number of kids who fail and the number of teachers who fail. He responded, “You’re asking would science make that correlation? I would say the science does not make that correlation.”

Regardless of its lack of value, Cuomo wants to double down on the testing portion of the rubric, demanding that New York’s Annual Professional Performance Review, or APPR, be revised so that 50 percent of a teacher’s evaluation is linked to the state tests.

Sean Corcoran, an associate professor of educational economics at New York University who studies teacher quality and effectiveness, is concerned about this kind of simplification.

“I think there is a somewhat naïve view that we can look toward test outcomes to find who those good teachers are,” Corcoran said. “(It) reflects a fundamental misunderstanding of how valid these measures are of a teacher’s influence on student outcomes.”

Is it possible to create a value-added model that accurately measures teacher effectiveness?

“I think the short answer is no,” responded Corcoran. “We’re pretty limited … in how we can account for differences across classrooms and student ability as well as outside influences on student achievement, including family background and community resources.”

But some evaluations are better than others. The key, says Corcoran, is “not to place too much of an emphasis on any one measure.”

When asked for examples of evaluation systems he considers to be among the best, Corcoran pointed to school districts in Washington, D.C. and Cincinnati.

“Both of those systems reflect a balance between different measures, although Washington, D.C. has put a pretty heavy weight on student outcomes,” Corcoran said. “Both … weigh multiple measures, and aim to provide constructive feedback to teachers.”

It should be noted that the Washington, D.C. teacher evaluation system, called IMPACT, recently changed its rubric.

Prior to the 2014-15 school year, the District of Columbia Public Schools assessment for general education teachers was based on four components, two of which included data from student test scores:

• 35 percent from “Individual Value-Added Student Achievement Data” (IVA), based on student test scores on standardized assessments

• 15 percent from “Teacher-Assessment Student Achievement Data” (TAS)—a measure of students’ learning over the course of a year by assessments other than standardized tests

But, according to the DCPS 2014-15 handbook, “due to the transition from the DC CAS (District of Columbia Comprehensive Assessment System) to the PARCC (Partnership for Assessment of Readiness for College and Careers) test, IVA will not be included as a component in final IMPACT scores for the 2014-2015 school year.”

In other words, the D.C. schools are accommodating teachers and students by delaying the impact of high-stakes evaluations until after a transition period from old to new assessments.

This isn’t unusual. Elsewhere in the country, Tennessee reduced the weight of test scores and may do so again. New Jersey dropped the weight of its tests to 10 percent and may drop it again to zero.

This stands in stark contrast to Cuomo’s decision not to delay the impact of high-stakes evaluations on teachers, and to increase rather than decrease the weight that student test scores have on those evaluations.

Still, the District of Columbia’s evaluation system has plenty of critics.

While a study by the National Bureau of Economic Research found that the district “shed many of its lowest performing teachers, kept its superstars and improved the quality of classroom instruction,” according to Politico, critics argue that student test scores didn’t budge. There were also allegations of widespread cheating.

In Cincinnati, the district’s Teacher Evaluation System (TES) is based on the Danielson method, a framework that divides teacher skills and responsibilities into four domains: planning and preparing for student learning; creating an environment for student learning; teaching for student learning; and professionalism. This system doesn’t use scores, but does include intense supervision.

Corcoran is quick to say that he wouldn’t hold up any single system as a model, but that Cincinnati has a more holistic and long-standing teacher evaluation system than others, with strong research behind it. “I like what I’ve seen there,” he said.

Cincinnati depends heavily on classroom observation, which Corcoran is a proponent of. Sometimes peers from different schools or master teachers do the observation in the system, which “seems to be working pretty well.”

Classroom observers pick up things that aren’t detected by test scores, explains Corcoran. “Also they are able to feedback the teachers immediately and give them advice on their practice,” he said, as opposed to test score-based measures that take a year to process. “And then you’re getting a percentile score that doesn’t tell you much about classroom performance, so it’s too little, too late.”

Cuomo’s new evaluation plan also includes independent classroom observations. This portion of the rubric has a few critics, but it’s nowhere near as controversial as relying on student test scores. Also, these concerns are limited to things like costs and control, not the underlying methodology.

For example, Michael Borges of the New York State Association of School Business Officials says the costs might be prohibitive. “We are concerned with the governor’s proposal for outside evaluators and who will pick up the cost of these evaluations, and whether this would be another unfunded mandate,” he said.

Another complaint, voiced by Rick Longhurst of the New York State Parent Teacher Association, is that using outside observers is “demeaning to principals and a threat to local control.”

But NYCAN’s Derrell Bradford is a proponent of the governor’s updated evaluation system.

“No evaluation system is perfect,” he wrote in an email. “The question is, is it better? I think people supporting the governor’s teacher evaluation reforms believe in the transformative power of teachers and great teaching deeply. We think great teaching beats poverty. And you have to measure things that are important to you and to society.”

But what disturbs many educators is the sense that they are being set up to fail.

Recently, Cuomo’s State Operations Director Jim Malatras sent a letter to the Board of Regents urging it to investigate teacher evaluations on Long Island.

At least one lawmaker wasn’t amused. “Everything that was ultimately advanced was approved by the State Education Department,” said state Sen. John Flanagan, the chair of the Senate Education Committee.

Karen Magee, president of New York State United Teachers, was more direct: “It is insane to once again scapegoat teachers for a process that the state controlled, reviewed and directed.”

Going back further, after the Common Core standards were rolled out, Dr. John King, then serving as the state education commissioner, publicly noted that more failures should be expected.

In August 2013, the Daily News reported that King “warned principals that the results (of the new tests) could be disastrous, and suggested they use the scores ‘judiciously’ when making firing decisions.”

“So to use the intentionality of the policy—that we are expecting more kids to fail—and then to turn around and say, ‘Obviously the teachers are failing because the system performed as predicted,’ shows a dramatic misunderstanding of the education system,” Schenectady City Schools Superintendent Larry Spring said on the Capitol Pressroom.

Dr. Rick Timbs, executive director of the Statewide School Finance Consortium, recalled that it took decades to get the Regents tests “even close to reliable and valid,” and questioned the administration’s decision to shift to a more simplistic rubric—half of which has been deemed unreliable.

“Things that are innately complicated are innately complicated,” he said. “Teaching and learning are highly complicated.”

Summing up the thoughts of many observers, Timbs concluded that the teacher evaluation system “is not ready for primetime.”

Susan Arbetter (@sarbetter on Twitter) is the Emmy award-winning news director for WCNY Syracuse PBS/NPR, and producer/host of The Capitol Pressroom syndicated radio program.
