Why Computer-Scored Essays Could Eliminate The Need For Writing Tests

Education Commissioner Pam Stewart announced earlier this month she had chosen a new statewide test from the American Institutes for Research. The test will replace the math, reading and writing FCAT exams students now take. Click here to listen to StateImpact Florida's John O'Connor speaking with WUSF's Craig Kopp about what we know about the new test so far.

Florida’s plans to add computerized grading of its new statewide writing test could eventually eliminate the need for a writing test, advocates for the technology said.

A classroom chart explaining the differences between claims, claim evidence and commentary. Hillsborough County schools are teaching the Three Cs as the building blocks of student writing.
Credit John O'Connor / StateImpact Florida

Essays on Florida’s new writing test will be scored by a human and a computer, but the computer score will matter only if it differs significantly from the human reviewer’s score. If that happens, bid documents indicate, the essay will be scored by a second human reviewer.
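A minimal sketch of that adjudication logic, in Python, may make the workflow concrete. The one-point discrepancy threshold and the rubric scale here are illustrative assumptions; the bid documents cited in the article do not specify them.

```python
# Illustrative sketch of the two-reader protocol described in the bid
# documents: one human score, one machine score, and a second human read
# only when the two disagree significantly. Threshold and scale are assumed.

DISCREPANCY_THRESHOLD = 1  # assumed: more than one rubric point apart counts as significant


def final_score(human_score: int, machine_score: int,
                second_human_score: int | None = None) -> int:
    """Return the score of record under the human-plus-machine protocol."""
    if abs(human_score - machine_score) <= DISCREPANCY_THRESHOLD:
        # The machine check passes; the original human score stands.
        return human_score
    # Significant disagreement: the essay goes to a second human reader.
    if second_human_score is None:
        raise ValueError("discrepancy requires a second human read")
    return second_human_score


# Example: the human gives a 4, the machine gives a 2, so a second
# human reader resolves the essay.
print(final_score(4, 2, second_human_score=3))  # -> 3
```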

University of Akron researcher Mark Shermis has studied the accuracy of automated essay scoring (computer programs that read essays and assign a score) in three trials. Shermis concluded the programs worked at least as well as human scorers in two of those trials.

An Australian trial of two automated essay scoring programs found machine-scored essays fell short of human grading on closed, content-driven writing prompts. But that trial used just one prompt and a small sample of essays.

A second trial, sponsored by the William and Flora Hewlett Foundation, tested eight commercial automated essay scoring programs and one developed by a university lab. The trial gathered more than 22,000 essays from eight writing prompts spread across six states.

The nine automated essay scoring programs performed on par with human scorers. The humans earned an accuracy score of .74, while the best of the automated essay scoring programs earned an accuracy score of .78. The machines scored particularly well on two data sets which included shorter, source-based essays.
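The article does not name the statistic behind those accuracy scores. A common measure for comparing machine and human essay raters is quadratic weighted kappa, which penalizes large disagreements more heavily than near-misses. The Python sketch below shows how such an agreement statistic is computed; it is offered only as an illustration of what a score in the .7 range measures, not as the metric the trial necessarily used.

```python
import numpy as np


def quadratic_weighted_kappa(rater_a, rater_b, min_score, max_score):
    """Agreement between two sets of integer scores: 1.0 is perfect
    agreement, 0.0 is chance-level agreement, and large disagreements
    are penalized quadratically."""
    rater_a = np.asarray(rater_a) - min_score
    rater_b = np.asarray(rater_b) - min_score
    n = max_score - min_score + 1

    # Observed counts of each (score_a, score_b) pair.
    observed = np.zeros((n, n))
    for a, b in zip(rater_a, rater_b):
        observed[a, b] += 1

    # Expected counts if the two raters were statistically independent.
    hist_a = np.bincount(rater_a, minlength=n)
    hist_b = np.bincount(rater_b, minlength=n)
    expected = np.outer(hist_a, hist_b) / len(rater_a)

    # Quadratic disagreement weights.
    i, j = np.indices((n, n))
    weights = (i - j) ** 2 / (n - 1) ** 2

    return 1.0 - (weights * observed).sum() / (weights * expected).sum()


# Toy example: two raters scoring five essays on a 1-6 rubric.
print(quadratic_weighted_kappa([4, 3, 5, 2, 4], [4, 3, 4, 2, 5], 1, 6))
```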

“A few of them actually did better than human raters,” Shermis said.

A third competition, this time public and online, drew 159 teams from around the world. Again, the top automated essay scoring programs performed as well as or better than human graders.

Shermis concluded automated essay scoring can be used for high-stakes writing exams if it serves as a second reader to verify human scorers. States must also study the results to make sure they show no bias against any particular group of students, he said.

Long-term, Shermis said, automated essay scoring could eliminate the need for a writing test. Every piece of student writing could be fed into the program, so teachers and schools would constantly update their evaluation of a student’s abilities.

Shermis said the programs could also potentially flag threats to self or others in a student’s writing.

“We believe it will take some time for people to get used to the technology,” Shermis said.

Utah associate superintendent Judy Park said her state has gotten used to the technology. Utah has used automated essay scoring since 2010. The state recently signed a contract with the American Institutes for Research — the company designing Florida’s next test — to provide automated essay scoring.

“What we have found is the machines score probably more consistently than human scorers,” Park said. “We really have been pleased with the results.”

A handful of states use automated essay scoring for state tests. Louisiana uses it for end-of-course exams in Algebra I, Geometry, English II, English III, Biology, and U.S. History, said spokesman Barry Landry.

Utah has had no problems with the automated essay scoring, but Park said it is important to maintain human scoring in order to calibrate the computer programs.

One concern about computer grading is that students will learn what the programs value and then write essays tailored to it. Park said Utah has seen no evidence that students can “game” the computer grading any more than they could with human scorers using a well-publicized rubric to evaluate writing tests.

Those worries have faded since Utah adopted automated essay scoring.

“We have not had those concerns in the past years,” she said.

Copyright 2014 WUSF Public Media - WUSF 89.7

John O’Connor is a reporter for StateImpact Florida, a project of WUSF, WLRN and NPR covering education. John writes for the StateImpact Florida blog and produces stories for air on Florida public radio stations.