How Does a Tool That Detects Cheating With ChatGPT Grapple With ‘False Positives’?

July 11, 2023

Photo By Ground Picture/Shutterstock

William Quarterman, a student at the University of California at Davis, was accused of cheating. His professor said he’d used ChatGPT to take a history exam, the charge buttressed by GPTZero, one of the many new tools that have emerged to try to detect student use of generative AI systems.

Quarterman swore his innocence, though, and he was later let off the hook after he presented the log of changes he made in a Google Doc.

The case raises a point of contention over the use of algorithmic writing detectors. Tests of the software have found a high percentage of “false positives,” and there are now cases in which accusations that students used AI turned out to be unsubstantiated or were later dropped.

Some chafe at the term “false positive,” arguing that because flags raised by these detectors are meant to be used as the start of a conversation, not as proof, the term can give the wrong impression. Academic integrity watchdogs also point out that a dismissal of cheating charges does not mean no misconduct occurred, only that it wasn’t proven. Google Docs may be an important tool in establishing authorship for students accused of plagiarism in the future, argues Derek Newton, author of the academic integrity newsletter The Cheat Sheet.

Regardless, the issue is on the radar of the detection services themselves.

In December, when EdSurge interviewed a leader at Turnitin, the California-based software developer that uses artificial intelligence to discern plagiarism in student assignments, the company had yet to bring its chatbot plagiarism detector to market. Still, Eric Wang, the company’s vice president of artificial intelligence, argued that detection wasn’t going to be a problem. And the promised accuracy set it apart from previous detectors.

In practice, it’s proven to be a little thorny.

That’s partly because when tools detect that students have used AI to assist their work, instructors are unsure how to interpret that information or what they can do about it, according to Turnitin.

But part of the difficulty also seems to arise when AI assistance is detected in smaller portions of an essay, the company acknowledged at the end of May in its first public update since launching its detection tool. When the technology detects that less than 20 percent of a document was written by AI, Turnitin says, it is more prone to issuing false positives than previously believed. Company officials did not give a precise figure for the increase in false positives. The company says that from now on it will display an asterisk next to results when its tool detects that less than 20 percent of a document is AI writing.

Still, unease about inaccurate accusations gives instructors and administrators pause about AI writing detection. And even Wang of Turnitin told EdSurge in March that the traces the company is picking up on right now may not be as reliable down the road as the tech evolves.

But when EdSurge checked in with Wang recently to see whether false positives had given Turnitin additional concern, he said they had not, and he stressed the reliability of the company’s results.

Walking the tightrope between teaching the use of a large language model like ChatGPT as a valuable tool and preventing cheating is new territory for education, Wang says. He also argues that even as these tools evolve, they will remain testable.

Daniel Mollenkamp (@dtmollenkamp) is a reporter for EdSurge. He can be reached at daniel@edsurge.com.