Can ‘Linguistic Fingerprinting’ Guard Against AI Cheating?

Since the sudden rise of ChatGPT and other AI chatbots, many teachers and professors have started using AI detectors to check their students’ work. The idea is that the detectors will catch if a student has had a robot do their work for them.

The approach is controversial, though, since these AI detectors have been shown to return false positives — asserting in some cases that text is AI-generated even when the student did all the work themselves without any chatbot assistance. The false positives seem to happen more frequently with students who don’t speak English as their first language.

So some instructors are trying a different approach to guard against AI cheating — one that borrows a page out of criminal investigations.

It’s called “linguistic fingerprinting,” a method that uses linguistic analysis of a person’s previous writings to determine whether they wrote a given text. The technique, which is sometimes called “authorship identification,” helped catch Ted Kaczynski, the terrorist known as the Unabomber for his deadly series of mail bombs. Investigators compared Kaczynski’s 35,000-word anti-technology manifesto with his previous writings, and the match helped identify him.

Mike Kentz is an early adopter of the idea of bringing this fingerprinting technique to the classroom, and he argues that the approach “flips the script” on the usual way to check for plagiarism or AI. He’s an English teacher at Benedictine Military School in Savannah, Georgia, and he also writes a newsletter about the issues AI raises in education.

Kentz shares his experience with the approach — and talks about the pros and cons — in this week’s EdSurge Podcast.

Hear the full story on this week’s episode. Listen on Apple Podcasts, Overcast, Spotify, or wherever you listen to podcasts. Or read a partial transcript below, lightly edited for clarity.

EdSurge: What is linguistic fingerprinting?

Mike Kentz: It's a lot like a regular fingerprint, except it has to do with the way that we write. And it's the idea that we each have a unique way of communicating that can be patterned, it can be tracked, it can be identified. If you have a known document written by somebody, you can kind of pattern their written fingerprint.

How is it being used in education?

If you have a document known to be written by a student, you can run a newer essay they turn in against the original fingerprint, and see whether or not the linguistic style matches the syntax, the word choice, and the lexical density. …
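To make the features Kentz mentions more concrete, here is a minimal Python sketch of how a text's style might be reduced to a few numbers. It is an illustration only, not the method of any tool discussed in the interview. The function name `style_features`, the short `FUNCTION_WORDS` list, and the choice of three measures (average sentence length as a rough stand-in for syntax, type-token ratio for word choice, and lexical density as the share of words that are not function words) are all assumptions made for this example.

```python
import re

# Hypothetical, deliberately short list of English function words.
# A real stylometric tool would likely use a much larger list.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "if", "of", "to", "in", "on",
    "at", "by", "for", "with", "from", "as", "is", "are", "was", "were",
    "be", "been", "it", "this", "that", "these", "those", "i", "you",
    "he", "she", "we", "they", "my", "your", "his", "her", "our", "their",
    "not", "do", "does", "did", "have", "has", "had",
}


def style_features(text: str) -> dict[str, float]:
    """Return a few simple style measurements for a piece of writing."""
    # Naive sentence split on terminal punctuation.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", text.lower())
    if not words or not sentences:
        raise ValueError("text must contain at least one word")

    content_words = [w for w in words if w not in FUNCTION_WORDS]
    return {
        # Rough proxy for syntax: words per sentence.
        "avg_sentence_length": len(words) / len(sentences),
        # Rough proxy for word choice: distinct words / total words.
        "type_token_ratio": len(set(words)) / len(words),
        # Lexical density: share of words that carry content.
        "lexical_density": len(content_words) / len(words),
    }


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog ran."
    for name, value in style_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Running the same function on a known sample and on a newer essay gives two sets of numbers that can be compared side by side. Short samples make all three measures noisy, so they are better read as hints than as evidence.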

And there are tools that produce a report. And it's not saying, ‘Yes, this kid wrote this,’ or ‘No, the student did not write it.’ It's on a spectrum, and there's tons of vectors inside the system that are on a sort of pendulum. It's going to give you a percentage likelihood that the author of the first paper also wrote the second paper.
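The idea of a score "on a spectrum" can be sketched in a few lines of Python. This is a simplified assumption about how such a comparison could work, not a description of the tools Kentz uses, whose internals the interview does not cover. The names `MARKERS`, `profile`, and `style_similarity` are hypothetical. The sketch counts how often a handful of common function words appear in each text and compares the two frequency profiles with cosine similarity. The result is a similarity score between 0 and 1, not a calibrated probability that the same person wrote both texts.

```python
import math
import re
from collections import Counter

# Hypothetical marker words; authorship studies often rely on
# common function words because writers use them unconsciously.
MARKERS = ("the", "and", "of", "to", "a", "in", "that", "it", "but", "so")


def profile(text: str) -> list[float]:
    """Relative frequency of each marker word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[m] / total for m in MARKERS]


def style_similarity(known: str, questioned: str) -> float:
    """Cosine similarity of two marker-word profiles (0.0 to 1.0)."""
    a, b = profile(known), profile(questioned)
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    # If either text uses none of the marker words, report no similarity.
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    known_essay = "It was the best of times, and it was the worst of times."
    new_essay = "So the plan was to go, but in the end it rained."
    print(f"similarity: {style_similarity(known_essay, new_essay):.2f}")
```

A score like this only says how alike two texts look on the chosen markers. Turning it into a "percentage likelihood" would require calibration against many writers, which this sketch does not attempt.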

I understand that there was recently a time at your school when this approach came in handy. Can you share that?

The freshman science teacher came to me and said, ‘Hey, we got a student who produced a piece of writing that really doesn't sound like him. Do you have any other pieces of writing, so that I can compare and make sure that I'm not accusing him of something when he doesn't deserve it?’ And I said, ‘Yeah, sure.’

And we ran it through a [linguistic fingerprint tool] and it produced a report. The report confirmed what we thought: that it was unlikely to have been written by that student.

The biology teacher went to the mother — and she didn’t even have to use the report — and said that it doesn’t seem like the student wrote it. And it turned out his mom wrote it for him, more or less. And so in this case it wasn’t AI, but the truth was just that he didn't write it.

Some critics of the idea have noted that a student’s writing should change as they learn, and therefore the fingerprint based on an earlier writing sample might no longer be accurate. Shouldn’t students’ writing change?

If you've ever taught middle school writing, which I have, or if you taught early high school writing, their writing does not change that much in eight months. Yes, it improves, hopefully. Yes, it gets better. But we are talking about a very sophisticated algorithm and so even though there are some great writing teachers out there, it's not going to change that much in eight months. And you can always run a new assignment to get a fresh “known document” of their writing later in the term.

Some people might worry that since this technique came from law enforcement, it has a kind of criminal justice vibe.

If I have a situation next year where I think a kid may have used AI, I am not going to immediately go do the fingerprinting process. That's not gonna be the first thing I do. I'll have a conversation with them first. Hopefully, there's enough trust there, and we can kind of figure it out. But this, I think, is just a nice sort of backup, just in case.

We do have a system of rewards and consequences in a school, and you have to have a system for enforcing rules and disciplining kids if they step out of line. For example, [many schools] have cameras in the hallways. I mean, we do that to make sure that we have documented evidence in case something goes down. We have all kinds of disciplinary measures that are backed up by mechanisms to make sure that that actually gets held up.

How optimistic are you that this and other approaches that you're experimenting with can work?

I think we're in for a very bumpy next five years or so, maybe even longer. I think the Department of Education or local governments need to establish AI literacy as a core competency in schools.

And we need to change our assessment strategies and change what we care about kids producing, and acknowledge that written work really isn't going to be it anymore. You know, my new thing also is verbal communication. So when a kid finishes an essay, I'm doing it a lot more now where I'm saying, ‘All right, everybody's going to go up without their paper and just talk about their argument for three to five minutes, or whatever it may be, and your job is to verbally communicate what you were trying to argue and how you went about proving it.’ Because that's something AI can't do. So my optimism lies in rethinking assessment strategies.

My bigger fear is that there is going to be a breakdown of trust in the classroom.

I think schools are gonna have a big problem next year, where there's lots of conflicts between students and teachers where a student says, ‘Yeah, I used [AI], but it's still my work,’ and the teacher goes, ‘Any use is too much.’

Or what's too much and what's too little?

Because any teacher can tell you that it's a delicate balance. Classroom management is a delicate balance. You're always managing kids' emotions, and where they're at that day, and your own emotions, too. And you're trying to develop trust, and maintain trust and foster trust. We have to make sure this very delicate, beautiful, important thing doesn't fall to the ground and smash into a million pieces.

Listen to the full conversation on the EdSurge Podcast.