Do Chatbot Tutors Work Better When They're Upbeat — and Female?

Since the sudden arrival of ChatGPT just a few months ago, there’s renewed interest in using AI chatbots as tutors. The tech itself raises a host of challenging questions. Some researchers are exploring one that might sound trivial but actually could be quite thorny: What should these computer-generated educational assistants look and sound like?

It turns out, one of the world’s most-cited educational researchers, Richard Mayer, is working on a series of studies looking at what kind of computer-generated voices and images are most engaging to learners and lead to the best outcomes.

“We’ve been dreaming in the field of education that people could have their own individual tutors to help them learn,” says Mayer, who is a professor of psychology at the University of California at Santa Barbara, noting that ChatGPT has renewed this push. “So as we get to that point, we should understand, ‘What should the characteristics be of those tutors?’ ‘How do we create online tutors who are approachable and we want to learn from?’”

One of those studies by Mayer appeared in a research journal just last month, titled “Role of emotional tone and gender of computer-generated voices in multimedia lessons.” The paper describes an experiment in which college students, some male and some female, each watched a short online slide presentation narrated by a computer-generated voice. All participants saw the same slides, but with one of four different voices: “happy female, sad female, happy male, and sad male.”

Mayer believes that as computer-generated voices become more lifelike, the impact of voice tone, "gender" and other features will become more significant. One hypothesis is that students will respond better to an upbeat personality than a downbeat one, which falls under what Mayer calls the "positivity principle." That is what happened for the male participants in the study: they did better on a post-video quiz about the material when a happy voice delivered it than when a sad one did.

Not only does Mayer think that upbeat virtual tutors will work better than those with other emotional tones, but he thinks that some students might learn better from an optimized agent than from a human tutor.

He points to research that shows that some students learn better from male instructors, while others learn better from female ones. And he suggests that in the future students may be able to choose the “gender” and “race” of the interactive agent delivering a lecture or serving as their AI tutor, much as people today can select the gender and accent of the Siri assistant on their iPhone.

For the next step in his research, Mayer has hired students from the theater department to help design interactive agents to further test his theory. “Once we can find the characteristics of the most socially appealing instructor, then we can use that for any lecture or presentation or instructional interaction,” he adds, noting that he is watching the development of ChatGPT and other agents closely as companies try to add voices and images to them for applications like virtual tutors. He says initial results show that students responded more positively to computer-generated instructors that read as female.

Concerns About Simulating Race and Gender

To some experts in computing and in teaching, the line of research raises eyebrows.

“My worry with this sort of hyper-customizations of tutors is it might result in a bad approximation and end up enforcing stereotypes that just aren’t true,” says Parth Sarin, a graduate student in computer science at Stanford University.

For example, Sarin grew up with parents who spoke a mix of Hindi and English, which AI models largely trained on standard American English may have trouble emulating.

“The people who use the AI models shouldn’t be trying to approximate identities that are very different from their own,” Sarin says. Sarin compared a white professor having a computer agent deliver their lecture video in a “Black voice” to a performer in blackface.

Regarding gender, there is a long history of robots being programmed with female-sounding voices. It’s a trend that some observers critique as reinforcing gender biases, especially considering the relative dearth of women involved in creating these kinds of tech tools. Yet when it comes to education, a preponderance of tutoring tools that “sound female” would reflect the reality that three-quarters of public school teachers in the U.S. are women.

One possible solution? Devising a “genderless” virtual voice. That’s the thinking behind Q, a voice assistant built using modulated recordings of people who identify as nonbinary.

Is Authenticity Essential?

To Derek Bruff, a visiting associate director at the Center for Excellence in Teaching and Learning at the University of Mississippi, the push to create an ideal personality for a digital tutor recalls an earlier moment in online learning. About 10 years ago, when big-name colleges were rushing to put out free online courses known as MOOCs, some proponents considered having Hollywood celebrities deliver them. "People were imagining that we would have a professor script a video but have Matt Damon or Morgan Freeman narrate the lecture," says Bruff.

That trend never materialized, he adds, largely because for many students, the relationship with the professor delivering the material is key, regardless of the instructor’s speaking tone, gender or race.

“For some students, not having a personal relationship with their professor is not a problem — that tends to be older students and people already in the workforce,” Bruff adds. “But most undergraduate students, particularly beginning undergraduate students, do benefit greatly by having a relationship with their professor.”

The arrival of ChatGPT and the idea of virtual tutors do raise the possibility that the technology could effectively supplement a human professor, Bruff says. But he hopes such tools are used like textbooks or other instructional materials, not as replacements for human instructors.

“If I had the choice between figuring out what face and voice and tone to give instructional agents, and giving 30 students an actual teacher, I would give students an actual teacher,” he says.

The bigger question, according to Sarin, is whether an AI agent can ever form an effective teaching connection with a student.

“It’s sort of impossible to make a chatbot representation of a voice be authentic, because it’s a computer,” says Sarin. “Students can clue into the authenticity of teachers.”