UTEP researchers aim to make AI voices sound warmer and more natural
EL PASO, Texas (KVIA) -- A team of researchers at the University of Texas at El Paso has uncovered findings that could change how people interact with artificial intelligence, particularly through voice-based systems.
The research suggests that AI speech does not currently reflect how people naturally speak. In everyday conversation, humans often articulate words less precisely when expressing positive emotions, a nuance that most synthetic voices fail to capture.
The project began three years ago under the direction of Nigel Ward, a professor of computer science at UTEP. Ward said the goal is to create AI voices that better convey emotion and intent.
“Our hope is that future AI voices will be a lot warmer,” Ward said. “You’ll be able to feel how they actually feel about things.”
Javier Vazquez, a UTEP graduate and research assistant, said the team uses a predictive model that determines how much speech reduction, or imprecision, should be introduced into synthesized audio to mimic natural dialogue without sacrificing clarity.
Those who participated in the study were placed in isolated rooms and recorded engaging in open dialogue, allowing researchers to separate and analyze individual voices. Hours of audio data were then hand-labeled by multiple reviewers to identify speech patterns and emotional cues.
“My part in that was to develop the model that will predict how much reduction we should include,” Vazquez said. “We don’t want it to sound really sloppy. If it’s unintelligible, that’s worse.”
Ward said the research has both advantages and risks. The project is funded in part by the U.S. Air Force, which is exploring applications such as improved communication between pilots and AI-controlled drones.
“You want to be able to very naturally understand the situation,” Ward said. “The plus is smoother communication and more trust.”
However, Ward cautioned that making AI voices more natural could also deepen dependence among people already addicted to artificial intelligence.
Despite the concerns, Vazquez said being part of the research has been rewarding.
“Once you hear the differences in what’s synthesized, there’s this moment of clarity,” he said. “As a researcher and a scientist, there’s pride in seeing that what you modeled and configured actually works and turns into something you can hear.”
