People who have difficulty being understood when they speak, whether because of stuttering, a neurological disorder such as cerebral palsy or even a heavy accent, will soon be able to use a new computer tool that makes their speech clearer while keeping it in their own voice.
The software was developed by computer science professor Frank Rudzicz through his start-up company Thotra. It can run on a personal device such as a smartphone or computer, or it can operate in a cloud-based environment to facilitate telephone conversations. A user simply speaks into a microphone, and the software transforms their words nearly instantaneously, adding dropped sounds, enunciating vowels, and removing stutters and pauses, while preserving everything else the person says.
“Some people have specialized keyboards or other devices to communicate, but they usually have a robotic voice that lacks inflection,” says Rudzicz, who is also appointed to the Toronto Rehabilitation Institute. “We preserve all their idiosyncrasies and tone. If someone is sarcastic or exuberant, that emotional content is carried through.”
For people with speaking challenges, Rudzicz’s software is a major improvement over typical voice-recognition software. For example, modern speech recognition, such as that used in the iPhone’s Siri, can recognize only about 10 per cent of words spoken by someone with moderate-to-severe cerebral palsy. Rudzicz has previously built custom software that doubles this success rate, but he took an entirely new approach when developing Thotra, which provides nearly complete comprehension.
“We do a simpler version of speech recognition where we’re only interested in finding out, say, if this sound is a vowel or not. It’s easier to get accurate results. The transformation that removes stuttering, for example, is 99 per cent accurate.”
Rudzicz, who is now seeking funding and partners to conduct larger clinical trials, has so far tested the software with a small number of people. Annalu Waller, a professor at the University of Dundee in Scotland who has cerebral palsy, says she’s impressed with demonstrations of Rudzicz’s software. “The ability for people to use their own voice with Frank’s system instead of relying on a close friend or relative to interpret for them is amazing.” She also sees the software, which can run on any personal computing device, as a big leap from communication aids that require users to type or to remember complex codes to retrieve stored words. “Using one’s own voice is so much more intuitive and authentic,” she says.
Rudzicz is also looking at cloud-based uses of the software that would facilitate real-time telephone conversations. He believes call centres in countries such as India, Mexico and the Philippines that serve North American clients might be interested in his invention. “Softening accents represents a huge market,” Rudzicz says.
Still, Rudzicz says he is most excited about empowering people with cerebral palsy, ALS and other neurological disorders to free their voices. “My heart is in giving a voice to people who have a natural difficulty being understood,” he says.