On one hand, “smart speakers,” like the ones that host Alexa and Siri, are incredible pieces of technology. Even if the environment isn’t perfectly silent, they can understand our unique voices and accents and deliver a wide range of services, from reporting the day’s top headlines to adding pasta to the grocery to-do list. Especially because of how easily they facilitate multi-tasking, smart speakers have been adopted faster than any new technology in human history — even faster than smartphones.
On the other hand, this technology is disappointing. While voice is the easiest, most natural, and most convenient way for us to communicate with our devices, it’s not how we want them to respond to us. Absorbing words through listening takes twice as long as reading, which explains why we love asking Alexa questions but rarely enjoy waiting for the answer. For Tobias Dengel, president of WillowTree, a global leader in digital product design and development, this dynamic underscores a fundamental truth of the voice industry: it’s just getting started, and it has plenty of room to grow.
In his new book, The Sound of the Future: The Coming Age of Voice Technology, Dengel explores the many ways voice technology promises to change how we eat, compute, travel, do business, and get health care.
- “A tech company that happens to sell pizza.”
Say you want two pizzas with two toppings on each. If you use the Domino’s app, that order should take between 45 seconds and a minute. That’s quick, but placing it using AI-powered voice technology would take about 10 seconds, more than four times as fast. While that difference may seem relatively insignificant to you, the pizza orderer, fast food companies feel differently. Domino’s is so invested in AI research that the former CEO called the enterprise “a tech company that happens to sell pizza.”
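To make the arithmetic concrete, here is a minimal sketch in Python that works out the speedup from the figures quoted above (45 to 60 seconds in the app, about 10 seconds by voice). The function and constant names are illustrative assumptions, not anything from Domino’s or the book.

```python
def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """Return how many times faster the new method is than the baseline."""
    return baseline_seconds / new_seconds


# Figures quoted in the text; the names are hypothetical.
APP_ORDER_SECONDS = (45, 60)  # low and high estimates for ordering in the app
VOICE_ORDER_SECONDS = 10      # approximate time for a spoken order

low, high = (speedup(t, VOICE_ORDER_SECONDS) for t in APP_ORDER_SECONDS)
print(f"Voice ordering is roughly {low:.1f}x to {high:.1f}x faster")
```

Even at the low end of the app estimate, the spoken order comes out more than four times as fast.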
Domino’s isn’t alone. In 2019, McDonald’s acquired a start-up working on voice-based ordering in multiple languages, which has helped cut ordering time by 30 seconds. Meanwhile, Applebee’s and Marco’s Pizza provide ways for customers to speak their orders, which are then converted into text messages for delivery to the kitchen staff, reducing the risk that their waiter will mistranscribe what they want. Instead of replacing human waitstaff with Wall-E-esque robots, voice technology could lighten their burden and allow them to focus on the more emotional parts of the job.
- “My voice is my password.”
The pattern is predictable. A company worries that it will experience a cyberattack, so it implements new layers of security, and the workforce is asked to bear the burden. Scared by the higher-ups, employees conform to the new protocols at first. But then they start to burn out on the two-factor authentication prompts and multiple passwords. They revert to their old ways (or, even worse, turn off the security features altogether), and the whole project is a bust. Enter: the power of voice.
When account owners at Charles Schwab want to access their accounts, they say “At Schwab, my voice is my password.” A proprietary algorithm analyzes 100 different qualities of the sound bite, from pitch to pronunciation, to provide verification without personal questions or PINs. This technology is especially helpful for an at-home workforce that’s distributed across the country (or globe), transforming security from an intrusive, inconvenient slog to an easy, reliable process.
- “We call her Rita.”
It seems almost archaic. A pilot receives information from ground control — beware of the storm up ahead, circle once before landing your jet — and then she has to input it manually into the plane. Instead, what if she could simply say the command to the computer? Or, even better, what if the computer heard the instruction directly from ground control and then executed it (after it confirmed with the pilot first, of course)? That kind of interaction could save time, especially for tedious tasks involving basic, repetitive commands. It could also potentially save lives, especially if you’re talking about a military plane.
So far, the Pentagon has invested at least $12.3 million to develop voice-based AI that can handle “high-intensity air conflicts.” The Russian military is on the case as well. They’ve developed “Rita,” a voice assistant that offers advice in high-pressure situations. Unlike even the most stoic air traffic controller, Rita stays calm no matter how high the stakes, according to one Russian pilot. “Her voice remains pleasant and calm even if fire hits the engine,” he says. She also provides “hints” during combat.
- “Patient notes”
Working in health care isn’t easy, particularly because of the long hours and grueling work. As in other industries, voice technology can help here, too, by giving nurses and doctors immediate access to patient data, medication orders, alerts, and other crucial information. And because the technology is hands-free, providers can focus on interacting with patients rather than staring at a screen. However, there’s even greater potential here, especially when it comes to the unpredictability of health care.
By monitoring the ambient noise in a hospital, voice technology can offer insight that would otherwise be impossible. For example, noises in a patient’s room could be analyzed by AI to make sure everything is okay. If the baseline sounds changed — because of, say, a fall or a respiratory emergency or a cardiac event — nurses and doctors could be automatically notified. Or, in the case of a provider who may be dealing with a potentially violent patient, a security team could be dispatched to the scene. These kinds of protocols could fundamentally change how you experience the hospital.
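The idea of watching for a change in “baseline sounds” can be sketched in a few lines of Python. This is only an illustration under loud assumptions: it treats the room’s sound as a series of hypothetical loudness readings in decibels and flags any reading that departs sharply from the recent average. A real hospital system would analyze far richer audio features, and the function name, parameters, and sample numbers here are invented for the example.

```python
from statistics import mean, stdev


def find_anomalies(levels, window=5, threshold=3.0, min_sigma=1.0):
    """Return indices of readings that deviate sharply from the recent baseline.

    levels    -- hypothetical loudness readings (e.g. decibels), oldest first
    window    -- how many preceding readings form the baseline
    threshold -- how many standard deviations count as "sharp"
    min_sigma -- floor on the spread, so a very quiet room is not oversensitive
    """
    anomalies = []
    for i in range(window, len(levels)):
        baseline = levels[i - window:i]
        spread = max(stdev(baseline), min_sigma)
        if abs(levels[i] - mean(baseline)) > threshold * spread:
            anomalies.append(i)
    return anomalies


# Invented readings: a quiet room with one sudden loud event at index 7.
room_levels = [40, 41, 39, 40, 41, 40, 39, 72, 41, 40]
print(find_anomalies(room_levels))  # prints [7]
```

In this toy version, the flagged index is where a notification to nurses or a security team would be triggered.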
- “Translate from English to Swahili.”
While voice technology is largely considered a luxury in the U.S., it could be fundamental to including more people in the world’s economy, especially the billions of people, including millions of Americans, who are not fully literate. One simple example is tagging images with data, which could help build an online product catalog or a training data set for AI researchers. This low-skill, manual work is predominantly done in countries like India but is unavailable to those below a certain level of literacy. Voice technology would eliminate that barrier, and if it were paired with translation software, millions of people would have a better chance of escaping poverty.
The same could be said for people with certain disabilities: not just those with print disabilities that make conventional reading and typing more challenging, but also those with learning disabilities like dyslexia and ADHD. There are even tools for those living with paralysis. In 2020, Google released its Look to Speak app, which uses an Android phone’s camera and speaker to let the user select a phrase with their eyes and have it spoken aloud. In this way, voice technology has the potential to dramatically increase who is considered readily “employable.”