Voice recognition is an interesting technology. It’s been around for a long time, and it mostly keeps getting better. But does it ever get good? As far as I’m concerned, it doesn’t, particularly when it comes to cars.
Automakers have always been great at giving us futuristic features we don’t want. Talking cars were the new hotness in the 1980s, whispering jagged digital phrases like “OIL PRESSURE LOW” and telling us that a door was a jar. Nobody liked them, and they died a quick and quiet death. Did automakers learn from this? Of course not. Heck, just this past week, we heard Rivian’s chief software officer telling us, “Ideally, you would want to interact with your car by voice.”
It’s a grand idea, and one that’s been floating around the auto industry for decades now. In theory, you should be able to tell your car what to do and everything would be wonderful. Except that’s not the case, and it never will be.
Slow, Annoying, Bad
These days, there are lots of shiny high-tech ways to interact with our vehicles. Touchscreens rule the roost; in some cars, they control almost everything. Meanwhile, voice commands are available in a great many cars, too. These let you control your car in a fancy modern way, and for some reason, that’s supposed to be better.
The problem is that it just isn’t. For most things, voice controls are much worse than doing things the old-fashioned way. They’re slow, for one thing. First, you have to say a whole command or sentence out loud. Then the computer has to process it, potentially even using off-board cloud resources to do so. Then, after a painstaking delay, you get to find out if the computer even understood you and did what you wanted.
Compare this godawful experience with the magic of buttons. Press one, and whatever you wanted to happen just happens. Instantly. Done.
Buttons just work. For most tasks in a car, they do the job better than any other technology that has come along since. For that matter, so do old-school knobs, which are the best volume controls ever invented. They will not be bested before I leave this mortal coil.
For your consideration, another example. Give me a classic three-dial HVAC system and I can set it to whatever I want in a couple of seconds without even looking. I can set temperature, change the fan speed and set the vents in a snap. Even if I’ve never driven the car before, I can figure it out with a split-second glance.
Compare that to a voice control, which would have me blathering and guessing and just generally faffing about. Is the command “Temperature Up,” or do I have to say “Hello Mitsubishi, set temperature to 25 degrees”? What about vents? “Hey Mercedes, I want cold air on my face” probably won’t work. It feels kind of demanding, too. It’s all too awkward.
You can read the manual and try and learn all the commands, but you’re still tangling with other problems. There’s no escaping the processing delay, which can be excruciating at times. Even worse is when the system doesn’t understand you. You get some annoying little beep noise prompting you to repeat yourself, or worse, a full explanation from a robot lady telling you to rephrase your request. My 2007 BMW was an absolute demon for this and all it could do was dial the phone for you.
Automakers can make voice controls faster, smarter, and more capable. It doesn’t matter. They’re never going to beat the instant response of my finger on a single button. Push-click-done will always be faster than “Hey Car Susan, can you please activate the combobulator?” whichever way you cut it.
An Exception
There is a limited use case where voice controls have some value. Consider, for example, entering an address into a GPS navigation system. It’s easy enough to type it in on your phone, but doing so on a car’s touch screen is usually an exercise in frustration. Being able to simply read out the address you’re looking for is far easier.
Even this use case comes with a caveat, though, and that’s accuracy. If you’re looking for Bone Street in Gainesville, Texas, you don’t want to accidentally end up navigating to Bone Street in Arlington, Tennessee. Or Boon Street, or Bong Street, or any other similar variation. You can end up fussing around quite a lot if your voice recognition system doesn’t get you the first time.
This also applies to music. I’ll never forgive Weezer for self-titling the vast majority of their albums. You can ask your car to play you the Blue Album, but you have to hope that the great minds at Gracenote programmed that colloquial title in, because officially, that album is just called Weezer. I know, I hate it too. I actually penned a lengthy explainer on how Gracenote tackled this annoying problem earlier this year, if you care to learn more.
Indeed, our own Peter Vieira sums it up perfectly:
The only thing I can think of is making music selections and nav. “Play Black Celebration,” “Direct me to the Costco in McKinney,” that sort of Alexa/Siri stuff. But actual car controls? No thank you. Even if I don’t need to speak loudly/clearly, it feels like more work to mumble “AC on” than to just hit a button.
And he’s perfectly right.
Former Fan
Funnily enough, I used to be a lot more positive about voice recognition. In fact, I used to use voice controls all the time. Back in 2017, I had a Google Pixel smartphone and it worked great. Back then, I could rattle off voice commands and, nine times out of ten, get a decent result. “Hey Google, add peas to my shopping list,” I’d say, and it would work. Five or six times a day, I’d say “Hey Google, add a reminder Monday to do that work thing” and again, it would work, no problem. For a time, I could even say “Hey, Googz” like a true Australian, and it worked like a charm.
Somehow, around 2020 or so, that just… stopped working for me. I’ve had a number of other Google and Samsung smartphones and they’re… kind of okay. They only respond to my voice 60-70% of the time. That’s bad enough that I went from using these things every day to using them once or twice a week at most. I hate repeating myself, and I hate dealing with misinterpreted commands, so it stopped being worthwhile for me. I had an iPhone for a while, and Siri was pretty good. On the Android side, though, voice assistants are dead to me. Plus, talking to “Bixby” makes me sound like a tool. Fix it, Samsung.
Free Us
Ultimately, voice controls aren’t going away any time soon. They’re an increasingly common feature on modern cars. Meanwhile, the AI craze has seen automakers rush to integrate more advanced voice assistants to make sure they stay ahead of the curve.
Things can get better. Automakers can reduce processing delays and maybe even make systems that can reliably understand most people in most conditions. Regardless, it’s important to remember the basic truth. In the vast majority of scenarios, a simple button beats every other technology out there. Even asking my girlfriend to adjust the stereo takes longer than just tapping a few buttons myself, and she’s really smart! No voice control system will ever beat the button.
To that end, I wish we could just build cars with the best interfaces. End this flavor-of-the-month crap, and just do what works. That’s all I ask. Stop trying to give us fancy voice controls, because the last three decades have shown they’re just not that bloody good.
Image credits: Lewin Day, Mercedes-Benz, BMW, Google, Mitsubishi
The voice system also falls apart when you are dealing with businesses with name puns and unique spellings. For some reason my voice nav has a terrible time with “Pick n’ Pull” and often when I ask to navigate to something like a fast food chain while I’m on the highway it’ll ignore the closest one and just automatically take me to one 20 minutes out of my way.
Isn’t it great how the technology was improving, stopped, then started going backward?
I share this sentiment on most of the voice control systems I’ve tried to use. There was a nice easter egg in my wife’s old MDX though. If you initiated the voice control and made a robust farting sound, it would turn the A/C on. Lots of fun with little kids in the car.
The problem with voice control interfaces is that *as humans* we often fail to understand each other. With that in mind, programming a computer to understand our nuances becomes a futile task.
It’s been around awhile:
In the early 1980s my brother bought a point and shoot film camera. Every now and then it would loudly say things like Toodah or Toolah. It was saying Too Dark or Too Light but with an Asian accent. Aside from being humorous to our ears, it also made it impossible to use discreetly, which is the entire point of a small camera.
Mid 1990s I was product testing an alarm system that spoke … but English with a heavy Chinese accent. We did install one or two, but they came back, as customers could not bear hearing a human voice yelling “HELP! HELP! CALL POLICE!” emanating from their closet in the middle of the night.
Oh, yes, car voice controls (indeed, *any* voice controls) will always suck, especially for those of us who are deaf or hard of hearing. And it’s only going to get worse as we all get older; the population in the U.S. is getting older on average than ever with corresponding increases in the number of people with hearing problems.
Reminds me of how a couple decades ago I was at a big box hardware store with my two children, who were 10 and 7, and we couldn’t find what we were looking for so I managed to get hold of an employee (which was surprising in light of how understaffed the store was during an especially busy time) but as he was older and did not have his reading glasses he couldn’t read what I had written down on a notepad. So my older kid tried to tell him what we were looking for but it turned out that his hearing aid couldn’t pick up my kid’s voice (for those of you who may not know, many hearing aids, especially those used by older people, at least back then, aren’t always capable of picking up some higher pitched voices such as what most young children and even some adults, especially women, have.) So we all ended up having to track down another employee. It was pretty funny and we were all laughing about it but it also underscored the challenges faced by deaf people in dealing with hearing people *and* technology (such as the hearing aid that couldn’t pick up my kid’s voice.)
Another case of being SOL thanks to technology happened a couple years ago when I was cat-sitting for someone who had just adopted two very young kittens who were having a little trouble adjusting to being around humans; that person then unexpectedly had to go out of town for a week or two right after taking the kittens in so they asked me to spend as much time as possible at their house with the kittens to help with their socialization around humans. They had a premium cable and streaming package so I was hoping to be able to catch some films not available on DVD/Blu-ray (such as Glass Onion: A Knives Out Mystery which had just come out) but much to my chagrin and dismay I found that they had just acquired a brand new fancy TV with a “smart” remote that required voice commands to make any changes to the TV’s settings. No matter what I did with the TV set itself (its few physical buttons only provided the most cursory of access to a few menus) I couldn’t turn on the closed captioning; the voice-activated/controlled remote was simply useless for doing anything. So whenever the kittens were in hiding I just used my smartphone with my own streaming services, none of which had Glass Onion, and read books. At least whenever the kittens came out of hiding I could just chill with them and not worry about not having access to premium cable, ha. Upon the kittens’ human family returning they got a better remote that didn’t rely on voice commands so during one of my subsequent catsitting gigs I was able to watch Glass Onion (highly recommended!) even as the kittens got older and better accustomed to being around humans.
In any case, TL;DR, voice controls of any kind just simply suck and will always do so, bah humbug.
And, for those of you who haven’t yet seen it, one of the reasons the film Glass Onion is so highly recommended is quite apropos of this website in light of how many of us have some pretty harsh (quite deservedly so!!) criticisms of the ostensible head of the Tesla company: one of the main characters in the film is a hilariously damning portrayal of said CEO, the typical fictitious persons disclaimer in the end credits notwithstanding.
Some new cars do recognise natural speech patterns now, so saying “Hey car, I’m cold” would turn up the temperature.
That said, as an introvert who dislikes talking to people generally, the last thing I want to do is talk to my car.
I strongly believe in physical heater controls and a volume knob as absolute essentials, but there’s a lot of stuff where voice controls are at least competitive with the alternatives. Like, there’s not going to be a “Play The Blue Album” button installed*, so there’s inevitably some variation of hunting through a menu, either with a touchscreen (which sucks while driving), or with whatever variation of iDrive (more feedback, but certainly no quicker than voice control in most cases). I’m content with the level of buttonry in Mazdas (as a fanboy), but still use voice control for a few specific uses.
*Toyota probably got close with their Party Mode button, but that really would’ve been a Party Rock Mode button for the tens of people who want LMFAO on demand.
One only needs to look at the Grand Tour episode where the guys attempted to navigate to the Nürburgring using voice command, only to find that the nav had directed them to Nuremberg.
(At which time Richard Hammond suggested that they use a local race course to test their lap times, thus, as he said it, “Making it the first-ever Nuremberg Trials.”)
I think there’s Nürnberg, Nürburg, and Nüremberg. Not helping.
I agreed with you from the headline. But the article is way too long. Until VOIP is perfect for answering a telephone call, using it for directions for a driverless car is insane. Have you ever called a company and tried to conduct a business transaction with a VOIP? Has it ever worked? Of course not. Now figure this technology controlling a car at 65mph. “Can you repeat that?” as a stop sign impales your forehead and you’re dead.
You know who doesn’t like voice commands? Us deaf people (sorry, those of us with ‘profound hearing loss’). I don’t want to interact with a sense I can’t rely on, so if anything we can use the ADA to force car makers to keep buttons around. Or I guess they can just make it illegal to drive while deaf.
Exactly this. And mute people as well.
And for me touchscreens are horrible, and I include my phone as well. Many times I have attempted to just push a button and after a dozen hits (increasing in velocity) nothing happens. Same thing occasionally at stores and their point of sale boxes. I guess I barely exist. When I approach a self-check-only terminal I pre-warn the attendant that these things hate me and I reciprocate; they usually step up and give a much needed assist. Last time this happened, I was really glad for the assist as two items needed their theft warning thingies disarmed, and there is no way I would have noticed it and the alarm would have been blaring away with me ignoring it completely.
One of my coworkers always turns on the voice activated (instead of button activated) voice control in our company cars, and it drives me crazy.
Voice control often activates randomly when I’m just speaking with a passenger, and more than once has the car called someone I really don’t want to speak to.
It doesn’t help that the steering wheel buttons are less than intuitive, and I can’t seem to learn how to quickly abort the call that way.
This is not a difficult problem to solve. The problem is that the automakers pay engineers what they deem as market rate, and anyone who can spell AI and program in Python is going to make 3x that somewhere – anywhere – else. I have a team that did a limited NLP interpretation for a specialized application in like 3 months. It’s all about focus.
Contain the use case to a specific set of commands and input a very good speech-to-text NLP. Train the ever-living-crap out of that model with accents, background noise, and again a LIMITED set of commands (and to ignore ancillary stuff), and it’ll work just fine.
That said, I prefer buttons and dials (and don’t even bother with my Rivian’s Alexa implementation), but being here on this site only confirms that we’re all a bunch of weirdos in the first place.
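For anyone curious what the "limited set of commands, ignore everything else" idea from the comment above might look like, here is a rough sketch. It is not how any automaker or the commenter's team actually does it: it assumes the speech-to-text step has already produced a transcript, and it swaps the trained model for plain fuzzy string matching from Python's standard `difflib` module. The `COMMANDS` table, the action names, and the `match_command` helper are all made up for illustration.

```python
import difflib

# Hypothetical closed command set. The phrases and action names are
# invented for this example and do not come from any real vehicle.
COMMANDS = {
    "temperature up": "hvac.temp_up",
    "temperature down": "hvac.temp_down",
    "wipers on": "wipers.on",
    "lights off": "lights.off",
}


def match_command(transcript, cutoff=0.8):
    """Map a transcript to an action name, or return None.

    Anything that is not close enough to a known phrase is ignored
    rather than guessed at. The cutoff value is an arbitrary assumption.
    """
    text = transcript.lower().strip()
    close = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[close[0]] if close else None


print(match_command("Temperature up"))   # exact phrase
print(match_command("temperatur up"))    # slightly garbled transcript
print(match_command("tell me a joke"))   # outside the command set
```

The design choice worth noticing is the last case: a narrow system refuses anything it does not recognize instead of picking the nearest song title, which is the failure mode the Alexa example further down the thread complains about.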
I can just see swearing in traffic, the next thing is the navigation saying, “to go to Jesus, pull into opposing traffic after disabling crash avoidance”
I have no interest in talking to my cars, and I have THREE of them that nominally do voice commands. I don’t have much interest in talking to most *people*.
Buttons and knobs. 100+ years of automotive evolution does not need to be chucked out because touchscreens are cheaper and people are stupid.
“Alexa – Lights off.”
“Okay – Here’s “Lights Off” by Jay Sean (or Tay Keith)(or Rajah Wild)(or We Are Domi)(or…)”
“Alexa – Stop”
*boop*
“Alexa – Turn off the Lights”
“Okay” *boop*
Now imagine this in your car – but instead you’re trying to turn on the wipers and the lights and close the roof in a sudden downpour while you’re in a low-signal area…
Just say no! Hopefully it will understand.
I have found that replying to texts through Android Auto in my Jeep is great. Even when it is wrong the first time, it is still safer and faster than if I try to text back through my phone. Other than that, I don’t think I use voice commands at all.
I also occasionally use voice commands to navigate to an address with Android Auto. It’s much more accurate than the SYNC system in my Ford.
To wit, I wish car companies would give up trying to reinvent the wheel that’s already been (mostly) perfected by Android Auto and Apple CarPlay. Just provide the screen/interface and call it a day.
The voice commands are even worse if you use more than one language. They just straight up suck if you try to use any language other than English.
You’ll be shocked once the next generation of conversational audio-based AI agents makes it to the car. The newest stuff is patently amazing, and it’s improving by the week. One of the things my teams do is write AI voice agents and the polish we’re getting these days is ridiculous, especially with the realtime API stuff available via OpenAI. The rapid pace of change/improvement is both exhilarating and terrifying.
Unless these next gen AI agents are buttons that perform specific tasks when I press them without the need to talk, I will be less than impressed.
How do they perform with the other Slavic languages?
“Hey Mercedes, I want cold air on my face,”
Ms. Streeter has entered the chat
Yeah! If “[I]t feels kind of demanding, too. It’s all too awkward,” to Lewin, imagine how she feels being ordered to blow on his face. It should be in the workplace handbook that she gets to slap anybody who asks her for that.
Just give us buttons. For the love of god and everything holy, just put friggin’ buttons back in the car. Don’t make me have to figure out how I need to say the thing I want so the car thinks I said the thing I want, just give me a goddamn button to do the thing I want directly. The idea that speech would be faster, easier, or more accurate than a button would only occur to someone who doesn’t ever actually have to speak to people (or who’s high enough on the food chain that it’s everyone else who’s gotta adapt to them, not the other way around).
Honest to god, how is it possibly cheaper to build an entire friggin’ machine-learning-based speech UI than to just put some goddamn buttons in the car?
Even worse – *gestures*. There is only one gesture I am ever giving a car, and it involves a single digit.
But ultimately, I thank the auto industry. They are saving me a ton of money by removing all temptation for me to ever buy another new car. It was fun while it lasted.
I’ve given that one to my car plenty, the last thing I need is for it to start responding.
ROFL!
I have yet to use voice controls for anything, it just seems like an annoyance.
I’ve told this story before, but I don’t think it has made it here yet. I had a dash cam with voice recognition for a while, that allowed you to save a video or take a still photo by saying “save video” or “take picture.”
Once, I was driving home from work, and someone pulled out of a 7-Eleven parking lot right in front of me. I honked the horn and shouted, “Hey, dipshit!”
The camera took a picture.
So I tried it again. “Hey dipshit.” Click. From then on, that was my preferred voice command to take a picture.
Strangely, my wife has the same model of camera in her car, and it won’t take a picture with either “take picture” or “hey dipshit.” But sometimes random song lyrics, or even guitar riffs, will set it off.
Yeah, I agree. Voice command sucks.
I agree with you about voice command—but having your camera take pic to Hey Dipshit is kinda awesome
I had to learn my hot phrases for helmet comms on my motorcycle helmets. I’m agnostic since I use Sena and Cardo, but I find myself using the voice commands on the Cardo more.