Over the past two decades, as new vehicles have been infected with the phenomenon known as creeping featuritis, it has been widely hoped that voice control would be the panacea for dashboards covered in buttons and switches. Alas, technical realities mean that has not worked out quite as planned. As the automotive industry heads into the 2020s, there is newfound hope that the power of voice will finally enable drivers to use the features they want while keeping hands on the wheel and eyes on the road.
What went wrong?
While there were experiments with in-vehicle speech recognition going back decades, the first real mass-market application, and the one that kicked off the modern era, arrived with the 2001 BMW 7 Series and its debut of iDrive. In addition to the central rotary controller, voice recognition was a key element of the human-machine interface (HMI). The challenge that afflicted iDrive and the systems that followed was that rather than reducing distraction for the driver, voice recognition often made it worse, because it just didn’t work very well.
While the vehicle cabin might seem like a great place to use voice recognition, it is in fact one of the toughest places to implement it. Anyone who has tried to hold a conversation in a noisy environment like a bar knows how hard it is to pick out speech. Ambient noise levels while driving make it challenging to accurately capture what the driver has said, and if the words cannot be heard, the meaning cannot be interpreted.
Even when speech can be heard clearly, understanding it is computationally expensive. Given the long lead times in vehicle development and the harsh operating environment, automotive-grade processors are rarely state of the art and never more powerful than absolutely necessary.
To maximise the probability of accurately detecting a word, early systems used very limited vocabularies, so that anything close would be binned into one of the known words. They also typically recognised only a single word at a time, so entering a navigation destination meant stepping through state, city, street and house number separately.
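As a rough illustration of that constrained approach, the sketch below bins whatever was heard into a small fixed command list using edit distance; the vocabulary, threshold and function names are invented for the example, not taken from any production system.

```python
# Hypothetical sketch only: vocabulary, threshold and names are invented to
# illustrate binning a heard word into a small fixed command set.

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two words."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

COMMANDS = ["navigate", "radio", "phone", "climate", "cancel"]

def bin_to_command(heard: str, max_distance: int = 2):
    """Map whatever was heard onto the closest known command, or reject it."""
    best = min(COMMANDS, key=lambda c: levenshtein(heard.lower(), c))
    return best if levenshtein(heard.lower(), best) <= max_distance else None

print(bin_to_command("navigat"))   # close enough -> "navigate"
print(bin_to_command("sunroof"))   # not in the vocabulary -> None
```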
The result was frustration, as customers could not remember the specific commands required or the system simply misidentified words. Companies such as Ford saw their quality ratings tank due to customer irritation with these and other infotainment technologies.
What has changed?
Enter the era of the digital voice assistant. Things finally started to look up after the 2014 introduction of the Amazon Echo. As a connected device in the home, Echo could take advantage of a key capability that iDrive and other early systems largely lacked, namely the cloud. Although Apple’s Siri preceded Amazon’s Alexa voice services by several years, the Apple solution was often as frustrating as early iDrive. Nonetheless, Alexa, Siri, Google Assistant and Nuance have all leveraged the power of massive data centres and nearly ubiquitous broadband connections to provide more accurate word recognition than was possible when limited to in-car computing. Audio recordings of command strings are sent to data centres to be interpreted.
More importantly, those banks of servers were not just recognising individual words, but increasingly deriving semantic meaning from strings of words. Rather than recognising a couple of dozen specific words or phrases, cloud-based systems could now recognise almost any word and, from the context of the surrounding words, come closer to understanding the user’s intent. None of these are yet capable of true natural language processing, but they are getting closer all the time.
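To make the contrast with simple keyword matching concrete, here is a toy, rule-based illustration of pulling an intent and its details out of a whole utterance; the intent names and patterns are invented and bear no relation to any vendor’s actual pipeline, which would rely on statistical models rather than regular expressions.

```python
# Toy illustration only: intents, patterns and slot names are invented.
import re

INTENT_PATTERNS = {
    "find_fuel":  re.compile(r"\b(gas|fuel|petrol|charging)\b"),
    "navigate":   re.compile(r"\b(take me|navigate|directions)\b (to )?(?P<dest>.+)"),
    "play_media": re.compile(r"\bplay (?P<track>.+)"),
}

def parse_utterance(text: str) -> dict:
    """Guess the intent of a whole sentence and pull out any named slots."""
    text = text.lower().strip()
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(text)
        if match:
            return {"intent": intent, "slots": match.groupdict()}
    return {"intent": "unknown", "slots": {}}

print(parse_utterance("Take me to downtown Detroit"))
# -> {'intent': 'navigate', 'slots': {'dest': 'downtown detroit'}}
```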
Becoming a true assistant
By leveraging bits of data such as the user’s past history, links to online services, the location and heading of the vehicle, the navigation route and more, digital assistants can now get crucial context about what the user might want. For example, if someone is driving from Ann Arbor to Detroit and needs fuel or a charging station, they can simply ask and expect a list of stations along their route, or find parking near their destination.
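A minimal sketch of that kind of context fusion is shown below, assuming the assistant already has the planned route and candidate stations as coordinates; the data, distance metric and threshold are all made up for illustration.

```python
# Illustrative only: route, stations and units are invented.
from math import hypot

# Route as (x, y) waypoints and candidate stations, in arbitrary units.
route = [(0, 0), (5, 1), (10, 2), (15, 3)]   # e.g. Ann Arbor -> Detroit
stations = {"Station A": (5.5, 1.2), "Station B": (2, 8), "Station C": (14.8, 2.9)}

def detour(point, path):
    """Smallest straight-line distance from a point to any route waypoint."""
    return min(hypot(point[0] - wx, point[1] - wy) for wx, wy in path)

def stations_on_route(max_detour=1.0):
    """Return stations close enough to the route, smallest detour first."""
    nearby = {name: detour(pos, route) for name, pos in stations.items()}
    return sorted((n for n, d in nearby.items() if d <= max_detour), key=nearby.get)

print(stations_on_route())   # -> ['Station C', 'Station A']; Station B is too far off route
```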
All of this is made possible by the increasing penetration of faster wireless communications built into vehicles, more powerful yet power-efficient computing, and better microphones. Array microphones similar to those used in Amazon’s devices are now finding their way into cars, in combination with improved noise, vibration and harshness (NVH) characteristics, to better capture what drivers and other occupants are saying. Array microphones can even distinguish who is speaking and prioritise as needed.
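Array microphones typically rely on beamforming; the snippet below sketches the simplest variant, delay-and-sum for a linear array, purely to illustrate the principle. The geometry, sample rate and integer-sample shifts are simplifying assumptions, not how any production cabin array is tuned.

```python
# Simplified delay-and-sum beamformer for a linear mic array (illustrative only).
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s
SAMPLE_RATE = 16_000     # Hz

def delay_and_sum(mic_signals, mic_x, angle_rad):
    """Steer a linear array toward angle_rad (0 = broadside).

    mic_signals: array of shape (num_mics, num_samples)
    mic_x:       microphone positions along the array axis, in metres
    """
    out = np.zeros(mic_signals.shape[1])
    for signal, x in zip(mic_signals, mic_x):
        # Plane-wave arrival delay at this mic relative to the array origin.
        delay = x * np.sin(angle_rad) / SPEED_OF_SOUND
        shift = int(round(delay * SAMPLE_RATE))
        out += np.roll(signal, -shift)   # shift to align the wavefront, then sum
    return out / len(mic_signals)
```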
The improved compute can also help separate voice commands from ambient sounds, including the audio system itself. With audio playback now digitally controlled in most cases, the infotainment computer knows exactly what is being played and can subtract it to isolate voice commands, allowing control of systems such as media playback even while music is on.
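One common way to do that subtraction is an adaptive echo canceller that uses the known playback as a reference signal; the normalised LMS sketch below is a generic textbook version under that assumption, not any particular supplier’s implementation.

```python
# Generic NLMS echo-cancellation sketch (illustrative; assumes float arrays
# and a single reference channel matching what the head unit is playing).
import numpy as np

def nlms_echo_cancel(mic, reference, taps=64, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the playback echo from the mic signal."""
    weights = np.zeros(taps)
    cleaned = np.zeros_like(mic)
    for n in range(taps, len(mic)):
        x = reference[n - taps:n][::-1]               # most recent reference samples
        echo_estimate = weights @ x                   # filter's guess of the echo
        error = mic[n] - echo_estimate                # what is left: voice + noise
        weights += (mu / (x @ x + eps)) * error * x   # NLMS weight update
        cleaned[n] = error
    return cleaned
```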
While connectivity is crucial, as much as wireless carriers like to tout their coverage, it is far from complete, especially in rural areas. On-board compute is therefore required to provide some local processing as a back-up. This local voice processing is typically more limited than what is possible in the cloud, but still far better than the more primitive systems of the past.
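The resulting behaviour is essentially a cloud-first, local-fallback pattern, sketched below with placeholder functions rather than any real API.

```python
# Placeholder sketch of cloud-first recognition with an on-board fallback.
def cloud_recognise(audio):
    # Stand-in for a round trip to the data centre; raises when coverage drops.
    raise TimeoutError("no coverage")

def local_recognise(audio):
    # Stand-in for the smaller embedded recogniser.
    return {"intent": "unknown", "engine": "local"}

def recognise(audio, cloud_available: bool):
    """Prefer the cloud when connected; otherwise use the on-board engine."""
    if cloud_available:
        try:
            return cloud_recognise(audio)
        except TimeoutError:
            pass   # coverage dropped mid-request; fall back to local processing
    return local_recognise(audio)

print(recognise(b"audio-bytes", cloud_available=False))
```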
Where cell towers are present, though, vehicles are increasingly able to leverage them. General Motors, the pioneer in embedded telematics with its 1996 introduction of OnStar, has built the hardware into virtually all of its vehicles for more than a decade and currently includes a 4G LTE modem in every new product coming from its factories. Last year, Ford also committed to 100% LTE penetration in North America by the end of 2019 and globally within a couple of years after that. Most premium brands are already at 100% penetration and mainstream brands are quickly moving in that direction. Navigant Research projects that more than 90% of new vehicles will have built-in data connections by the mid-2020s. The vast majority will still be LTE in that timeframe, but 5G in the car should start rolling out by about 2022 and become common by the end of the decade.
As most new vehicles now offer high levels of quality and performance, it becomes more difficult for manufacturers to differentiate themselves. The market is also becoming increasingly saturated, and sales growth will slow and eventually plateau. Digital voice assistants give manufacturers a way to improve the user experience, something that can still be a product differentiator, and a platform for new service and revenue opportunities based on the requests drivers make.
Voice assistants are projected to be embedded in nearly 90% of new vehicles sold globally by 2028. Amazon, Google, Nuance, IBM and other vendors are all pushing hard to become the default assistant. A key factor in success is likely to be how well the in-vehicle systems integrate with other ecosystems such as smartphones and home automation. At the moment, Amazon and Google would seem to be in pole position because of their relative strengths in those areas, but penetration of those technologies remains low in the larger market, and there may still be an opportunity for other players to gain a foothold, although they will have to act quickly.
This article appeared in Automotive World’s October 2019 report, ‘Special report: The rise of the in-car digital assistant’, which is available now to download