Keen Research's Deep History in Voice Tech
VOICE Global | 21-08-4 | Modev Staff Writers
Ognjen Todic of Keen Research has worked on voice technology for more than 25 years. Before becoming CEO of the on-device speech recognition company, he founded several successful startups that were eventually acquired by the "big boys" of the industry, many of whom still use Todic's innovations to power their apps and user experiences today. He joined us at VOICE Global 2021 to discuss the benefits of on-device voice processing and where the technology has room to grow.
How ideas are formed
For Todic, the idea of on-device voice processing came about when a friend asked whether he could build a simple voice-based app to help the friend's daughter with the repetitive task of learning her multiplication tables. Because the app was meant for children, Todic knew it had to process voice on the device itself: the Children's Online Privacy Protection Act (COPPA) strictly limits the collection of personal data from children under 13 without verifiable parental consent.
As Todic says, "So, I started playing with this, tinkering, creating some prototypes. And I quickly realized there was no good solution for on-device speech recognition on mobile devices. I also knew that there was a lot of progress in speech recognition technology with deep neural networks. So, I started diving deeper, and one thing led to another until I built an early beta SDK that was very kind of rough around the edges. But I started talking to people and started seeing enough interest to keep working on this."
The benefits of local vs. remote
But on-device voice processing makes sense for many reasons beyond protecting the privacy of children (or anyone else), though privacy is undoubtedly a compelling reason on its own. Todic walked us through why on-device voice processing is a win for the industry.
The first reason is privacy, as already mentioned. Because all of the data stays on the device, no third party is "listening in": it's just you and your device.
The second reason is that on-device processing works offline. Cloud-based processing requires a constant internet connection, but sometimes you have no access at all, or the connection is spotty at best. In those situations, you either can't use your voice assistant (no internet) or you get a degraded experience (spotty connectivity). With on-device processing, your voice assistant is in your pocket, not in the cloud.
Next comes system architecture, which is much simpler with on-device processing because there's no dependency on a back-end or on internet connectivity. When processing happens on a remote server, the server's capacity has to grow along with the number of users, and with sustained growth that scaling work never ends. With on-device processing, each new user brings their own compute. And because nothing travels over the network, there are no network delays, so the experience stays snappy.
Another advantage of on-device voice tech is customization. Cloud-based processing relies on shared resources. Once you allow per-user customization, a cloud service starts moving toward dedicated resources for specific features and configurations, which adds complexity and makes the system harder to manage. When all of the resources live on the device, users can customize their app without touching a back-end, because each user's dedicated resources are entirely on their own device.
Finally, there's predictable pricing. Cloud services typically charge by usage, so the final bill is always a bit of a mystery because usage is hard to predict. With on-device processing, costs are fixed regardless of how much the app is used, so there are no surprises.
The future of voice tech is on-device
Turning to growth opportunities for on-device voice processing, Todic pointed to several promising areas. They range from education technology, where privacy is critical, to music production apps, where voice interfaces let musicians make adjustments without putting down their instruments. He also mentioned the automotive industry, where on-device processing can improve the quality of in-car voice assistants. Tying all of these together is custom hardware capable of handling the number-crunching that on-device voice processing requires. Many different kinds of devices stand to benefit from adopting the technology.
As Todic states, "There are some use cases where voice-processing can run only on-device because those devices don't have internet access. There are other use cases where it has to run in the cloud. That could be a watch that doesn't have a CPU that's powerful enough, and you've got to stream. But then there are a lot of devices in-between where you can do it either way. And it makes a lot of sense to do it on-device for all of the reasons I mentioned earlier."
Voice tech's future is bright (and on-device).
Missed VOICE Global live? Watch Ognjen Todic discuss on-device voice tech, why it makes sense, and where it's headed on VOICE Global on-demand. View the VOICE Global speaker lineup to see which other speakers joined us from across various industries.