
Talking Back: The Evolution of Voice Assistants Hinges on User Experiences

The increasing popularity of voice assistants such as Alexa and Siri underscores the massive shift happening in voice tech today, but this voice-first revolution is not without its challenges. For voice assistants to provide long-term value to consumers, we must understand the complexities of the voice user experience and the challenges that come with evolving voice assistant technology. With that, we’re launching part two of our three-part series on the future of voice tech.

Did you miss the first post? Read it here.

It’s clear that voice assistants are driving a resurgence in consumer interest and investment in voice tech. For voice assistants to reach their full potential, we must understand their current limitations. More specifically, we need to focus on how the user experience around voice-first devices must evolve to achieve mass adoption. That starts with designing intuitive, anthropomorphic user interfaces that understand the nuances of human speech and respond in a way that makes the assistant feel like a close friend rather than an intangible presence made from code.

Whitepaper: Voice is Back: Exploring the UX Jungle

Voice Assistants: It’s All About Context

Interactive voice systems have been around since the 1950s, but the unprecedented popularity of voice assistants like Google Home and Amazon Echo foretells a massive technological shift. The challenge is that voice-first devices aren’t like other popular consumer tech devices. Consumers use voice assistants in specific locations, usually while multitasking, and may be alone or in a group when using them. This creates a wide variety of contextual factors that dictate what actions people want their devices to perform, how they ask for those actions, and how they expect the devices to respond.

For voice assistants to evolve to the point where they can perform what users ask of them efficiently and accurately, these contextual factors must be carefully considered when designing user interfaces. People interact with voice assistants very differently from how they interact with other tech devices. Talking with Siri or Alexa is a much more interpersonal style of communication, almost like exchanging dialogue with another person. To meet a user’s expectations and demands, a voice-first device must have a user interface that’s governed by how humans communicate. Creating this sort of user experience is where the trouble lies.

Talking That Talk with Voice Assistants

A major voice-first pain point is that users want to talk to a voice assistant as they would to an actual human and have it respond and perform exactly as they intend. This is a Natural Language Processing (NLP) challenge. The ways in which humans convey information in conversation are linguistically complex. Think of abbreviations, slang, idioms and onomatopoeia: these uniquely human constructs can be difficult for voice-first devices to interpret and process.

Many voice assistants keep listening to your conversations even when not in active use, and employ machine learning to deepen their lexicon and widen their response syntax to achieve better results. Still, context is everything. Successfully recognizing a voice input doesn’t necessarily mean Siri or Alexa has truly understood your intent. That’s because services like Siri can’t intuitively decipher tone or emotion, meaning the results they give can be skewed. It’s not just what you say, but also the way you say it that can stump voice devices.

Voice-controlled devices may be able to talk the talk, but they’ve yet to learn how to walk the walk (metaphorically speaking, of course). Until UI designers can account for the idiomatic expressions, linguistic peculiarities and human emotion that imply or allude to actual meaning, users will need to keep their requests, and the way they phrase them, simple and direct. That way, voice-controlled devices won’t falter due to linguistic ambiguity or ignore vocal signals they’re not yet equipped to identify. For voice-first devices to truly revolutionize how we interact with the world around us, they’ll need to be able to chat with you as seamlessly as a friend or co-worker would.

Want to learn more? Check out the latest whitepaper from Sutherland Digital president Andy Zimmerman, "The Voice Everywhere Dilemma".
