Voice-driven dedicated devices are present in nearly 20% of US households today. By 2022, this number could reach 50%. This post recaps Director of Experience Design Michael Hinnant’s speaking session from SIC 2018, where he delved into how UX research should be tailored for voice-enabled technologies.

Last month, Filter’s Director of Experience Design, Michael Hinnant, spoke at Seattle Interactive Conference, the leading UX conference held annually in Seattle. The agenda was packed with sessions from other UX pioneers in tech, covering topics like the evolution of content, universal accessibility, and design ethics.

Hinnant’s speaking session focused on the implications of voice-enabled and wearable devices for the UX industry – and you don’t need Alexa to tell you that there are several.

Voice-Enabled Assistants at Your Service

Voice-enabled devices are gaining huge traction in the technology space. The proliferation of virtual assistants like Alexa, Siri, Cortana, and others is expected to increase as voice-enabled technologies become more embedded in day-to-day life. In fact, voice-driven dedicated devices are present in nearly 20% of US households today. By 2022, this number could reach 50%. (More on how new tech like artificial intelligence is impacting UX here.) This figure does not include smartphones, which currently dominate the voice-enabled device market.

While mobile has a massive installed base, the rise of dedicated devices and the addition of voice to existing devices will significantly increase the ways we engage with technology using our voices. Consider that 70% of people surveyed by Forrester Research said they prefer voice-driven interactions to traditional ones for day-to-day tasks. Users will adopt voice experiences if we create them well.

While voice-driven experiences are very different from traditional ones, many of our tried-and-true experience design strategies still apply. The same underlying foundations of UX design and research should be used – but the key difference is context. Consider the context in which the device is being used, the affordances and limitations, and the familiarity (or lack thereof) that the average user has with these types of interactions.

Context is Key

Who, what, where, when, why – these questions all play a monumental role in developing a streamlined experience for users. After all, voice-enabled devices are used most frequently when our hands are busy; we’re driving, preparing meals, carrying bags, doing homework, or just don’t want to get up. Contextual Inquiry research gives us visibility into these contexts and how and when users might benefit from voice-driven experiences.

Not all users of voice-enabled technologies will be the same – they will be of different ages, from different countries, cities, and regions, and they will use different vernaculars. Devices will be used in a variety of contexts, from daily use to business applications like customer service, and even within the medical field. As such, the tone of the voice that users interact with needs to match the intent and need of the user for the product to be a success. In customer service, for instance, there’s an expectation that the virtual assistant will be efficient and clear. But a virtual nurse will need to show more compassion, as discussing health is a personal matter. Voice and tonality matter when we interact with people every day, and we’ll have the same expectations of our virtual assistants, too.
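
To make this concrete, here’s a minimal sketch in Python (not from Hinnant’s talk) of rendering the same underlying action in two different tones; the personas and templates are hypothetical illustrations:

```python
# Hypothetical personas: the same action, voiced for different contexts.
PERSONA_TEMPLATES = {
    "customer_service": "Your request #{ref} has been filed. Anything else?",
    "virtual_nurse": (
        "Thanks for sharing that with me. I've noted it under reference {ref}. "
        "How are you feeling otherwise?"
    ),
}

def respond(persona: str, ref: str) -> str:
    """Render one underlying action in the tone the context calls for."""
    return PERSONA_TEMPLATES[persona].format(ref=ref)

print(respond("customer_service", "8841"))  # efficient and clear
print(respond("virtual_nurse", "8841"))     # warmer and more compassionate
```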

From Dialog to Dialogue

Virtual assistants can have a dialogue with users, but the quality of that conversation can make or break how users feel about the experience overall. To understand conversation, you must do a lot of listening before you can start designing. People converse in different ways – formal tones that seem natural when training virtual assistants may not resonate with customers. Modern vocabulary, conventions, and verbal cues are the secret sauce to making an interaction sound natural and free-flowing.

Linguistics experts can tell you that terminology, rhythm, and tone all contribute to making a conversation sound believable. This is especially important for specialized demographics because of the nature of the dialogue. The way words are said can imply different meanings based on how the virtual assistant says them. And to some, the words may mean nothing at all. For example, specific industries may have use-case-specific jargon; in the medical field, there are countless abbreviations and scientific terms that are commonplace, whereas they’d make a casual conversation a confusing mess. There are also generational differences to consider. A virtual assistant speaking to an elderly person will not use the same verbiage as it would with a millennial – as if!

Speaking without Words

When developing user experience, knowing the demographics of the target audience will help inform the parts of language beyond the words. Technologies like Natural Language Understanding (NLU) and Natural Language Processing (NLP) are helping UX researchers and technologists develop content for voice-enabled devices. NLU reads between the lines, interpreting not just the words, but also the context and intent behind them. Even when words are misheard or transcribed incorrectly, NLU can correct and revise language to derive the intended meaning. NLU is a specialized subset of NLP, which “consists of software and algorithms that are capable of mining and analyzing unstructured information in order to understand human language within a specific context” (Source). For specialized populations or contexts, NLU and NLP can continuously improve the abilities of voice-enabled devices.
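
As a toy illustration of what “reading between the lines” means in practice – not a production NLU engine, which would use statistical models – here’s a sketch that derives an intent from varied, imperfect phrasings; the intent vocabulary is hypothetical:

```python
import re

# Hypothetical intent vocabulary, including a common transcription slip.
INTENT_KEYWORDS = {
    "pay_bill": {"bill", "billing", "payment", "pay", "bil"},  # "bil" = slip
    "refill_rx": {"refill", "prescription", "meds", "medication"},
}

def classify(utterance: str) -> str:
    """Map a free-form utterance to the best-scoring intent, if any."""
    tokens = set(re.findall(r"[a-z]+", utterance.lower()))
    scores = {intent: len(tokens & kw) for intent, kw in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "unknown"

print(classify("I wanna pay my bil"))             # -> pay_bill
print(classify("Can you refill my meds please"))  # -> refill_rx
```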

Another way communication can be achieved, aside from language, is through technological feedback. It’s important for users to know when their devices are “thinking” – like when Alexa’s light ring spins to show she is processing a request. Different colors, noises, and vibrations all serve as replacements for the nonverbal cues we see and process in normal conversation. They’re also a fun replacement for verbal fillers like “um” or “you know.” Imagine if you were stuck on a thought and began to vibrate – in some ways, these communication alternatives make dialogue more fun!
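
Here’s a minimal sketch of that idea, assuming a hypothetical device API: each internal state maps to the nonverbal cues the device should emit, so users always know what it is “doing”:

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    LISTENING = "listening"
    THINKING = "thinking"
    ERROR = "error"

# Nonverbal feedback per state: color, sound, and vibration stand in for
# the cues a human would give with eyes, nods, and verbal fillers.
FEEDBACK = {
    State.IDLE:      {"led": "off",      "sound": None,    "haptic": None},
    State.LISTENING: {"led": "blue",     "sound": "chime", "haptic": None},
    State.THINKING:  {"led": "spinning", "sound": None,    "haptic": "pulse"},
    State.ERROR:     {"led": "red",      "sound": "buzz",  "haptic": "double"},
}

def on_state_change(state: State) -> dict:
    """Look up the nonverbal cues to emit for a given device state."""
    return FEEDBACK[state]

print(on_state_change(State.THINKING))
```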

Speaking of language variation, one very important consideration is your target users’ native languages. Currently, most voice-enabled technologies are optimized for American English and native English speakers. There’s a huge drop-off when it comes to other languages – even widely spoken ones like Spanish, Chinese, Arabic, and Hindi, let alone hundreds more. When developing UX for voice-enabled devices, we need to be aware that we’re not just designing for the American end-user. Millions of individuals speaking other English dialects and languages also need voice-based technology. Where we are with English is a great start, but there’s massive room for improvement so that people of all ages and nationalities can experience the tech of the future.

Best Practices for Voice-Based Tech

When designing user experience for voice-based devices, there are a few key considerations to keep in mind:

Don’t Force Choice Paths

In an organic conversation, there’s never a set path for how the dialogue will go. Humans wander, interrupt, repeat, and forget. Rigid dialogues are not how people converse. How many times have you called your bank and felt frustrated when prompted to press 1 or 2, when you actually wanted option 3 or 4? Someone asking “How can I help you today?” is much more inviting and likely to mitigate the frustration that could arise otherwise.
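
A sketch of the difference in code: instead of forcing a numbered menu, accept a free-form utterance and map it to an intent (the intents below are hypothetical):

```python
def open_prompt(utterance: str) -> str:
    """Route a free-form utterance instead of forcing 'press 1 or 2'."""
    utterance = utterance.lower()
    if any(w in utterance for w in ("balance", "how much")):
        return "check_balance"
    if any(w in utterance for w in ("lost", "stolen", "fraud")):
        return "report_card"
    # No forced dead end: hand off rather than looping the menu.
    return "route_to_agent"

print(open_prompt("Hi, I think my card was stolen"))  # -> report_card
```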

Use Prompts to Clarify Intention

Prompts and examples keep the user from guessing when interacting with your system. Users frequently shy away from new technologies out of a lack of knowledge and clear expectations. Your design should bridge that gap. For example, telling the user “You can say ‘billing’ or ‘payment’” helps limit the uncertainty of choice and better directs the conversation to where it needs to go.
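
A minimal sketch of that pattern: when an utterance can’t be classified, reprompt with concrete examples instead of a generic “I didn’t get that” (the option names are placeholders):

```python
EXAMPLES = ["billing", "payment", "account status"]  # hypothetical options

def prompt(failed_attempts: int) -> str:
    """Open-ended first ask; example-driven reprompt after a miss."""
    if failed_attempts == 0:
        return "How can I help you today?"
    # Narrow the space with examples the user can echo back.
    return ("Sorry, I didn't catch that. You can say things like "
            f"{', '.join(EXAMPLES[:-1])}, or {EXAMPLES[-1]}.")

print(prompt(0))
print(prompt(1))
```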

Keep it Natural

It’s easy to design for straight paths, but the natural cadence of conversation is more involved. Be aware of how people take turns when talking; plan for pauses, time to think, and verbal fillers.
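
One way to plan for that rhythm is SSML, the speech markup most voice platforms support for pacing; here’s a sketch that builds a response with deliberate pauses rather than firing everything back instantly:

```python
def confirm_order(item: str) -> str:
    """Build an SSML response with human-like beats between thoughts."""
    return (
        "<speak>"
        "Okay. <break time='500ms'/>"          # a beat before confirming
        f"I've added {item} to your order. "
        "<break time='300ms'/>Anything else?"  # room for the user to think
        "</speak>"
    )

print(confirm_order("a tall latte"))
```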

Share all Possibilities

Let people know the realm of possibilities available to them. For example, when you ask Siri a question, she often shares a few possible questions you can ask to achieve your desired result. This encourages users to engage more – users don’t know what they don’t know.
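
A minimal sketch of capability discovery, with a hypothetical skill list: answer “what can you do?” with a short, rotating sample so users stumble onto features they didn’t know existed:

```python
import random

CAPABILITIES = [  # hypothetical skill list
    "check your balance",
    "pay a bill",
    "find the nearest branch",
    "report a lost card",
]

def what_can_you_do() -> str:
    """Surface a couple of capabilities at a time, not the whole catalog."""
    a, b = random.sample(CAPABILITIES, 2)
    return f"I can {a} or {b}, among other things. What would you like?"

print(what_can_you_do())
```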

Consider Integrations

Your product and organization don’t have to solve every problem. Instead, look for ways to connect your solution to others to generate the most value for users. There are countless integrations and extensions that exist today to streamline the customer experience and make your product more powerful. IFTTT, for example, exists to do just that, syncing activities across multiple websites – and now Alexa and Google Home, too. Did you misplace your phone in your house? IFTTT allows you to ask Alexa to call your phone.
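
As a sketch of how lightweight such an integration can be, here’s a call to IFTTT’s Webhooks service from Python; the event name and key are placeholders:

```python
import requests

IFTTT_KEY = "YOUR_WEBHOOKS_KEY"  # issued at ifttt.com/maker_webhooks
EVENT = "find_my_phone"          # hypothetical applet event name

def trigger_applet(value: str) -> None:
    """Fire an IFTTT applet via its Webhooks trigger URL."""
    url = f"https://maker.ifttt.com/trigger/{EVENT}/with/key/{IFTTT_KEY}"
    requests.post(url, json={"value1": value}, timeout=5)

trigger_applet("living room")  # e.g., forwarded from "Alexa, call my phone"
```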

Watch for Cognitive Load

Information overload is the downfall of many great products. Cognitive load is about how hard the user’s brain must work to use your product. With voice-driven experiences, the user must remember more, as there is no display to help them. Striking the right balance between information, options, and simplicity helps users get the most out of the product. One strategy to prevent overload is to chunk info into bite-sized tidbits, so users aren’t overwhelmed. While useful for the average person, this also helps the elderly population and young children.
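
A minimal sketch of chunking: read options in small batches with an explicit “more” escape hatch, rather than one twelve-item monologue:

```python
from typing import Iterator, List

def chunked(options: List[str], size: int = 3) -> Iterator[List[str]]:
    """Yield options in small, memorable batches."""
    for i in range(0, len(options), size):
        yield options[i:i + size]

options = ["checking", "savings", "credit card", "mortgage", "loans"]
for batch in chunked(options):
    print(f"You can say {', '.join(batch)}... or say 'more'.")
```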

Rethinking User Experience

In design, the “happy path” (the simplest, most direct engagement) is a great place to start. Once you’ve developed your voice-based system to a point where getting from A to B is a breeze, you can allow your system to wander. Even so, keep it simple: offering one or two next steps is far more usable than offering a dozen.

In human conversation, nonverbal cues convey a great deal of meaning that voice-based technology has no way to communicate. During testing, blindfolding participants helps take these silent communication tactics out of the picture. You must also reframe how you think about user experience: treat errors encountered during testing as opportunities to expand the conversational space rather than as deterrents. When you think of errors as opportunities, the voice-based world becomes easier to navigate and design for.
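
One way to operationalize “errors as opportunities” – a sketch, with illustrative names: log every unrecognized utterance during testing so it can seed new intents later, instead of discarding it as a failure:

```python
import json
import time

def on_fallback(utterance: str, log_path: str = "unhandled_utterances.jsonl") -> str:
    """Record the miss as design input, then keep the conversation open."""
    with open(log_path, "a") as f:
        f.write(json.dumps({"ts": time.time(), "utterance": utterance}) + "\n")
    # No dead-end error message: offer a way forward.
    return "I'm still learning that one. You can say 'billing' or 'payment'."

print(on_fallback("uh, the thing with my statement?"))
```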

For more information on user experience, including voice-based design, virtual reality, and more, subscribe to the Filter Digest or read more on our User Experience webpage.