Last month, Filter’s Director of Experience Design, Michael Hinnant, spoke at Seattle Interactive Conference, the leading UX conference held annually in this city. The agenda was packed with talks from other UX pioneers in the technology industry, who led discussions on the evolution of content, universal accessibility, and design ethics, among other topics.

Hinnant’s speaking session focused on the implications of voice-enabled and wearable devices for the UX industry – and you don’t need Alexa to tell you that there are several.

Voice-Enabled Assistants at Your Service

Voice-enabled devices are gaining huge traction in the technology space. The proliferation of virtual assistants like Alexa, Siri, Cortana, and others is only expected to increase as voice-enabled technologies become more embedded in day-to-day life. In fact, voice-driven dedicated devices are present in nearly 20% of US households today, and by 2022 this number is expected to reach 50%. These figures do not include smartphones, which currently dominate the voice-enabled device market. (More on how new tech like artificial intelligence is impacting UX here.) While mobile has a massive installed base, the rise of dedicated devices, along with the addition of voice to a whole range of existing devices, will significantly increase the places where we will be able to engage technology with our voice. Consider that 70% of people surveyed by Forrester Research said that they prefer voice-driven interactions to traditional ones for many day-to-day tasks. Users will adopt voice experiences if we create them well.

While voice-driven experiences are very different from traditional ones, many of our tried-and-true experience design strategies still apply. The same underlying foundations of UX design and research need to be considered, but the key difference is context: the setting in which the device is being used, its affordances and limitations, and the familiarity, or lack thereof, the average user has with these types of interactions.

Context is Key

Who, what, where, when, why – these questions all play a monumental role in developing a streamlined experience for users. After all, voice-enabled devices are used most frequently when our hands are busy: we’re driving, preparing meals, carrying bags, doing homework, or just don’t want to get up. Contextual Inquiry research gives us visibility into these contexts and into how and when users might benefit from voice-driven experiences.

Not all users of voice-enabled technologies will be the same – they will be of different ages, from different countries, cities, and regions, and they will have different vernacular. Devices will be used in a variety of contexts, from daily use to business applications like customer service, and even within the medical field. As such, the tone of the voice that users interact with needs to match the intent and need of the user for the product to be a success. In customer service, for instance, there’s an expectation that the virtual assistant will be efficient and clear. But a virtual nurse will need to show more compassion, as discussing health with any being – robotic or not – is a personal matter. Just as voice and tone matter when interacting with the people you encounter every day, it’s also expected that your virtual assistant will understand that tonal shift too.

From dialog to Dialogue

Virtual assistants can have a dialogue with users, but the conversational experience can make or break how the user feels about the experience overall. To understand conversation, you must do a lot of listening before you can start designing. People converse in different ways – formal tones that seem natural when teaching virtual assistants may not resonate with customers. Modern vocabulary, conventions, and verbal cues are the secret sauce to making an interaction sound natural and free-flowing.

Linguistics experts can tell you that terminology, rhythm, and tone all help make a conversation sound believable. This is especially important for specialized demographics because of the nature of the dialogue. The words that are spoken can differ in meaning based on how the virtual assistant says them. And to some, the words may mean nothing at all. For example, specific industries may have use-case-specific jargon; in the medical field, countless abbreviations and scientific terms are commonplace, whereas they’d make a casual conversation a confusing mess. There are also generational differences to consider. A virtual assistant speaking to an elderly person will not use the same verbiage it would with a millennial – as if!

Speaking without Words

When developing user experience, knowing the demographics of the target audience will help inform the parts of language beyond the words. Technology like Natural Language Understanding (NLU) and Natural Language Processing (NLP) is helping UX researchers and technologists develop content for voice-enabled devices. NLU reads between the lines, interpreting not just the words, but also the context and intent behind them. Even when words are misspelled or transposed, NLU can correct and revise language to derive the intended meaning. NLU is itself a subdivision of NLP, which “consists of software and algorithms that are capable of mining and analyzing unstructured information in order to understand human language within a specific context” (Source). For specialized populations or contexts, NLU and NLP can continuously improve the abilities of voice-enabled devices.

Another way to communicate, aside from language, is through technological feedback. It’s important for users to know when their devices are “thinking” – like when Alexa lights up to show that she is processing a request. Different colors, noises, and vibrations all serve as replacements for the nonverbal cues we see and process in normal conversation. They also serve as a more fun replacement for verbal fillers like “um” or “you know.” Imagine if you were stuck on a thought, and you began to vibrate – in some ways these communication alternatives make dialogue more fun!

Speaking of language variation, one very important consideration is your target users’ native languages. Currently, the majority of voice-enabled technologies are optimized for American English and for native English speakers, with support that is very much oriented toward American English as well. There’s a huge drop-off when it comes to other languages, even major ones like Spanish, Chinese, Arabic, and Hindi, not to mention hundreds more. When developing UX for voice-enabled devices, we need to be aware that we’re not just designing for the American end user. Millions of individuals speaking other English dialects and other languages also have a need for voice-based technology. Where we are with English is a great start, but there’s massive room for improvement so that people of all ages and nationalities can experience the tech of the future.

Best Practices for Voice-Based Tech

When designing user experience for voice-based devices, there are a few key considerations to keep in mind:

Don’t Force Choice Paths

In an organic conversation, there’s never a set path for how the dialogue will go. Humans wander, interrupt, repeat, and forget. People do not converse in rigid dialogues. How many times have you called your bank and felt frustrated when prompted to press 1 or 2, when you actually wanted an option 3 or 4? Being asked “How can I help you today?” is much more inviting and more likely to head off that frustration.

Use Prompts to Clarify Intention

Prompts and examples keep the user from guessing when interacting with your system. Users frequently shy away from new technologies because they don’t know what to expect. Your design should bridge that gap. For example, telling the user “You can say ‘billing’ or ‘payment’” will help limit the uncertainty of choice and better direct the conversation to where it needs to go.
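
To make the idea concrete, here is a minimal sketch in Python of a reprompt that offers examples. It assumes a hypothetical keyword table (INTENT_KEYWORDS) and hypothetical function names (match_intent, respond); a real voice product would likely hand the matching to an NLU service rather than a word list.

```python
import re

# Hypothetical intents and trigger words, for illustration only.
INTENT_KEYWORDS = {
    "billing": {"bill", "billing", "invoice", "charge"},
    "payment": {"pay", "payment"},
}


def match_intent(utterance):
    """Return the first intent whose trigger word appears in the utterance, else None."""
    words = set(re.findall(r"[a-z']+", utterance.lower()))
    for intent, keywords in INTENT_KEYWORDS.items():
        if words & keywords:
            return intent
    return None


def respond(utterance):
    """Route a recognized request, or reprompt with concrete examples."""
    intent = match_intent(utterance)
    if intent is None:
        # Tell the user what they can say instead of leaving them guessing.
        options = " or ".join(f"'{name}'" for name in INTENT_KEYWORDS)
        return f"Sorry, I didn't catch that. You can say {options}."
    return f"Okay, taking you to {intent}."
```

The open question comes first and the examples appear only when the system fails to understand, so users who already know what they want are not slowed down.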

Keep it Natural

It’s easy to design for straight paths, but the natural cadence of conversation is more involved. Be aware of how people take turns when talking; pauses, time to think, and verbal fillers are to be expected and should be planned for.
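
As one small illustration of planning for fillers, the sketch below cleans a transcript before it is interpreted. The filler list and the function name strip_fillers are assumptions made for this example; what counts as a filler should come from listening to real users.

```python
import re

# Illustrative filler list; real speech data should drive what belongs here.
FILLERS = {"um", "uh", "er", "hmm"}


def strip_fillers(transcript):
    """Lowercase the transcript, drop punctuation, and remove filler words."""
    words = re.findall(r"[a-z']+", transcript.lower())
    return " ".join(word for word in words if word not in FILLERS)
```

With this sketch, "Um, I want to, uh, pay my bill" becomes "i want to pay my bill". Pauses need similar care, for example by waiting longer before deciding the user has finished speaking.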

Share All Possibilities

Let people know the realm of possibilities available to them. For example, when you ask Siri a question, she often shares a few possible questions you can ask to achieve your desired result. Users don’t know what they don’t know, so surfacing options like these encourages them to engage more.

Consider Integrations

Your product and organization don’t have to solve every problem. Instead, look for ways to connect your solution to others to generate the most value for users. Countless integrations and extensions exist today to streamline customer experience and make your product more powerful. For example, IFTTT exists to do just that, syncing activities from multiple websites, and now Alexa and Google Home, too. Did you misplace your phone in your house? IFTTT allows you to ask Alexa to call your phone.

Watch for Cognitive Load

Information overload is the downfall of many great products. Cognitive load is about how hard the user’s brain has to work to use your product. With voice-driven experiences, the user must remember more, as there is no display to help. There needs to be the right balance between information, options, and simplicity for users to get the most out of the product. One strategy to prevent overload is to chunk information into bite-size tidbits so users aren’t overwhelmed. This is especially helpful for elderly users and young children.
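
Chunking can be sketched in a few lines of Python. The chunk size of three, the wording of the prompts, and the function names (chunk_options, spoken_prompts) are assumptions for illustration, not recommendations from the talk.

```python
def chunk_options(options, size=3):
    """Split a list of options into groups of at most `size` items."""
    return [options[i:i + size] for i in range(0, len(options), size)]


def spoken_prompts(options, size=3):
    """Build one short spoken prompt per chunk, offering 'more' between chunks."""
    chunks = chunk_options(options, size)
    prompts = []
    for index, chunk in enumerate(chunks):
        line = "You can say " + ", ".join(chunk) + "."
        if index < len(chunks) - 1:
            # Only offer 'more' when another chunk is waiting.
            line += " Or say 'more' to hear other options."
        prompts.append(line)
    return prompts
```

Given four options, this produces two prompts: the first reads three options and offers "more", and the second reads the remaining one. The right chunk size is a question for user testing, particularly with the audiences mentioned above.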

Rethinking User Experience

In design, the “happy path” (the simplest, most direct engagement) is a great place to start. Once you’ve developed your voice-based system to a point where getting from A to B is a breeze, you can allow your system to wander. Even so, to keep the system simple to use, you need to strike a balance between offering one or two next steps and offering a couple dozen. In human conversation, nonverbal cues convey lots of meaning that voice-based technology doesn’t have the option of communicating. During testing, blindfolding subjects helps take these silent communication cues out of the picture. You must also reframe how you think about user experience. Running into errors during testing is an opportunity to expand the conversational space rather than a deterrent. When you think of errors as opportunities, the voice-based world becomes easier to navigate and design for.

For more information on user experience, including voice-based design, virtual reality, and more, subscribe to the Filter Digest blog or read more on our User Experience webpage.