A Whole New Way to Interact: Voice User Interface

Voice User Interface Oct 27, 2021

Introduction

What is a Voice User Interface? Is it better than our typical methods of interacting with our devices? In this note, you’ll learn what it is, how it works, what tech is behind it, and why it is future-proof.

Evolution of Interfaces and Interactions

When we first started interacting with our devices, we used a physical input device and saw the result on a screen. For example, we used a mouse and a keyboard to interact with a computer, or physical keys to interact with a mobile phone. Then came touch screens on phones, tablets, and laptops, where touching an on-screen element triggers the interaction. Now, most devices support VUI, activated either by a physical trigger key or by the user’s voice.

Fun Fact

The volume knob of a speaker is designed so that the user turns it from left to right to raise the volume, which matches the direction in which many people read and write and in which clock hands move.

Unimodal

A unimodal interface is a platform where only one human sense or mode of interaction is used, be it listening, touching, seeing, or speaking.

Devices such as Amazon Echo, Google Home, and Apple HomePod only require a human voice to function; most of them don’t have a screen that users can touch and interact with. When the user says “Okay Google, switch on the lights”, only the person’s voice is used for the task.

Multimodal

A multimodal interface is a platform where more than one human sense is used for interaction. It can be a combination of two or more senses. Examples include Siri on iPhones and the Google Home Hub, where users speak to the device and also touch the screen to interact with it. The infotainment systems in our vehicles work the same way: after asking the car’s assistant to take us to a place, we look at the screen, which shows the route on the map.

What is Voice User Interface (VUI) Design?

A Voice User Interface (VUI) is another way of interacting with a device, where users use their voice to get the job done. It is an interface for speech recognition applications, and a new way to interact with smart devices. Several solutions on the market support VUI, to name a few: Amazon Alexa, Siri by Apple, Google Assistant, Cortana by Microsoft, and Omega by I.M plus.

Consumers use these VUI solutions every day.

From scheduling a call to ordering even a cigarette lighter, everything can be done by just using our voice.

In the future, AI may become smart enough that talking to it feels almost like a human-to-human conversation.

Why Is VUI Better?

Nowadays, almost all of the world is familiar with the GUI (Graphical User Interface), where we touch a screen to interact. Devices like Amazon Echo, Google Home, and HomePod by Apple have taken a new leap: we can complete a task by just using our voice.

VUI allows us to be more efficient and to multitask. But how?

Let’s say John is driving a car and wants to know the route to the nearest train station. He can trigger a voice command by saying: "Okay Google, take me to the nearest train station". Google will help John by showing directions in Google Maps on his car’s infotainment system or on his phone. By dictating the route through the assistant’s voice, Google allows John to focus on the road and helps him reach his destination safely.

The Structure Of VUI

A voice command has a structured anatomy, which the AI uses to work out the correct steps to take to reach the best result.

  1. Wake Word: The wake word is the trigger word or phrase that activates the voice interface to perform a task. When our device detects its wake word, it records the next spoken request and sends the recording to web services. For example: “Okay Google”, “Hey Siri”, “Alexa”.
  2. Utterance: An utterance is the phrase the user speaks to make a request, and it is what the device reacts to. For example: “Play Classical Music”.
  3. Variable: The variable is the part of the utterance that specifies what the user wants, so the AI can take the right action. For example: “Play CLASSICAL Music”.
  4. Invocation: The invocation is the platform where the action happens, whether it is a proprietary platform or a third-party platform. For example: “Play Classical Music on Spotify”.
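
To make the four parts above more concrete, here is a minimal sketch in Python that splits a text transcript into a wake word, utterance, variable, and invocation. It is only an illustration and not how Alexa, Siri, or Google Assistant actually work: real assistants operate on audio and use trained language models. The names `VoiceCommand` and `parse_command`, the "play ... music" pattern, and the trailing "on <platform>" rule are all assumptions made up for this example.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Wake words taken from the examples in this note (assumed list, not exhaustive).
WAKE_PATTERN = re.compile(r"^(okay google|hey siri|alexa)\b[\s,]*", re.IGNORECASE)


@dataclass
class VoiceCommand:
    wake_word: str             # e.g. "alexa"
    utterance: str             # the spoken request after the wake word
    variable: Optional[str]    # the specific detail, e.g. "classical"
    invocation: Optional[str]  # the platform, e.g. "Spotify"


def parse_command(transcript: str) -> Optional[VoiceCommand]:
    """Split a transcript into the four parts; return None if no wake word."""
    text = transcript.strip()
    wake = WAKE_PATTERN.match(text)
    if wake is None:
        return None  # no wake word, so the device stays idle

    utterance = text[wake.end():].rstrip(" .!?")

    # Invocation: toy rule, an optional trailing "on <platform>".
    invocation = None
    request = utterance
    platform = re.search(r"\s+on\s+(\w+)$", utterance, flags=re.IGNORECASE)
    if platform:
        invocation = platform.group(1)
        request = utterance[:platform.start()]

    # Variable: toy rule, only understands "play <variable> music".
    variable = None
    play = re.fullmatch(r"play\s+(.+?)\s+music", request, flags=re.IGNORECASE)
    if play:
        variable = play.group(1)

    return VoiceCommand(wake.group(1).lower(), utterance, variable, invocation)


if __name__ == "__main__":
    print(parse_command("Alexa, play classical music on Spotify"))
```

A request without a wake word, such as "play classical music", returns `None`, which mirrors the idea that the device only starts listening for a request after it hears its trigger phrase.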

In the End

Today, the Voice User Interface is a significant part of the tech roadmap for many businesses. Irrespective of industry, businesses are realizing the benefits that VUIs bring and are capitalizing on them. Given the complexity, designing a VUI requires know-how and experience in computer science, human psychology, and linguistics, along with cognitive learning.

Images credit: Dribbble
