Voice

The invisible interface... or is it?

When I think of a seamless conversation with a machine, I always think of the movie "Her", and how the lead character sets up his new "omni" operating system on his devices. His conversation with the machine is very human-like.

What I saw was a figment of imagination, but I believe the same can happen in the real world.

Right now we have:

A system created to serve the user, with its own

Personality
Identity
Behaviour

A user with their

Pain points
Goals
Behaviours

So designing voice-based solutions means designing for the system and the user (not just for the user... cough).

So how do we start designing for... voice? Does it look a little too daunting?

By creating a sample project, I arrived at a step-by-step process and a way to imagine building a solution for voice.

We start with the "happy ending"... and yes, we are starting a play rehearsal too.

Start by writing the most successful scenario you can think of, then act it out. As we say the dialogue we wrote out loud, we can measure the result by asking ourselves, "Is this how we normally speak?"

A "hello world" of voice design

One of the simplest chores I would ask a machine to do is to set an alarm. To design for it is to write the script for the "setting an alarm" play. Before we do, let's go over the parts of that script:

The intent

The objective of the user’s interaction with a voice-enabled system.

Example : Set an alarm

Utterance

How users say the request

Example : Wake me up at 7 am

Variable

Additional information in a request

Example : am/pm

Error states

Add them for every scenario

Example : I’m sorry. Could you repeat that?
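To make these four pieces concrete, here is a minimal Python sketch of how they might fit together for the alarm example. It is only an illustration: the names Intent, Match, understand and respond are my own, the utterances are written as simple regular expressions, and a real voice platform would use a trained language model instead of pattern matching.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional

# Error state: what the system says when it cannot understand the request.
ERROR_PROMPT = "I'm sorry. Could you repeat that?"


@dataclass
class Intent:
    name: str              # the intent, e.g. "set_alarm"
    utterances: List[str]  # ways users may phrase it, as regex patterns


@dataclass
class Match:
    intent: str
    variables: Dict[str, str]  # extra information pulled out of the request


# Hypothetical intent definition for the alarm example.
SET_ALARM = Intent(
    name="set_alarm",
    utterances=[
        r"wake me up at (?P<hour>\d{1,2})\s*(?P<period>am|pm)",
        r"set an alarm for (?P<hour>\d{1,2})\s*(?P<period>am|pm)",
    ],
)


def understand(text: str, intents: List[Intent]) -> Optional[Match]:
    """Return the first intent whose utterance pattern fits the text."""
    cleaned = text.lower().strip()
    for intent in intents:
        for pattern in intent.utterances:
            found = re.search(pattern, cleaned)
            if found:
                return Match(intent.name, found.groupdict())
    return None


def respond(text: str) -> str:
    """Answer a request, falling back to the error state."""
    match = understand(text, [SET_ALARM])
    if match is None:
        return ERROR_PROMPT
    v = match.variables
    return f"Alarm set for {v['hour']} {v['period']}."
```

Calling respond("Wake me up at 7 am") matches the first utterance and fills the hour and am/pm variables, while anything the patterns do not cover falls through to the error prompt.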

For the sake of simplicity, I'm giving my system a name: Baymax.

User : Hey Baymax (utterance for activating the voice system), set an alarm

Baymax : Sure, what time?

User : Mmmm... 7 am every weekday (remember to keep the happy ending first; it's easy to get carried away with all the sad scenarios where the system fails)

Baymax : Ok (providing confirmation). I have set an alarm for every weekday.

Thus the user set an alarm and lived happily ever after.
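The happy ending above can also be rehearsed in code. The following is a rough Python sketch of that two-turn conversation as a tiny state machine. The class name AlarmDialog, the state names and the keyword checks are all assumptions made for illustration, not how any particular assistant works.

```python
import re

ERROR_PROMPT = "I'm sorry. Could you repeat that?"


class AlarmDialog:
    """Two-turn happy path: wake word plus request, then the time."""

    WAKE_WORD = "hey baymax"  # activation utterance from the script

    def __init__(self):
        self.state = "idle"   # "idle" or "awaiting_time"
        self.alarm = None     # filled in once the user gives a time

    def hear(self, text: str) -> str:
        said = text.lower().strip()

        if self.state == "idle":
            if said.startswith(self.WAKE_WORD) and "set an alarm" in said:
                self.state = "awaiting_time"
                return "Sure, what time?"
            return ERROR_PROMPT

        # state == "awaiting_time": look for something like "7am" or "7 am"
        found = re.search(r"(\d{1,2})\s*(am|pm)", said)
        if not found:
            return ERROR_PROMPT  # stay in this state and ask again
        repeat = "every weekday" if "every weekday" in said else "once"
        self.alarm = {
            "hour": int(found.group(1)),
            "period": found.group(2),
            "repeat": repeat,
        }
        self.state = "idle"
        if repeat == "every weekday":
            return "Ok. I have set an alarm for every weekday."
        return "Ok. I have set an alarm."
```

Feeding it the two user lines from the script, hear("Hey Baymax, set an alarm") followed by hear("mmmm... 7am every weekday"), reproduces Baymax's side of the play. Every branch that does not understand the user returns the same error prompt, which is the simplest possible error state.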

Now we can go deeper into the story.

Setting an alarm

Now that we have the basics covered, I must tell you that living in a diverse world comes with its own challenges. People can be crafty at asking questions.

Known Query

Example : Play "Paradise" by Coldplay

These are specific queries where the machine can fetch the request easily in a single interaction.

Thematic Query

Example : Play “paradise”

Here, the system has to do some guesswork or look into the user's data to understand what a phrase such as "melody songs" means, either in general or to this particular user.

Open ended

Example : Play “Something new”

The most whimsical, and also the most challenging for the system. Here the system will have to guess, or ask a few more questions, to understand the user's intent.
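One way to picture the three kinds of query is as a small routing step in front of the music player. The sketch below is a deliberately naive Python illustration: the catalog is made-up sample data (the second artist is a placeholder), and the function name classify and its rules are my own assumptions rather than how a real assistant decides.

```python
from typing import Dict, List, Tuple

# Made-up sample catalog; "another artist" is a placeholder, not real data.
CATALOG: List[Dict[str, str]] = [
    {"title": "paradise", "artist": "coldplay"},
    {"title": "paradise", "artist": "another artist"},
    {"title": "yellow", "artist": "coldplay"},
]


def classify(request: str) -> Tuple[str, List[Dict[str, str]]]:
    """Return (query type, candidate songs) for a 'Play ...' request."""
    text = request.lower().strip()
    if text.startswith("play "):
        text = text[len("play "):]
    for quote in '"“”':          # drop straight and curly quotes
        text = text.replace(quote, "")
    title, _, artist = text.strip().partition(" by ")

    # Known query: title and artist pin down the song in one interaction.
    exact = [s for s in CATALOG
             if s["title"] == title and s["artist"] == artist]
    if artist and exact:
        return "known", exact

    # Thematic query: something matches, but the system has to guess
    # (or look at the user's data) to pick among the candidates.
    by_title = [s for s in CATALOG if s["title"] == title]
    if by_title:
        return "thematic", by_title

    # Open ended: nothing to anchor on, so ask follow-up questions.
    return "open_ended", []
```

With this sample data, 'Play "Paradise" by Coldplay' resolves to a single song, 'Play "paradise"' returns two candidates the system would have to choose between, and 'Play "Something new"' returns nothing, which is the cue to start asking the user questions.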

Even after a specific query from the user, the system will face challenges, because there can be so many answers to the same question.

A singer can record multiple versions of the same song (lyrical, reprise, explicit and whatnot), so even a specific song request can be exhausting to resolve.

Then there are the nuances, the dialects, and so many mixes and matches of words that context alone can change the meaning. So make sure to listen to everything around you and keep notes. As language evolves, so does the complexity of creating a voice experience.

If the user says "the dress is so sick", it is the responsibility of the system to understand that the user might like to add it to a wish list, rather than conclude that the dress is spreading a contagious disease.

So with that, let's get to making a voice experience.