Voice
The invisible interface... or is it?
When I think of a seamless conversation with a machine, I always think of the movie "Her": how the protagonist sets up his new "OS1" operating system on his devices, and how human-like the conversation with the machine is.
What I saw was fiction, but I believe the same can happen in the real world.
Right now we have:
A system created to serve the user, with its own
Personality
Identity
Behaviour
A user with their own
Pain points
Goals
Behaviours
So designing voice-based solutions means designing for the system and the user (not just for the user... cough).
So how do we start designing for... voice? Does it look a little too daunting?
By creating a sample project, I arrived at a step-by-step process and a way to imagine building a solution for voice.
We start with the "happy ending"... and yes, we are starting a play rehearsal too.
Start by writing the most successful scenario you can think of and act it out. When we say the dialogue we wrote out loud, we can measure the result by asking ourselves, "Is this how we normally speak?"
A "hello world" of voice design
One of the simplest chores I would ask a machine to do is to set an alarm. To design for it is to write the script for the "setting an alarm" play. Going over it once more, we can break it down like this:
Intent
The objective of the user's interaction with a voice-enabled system
Example: Set an alarm
Utterance
How users say the request
Example: Wake me up at 7 am
Variable
Additional information in a request
Example: am/pm
Error states
Add them for every scenario
Example: I'm sorry. Could you repeat that?
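To make these four pieces a little more concrete, here is a minimal Python sketch of how they could sit together in code. It is only an illustration, not how any real voice platform works: the Intent class, the regular-expression patterns, and the match_utterance and respond helpers are all names I made up for this example.

```python
import re
from dataclasses import dataclass


@dataclass
class Intent:
    """One thing the user wants to get done (hypothetical structure)."""
    name: str
    utterance_patterns: list  # different ways users might phrase the request
    error_prompt: str = "I'm sorry. Could you repeat that?"


# The "set an alarm" intent with two sample utterances.
# The named groups (hour, period) are the variables in the request.
SET_ALARM = Intent(
    name="set_alarm",
    utterance_patterns=[
        r"wake me up at (?P<hour>\d{1,2})\s*(?P<period>am|pm)",
        r"set an alarm for (?P<hour>\d{1,2})\s*(?P<period>am|pm)",
    ],
)


def match_utterance(intent, text):
    """Return the variables found in the text, or None if nothing matches."""
    cleaned = text.lower().strip()
    for pattern in intent.utterance_patterns:
        found = re.search(pattern, cleaned)
        if found:
            return found.groupdict()
    return None


def respond(intent, text):
    """Answer the user, falling back to the error state when we are lost."""
    variables = match_utterance(intent, text)
    if variables is None:
        return intent.error_prompt
    return f"Alarm set for {variables['hour']} {variables['period']}."


print(respond(SET_ALARM, "Wake me up at 7 am"))
print(respond(SET_ALARM, "Make me a sandwich"))
```

Real systems use speech recognition and language models instead of regular expressions, but the shape stays the same: an intent, the utterances that trigger it, the variables inside them, and an error state for everything else.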
For the sake of simplicity, I'm giving my system a name: Baymax.
User: Hey Baymax (the utterance that activates the voice system), set an alarm.
Baymax: Sure, what time?
User: Mmmm... 7 am every weekday (remember to keep the happy ending first; it's easy to get carried away with all the sad scenarios where the system fails).
Baymax: Ok (providing confirmation). I have set an alarm for 7 am every weekday.
Thus the user set an alarm and lived happily ever after.
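The little play above can also be rehearsed in code. The sketch below is a rough, assumption-heavy version of the happy path: the AlarmDialog class, the parse_time helper, and the simple text matching are all invented for illustration, and a real assistant would work on speech rather than typed strings.

```python
import re

WAKE_PHRASE = "hey baymax"  # the utterance that activates the system
ERROR_PROMPT = "I'm sorry. Could you repeat that?"


def parse_time(text):
    """Pull something like '7am' or '7 am' out of lowercase text."""
    found = re.search(r"\b(\d{1,2})\s*(am|pm)\b", text)
    return f"{found.group(1)} {found.group(2)}" if found else None


class AlarmDialog:
    """Keeps track of where we are in the 'setting an alarm' play."""

    def __init__(self):
        self.listening = False       # True once the wake phrase was heard
        self.asked_for_time = False  # True once we asked "what time?"

    def handle(self, user_text):
        text = user_text.lower()
        if not self.listening:
            if WAKE_PHRASE not in text:
                return None  # stay silent until we are addressed
            if "set an alarm" not in text:
                return ERROR_PROMPT  # only one intent in this sketch
            self.listening = True
        time = parse_time(text)
        if time is None:
            if self.asked_for_time:
                return ERROR_PROMPT  # we already asked once
            self.asked_for_time = True
            return "Sure, what time?"
        repeat = " every weekday" if "every weekday" in text else ""
        # The alarm is "set", so reset for the next conversation.
        self.listening = False
        self.asked_for_time = False
        return f"Ok. I have set an alarm for {time}{repeat}."


baymax = AlarmDialog()
print(baymax.handle("Hey Baymax, set an alarm"))
print(baymax.handle("mmmm... 7am every weekday"))
```

Notice that the happy ending takes only a few lines. Most of the code is already about what happens when the user says something unexpected, which is exactly why it helps to write the happy ending first.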
Now, we can go deeper into the story
Now that we have the basics covered, I must tell you that living in a diverse world comes with its own challenges. People can be crafty at asking questions.
Known Query
Example: Play "Paradise by Coldplay"
These are specific queries where the machine can fetch the request easily in a single interaction.
Thematic Query
Example: Play "paradise"
Here, the system has to do some guesswork or look into the user's data to understand what the request means, for example what counts as "melody songs" in general or for this particular user.
Open-ended Query
Example: Play "Something new"
The most whimsical and also the most challenging for the system. Here the system will have to guess or ask a few more questions to understand the user's intent.
Even after a specific query from the user, the system will face challenges, because there can be so many answers to the question.
A singer can record multiple versions of the same song (lyrical, reprise, explicit and what not), so even a specific song request can be exhausting to resolve.
Then there are the nuances, the dialects, and so many mixes and matches of words that context alone can change the meaning. So make sure to listen to everything around you and keep notes. As language evolves, so does the complexity of creating a voice experience.
If the user says "the dress is so sick", it is the responsibility of the system to understand that the user might like to wishlist it, rather than think it is spreading a contagious disease.
So with that, let's get to making a voice experience.
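Here is a toy sketch of how a system might tell the three kinds of query apart. Everything in it is an assumption made for illustration: the tiny CATALOG (including its made-up versions and the placeholder second artist), the classify_query function, and the rule of thumb that a title plus an artist is "known", a bare match is "thematic", and anything else is "open-ended".

```python
# A made-up music catalog. The versions and the second artist are placeholders.
CATALOG = [
    {"title": "paradise", "artist": "coldplay", "version": "album"},
    {"title": "paradise", "artist": "coldplay", "version": "live"},
    {"title": "paradise", "artist": "some other artist", "version": "album"},
]


def classify_query(request):
    """Return (kind_of_query, matching_songs) for a 'play ...' request."""
    text = request.lower().replace('"', "").strip()
    if text.startswith("play "):
        text = text[len("play "):]
    # Split 'paradise by coldplay' into a title and an optional artist.
    title, _, artist = text.partition(" by ")
    matches = [
        song for song in CATALOG
        if song["title"] == title and (not artist or song["artist"] == artist)
    ]
    if matches and artist:
        return "known", matches
    if matches:
        return "thematic", matches
    return "open-ended", []


for request in ['Play "Paradise by Coldplay"', 'Play "paradise"', 'Play "Something new"']:
    kind, songs = classify_query(request)
    print(request, "->", kind, f"({len(songs)} candidates)")
```

Even the known query comes back with more than one candidate in this toy catalog, because the same song exists in several versions. So the system still has to pick one or ask the user a follow-up question.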