There’s a new paradigm coming that will change the way we use computers, and it’s as old as humanity itself. It’s the human voice. Instead of typing or tapping on a screen, people will speak to a conversational agent, or to put it simply, a bot. And these bots have already begun introducing themselves to us.
Far beyond just saying “hello,” they’re driving a sea change in computing, variously called conversational commerce or voice-enabled computing. Connected to powerful artificial intelligence (AI) systems, mediated by conversational user interfaces (CUIs) and visualized by everything from avatars to a glowing ring of light, the current generation of bots is empowered to act on our behalf as assistants and advisors. CUIs represent the next step in computing’s evolutionary chain because within their DNA is the most natural means of interaction we’ve ever had—conversation.
According to a 2014 Fast Company article, computer scientist Andrew Ng predicted that “at least 50 percent of all searches” would be done by voice or images in five years. As commerce shifts toward using CUIs, there’s enormous promise for brands that recognize the potential and get in now. “The analogy we use at Seed Vault,” says Nathan Shedroff, chief executive officer of the Singapore-based company, “is: ‘This is the web circa 1996, and everything you know about commerce is about to change.’ We are at the same inflection point with conversational interfaces.”
Most brands don’t have the resources to build this technology internally. The problem with this, as Shedroff sees it, is that if a brand like BMW or Bang & Olufsen wants to incorporate a conversational interface, it has nowhere to go besides Amazon, Google, Microsoft, Samsung or Apple, whose virtual assistants—Alexa, Google Assistant, Cortana, Bixby and Siri, respectively—have become a part of everyday chatter. “You can buy a Bang & Olufsen speaker system and ask it to turn up the volume,” Shedroff says, “but first you have to say, ‘Hey, Alexa,’ or ‘OK Google.’ So, where does that leave Bang & Olufsen? Brands will wake up and realize they are about to lose their connection to their customers, and with it, their brand value.”
That’s where Seed Vault comes in. Shedroff and his team started Seed Vault as an independent, open-source bot marketplace and an alternative to hegemonic control of conversational commerce. Developers can code a conversational interface from any component they find in the Seed Vault bot store. “There will be off-the-shelf bots you can license, make a modification to and start using immediately,” Shedroff explains of what Seed Vault will eventually offer. “And there will be more bot components and services, such as dialog scripts for a customer service bot, translations for health-care bots and 3-D avatars that can be licensed for any conversational chatbot.”
For brands that are just beginning to explore conversational commerce—which, these days, is most brands—Charles Cadbury, founder of London-based SayItNow, explains that there’s a three-step plan for deepening user engagement and adoption of CUIs. “First, build trust using simple, mundane services like news. Over time, end users develop enough confidence to do more complicated transactions, such as making a booking or a purchase.” Next comes personalization through interactions. “Every interaction enriches the [AI’s] view of the individual over time and enables brands to create channels of one-to-one communication across SMS and Facebook Messenger,” he says. Eventually, we get to activation. “In this phase, an AI can craft conversations that lead to action,” Cadbury says. For example, after booking you a train trip, a bot could recommend renting a car at the station.
To help brands reach this conversational promised land, Cadbury and Sander Siezen, vice president of product development at SayItNow, offer half-day conversational bot design workshops. As Siezen explains, “We work with clients to find out their goal, construct a persona of the end user, then figure out what they will say, and how the system might guide them along the ‘golden path’ of where we want to get to.” The result is a conversational flowchart that proceeds from user story to intent. “Our goal is to create a conversational map with many routes,” Siezen says. “From an AI perspective, it’s a maze. And one of the things AIs do best is navigating mazes.”
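Siezen’s “conversational map with many routes” can be pictured as a small graph problem. The sketch below is a toy illustration of the idea, not SayItNow’s actual tooling: the dialog states, route names and the breadth-first search are all hypothetical stand-ins. Nodes are points in the conversation, edges are possible turns, and the “golden path” is simply the shortest route from the opening prompt to the goal.

```python
# Toy "conversational map" as a directed graph (hypothetical states,
# not any vendor's real dialog format). Edges are possible next turns.
from collections import deque

dialog_map = {
    "greeting":      ["ask_need", "small_talk"],
    "small_talk":    ["ask_need"],
    "ask_need":      ["offer_booking", "offer_support"],
    "offer_support": ["offer_booking"],
    "offer_booking": ["confirm"],
    "confirm":       [],
}

def golden_path(graph, start, goal):
    """Breadth-first search: the shortest conversational route to the goal."""
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph[path[-1]]:
            queue.append(path + [nxt])

print(golden_path(dialog_map, "greeting", "confirm"))
# prints: ['greeting', 'ask_need', 'offer_booking', 'confirm']
```

From the AI’s perspective this is indeed a maze: many routes reach the goal, and the designer’s map tells the system which detours (small talk, support questions) still lead back to the golden path.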
Angie Terrell, director of design at Big Nerd Ranch, an Atlanta-based app development and training company that helped Amazon develop its Alexa Skills Kit, suggests that designers who are creating voice-driven capabilities for Alexa first “understand the constraints and guidelines of the platform. It’s like going to a new city. You know how generally to navigate, but how does this city do it?”
When it comes to CUIs, Terrell explains that “Alexa is looking for an intent”—that is, what the user wants to accomplish. Say, for example, someone wants to book a flight using Alexa. A designer must prepare Alexa to respond to a range of intents, from no intent (for example, “Hey Alexa... ”) to partial intent (“Hey Alexa, book me a flight”) to full intent, where the system can do exactly what the user asks (“Hey Alexa, book me a flight to London on Virgin this Saturday”).
To ensure Alexa can accomplish its goals, Terrell says that CUI designers “have to design a script based on the intents and build to it. It’s like preparing your design for the happy path and the error path. That involves all the things we do as UX designers, but you are doing that with language.”
Terrell advises that designers be as specific as possible with the options they provide the user. Instead of asking, “Would you like fries and salad with that?” she counsels, “The system should ask, ‘Which side would you like, French fries or a salad?’ Designers have to write scripts for Alexa that make the choices distinct. It’s subtle, but critical to the success of the experience,” she says. “Then user-test the hell out of it.”
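The pattern Terrell describes—scripting for partial and full intents, with distinct prompts for whatever the user left out—can be sketched in a few lines. This is a hypothetical illustration, not the real Alexa Skills Kit API; the slot names and prompt wording are invented for the example.

```python
# Hypothetical intent/slot handler (not the actual Alexa Skills Kit API).
def handle_book_flight(slots):
    """Return a follow-up prompt or a confirmation, depending on which
    slots the user has already filled."""
    # Error path: prompt for the first missing slot, offering distinct
    # choices where possible, per Terrell's advice.
    prompts = {
        "destination": "Where would you like to fly to?",
        "airline": "Which airline would you like, Virgin or British Airways?",
        "date": "What day would you like to travel?",
    }
    for slot, prompt in prompts.items():
        if slot not in slots:
            return prompt
    # Happy path: every slot is filled, so the intent can be fulfilled.
    return "Booking a flight to {destination} on {airline} for {date}.".format(**slots)

# Partial intent: "Hey Alexa, book me a flight" -- elicits the destination.
print(handle_book_flight({}))
# Full intent: "book me a flight to London on Virgin this Saturday".
print(handle_book_flight({"destination": "London",
                          "airline": "Virgin",
                          "date": "Saturday"}))
```

A partial intent drops the user onto the error path, where the script elicits one missing detail at a time; a full intent goes straight down the happy path to fulfillment.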
Designers proficient in everything from information architecture to linguistics to UX testing will help fuel conversational commerce’s long haul. Traditional skills such as storyboarding, character illustration and 3-D design will also come in handy as avatars are designed to serve as the front end of bots. And according to Cadbury at SayItNow, there’s a growing demand for playwrights, poets and screenwriters to script the conversational pathways between users and their bots. A quick scan of Apple’s job postings for Siri developers turns up titles like “memories engineer,” “domains understanding” and “text-to-speech scientist.”
“Bots are archetypal and cultural,” says Mark Stephen Meadows, chief executive officer of San Francisco Bay Area–based Botanic Technologies. A financial assistant for adults will look different than a bot for a kid managing her asthma. That’s why, after the golden path from user intent to the bot’s completion of the task has been efficiently mapped out, Meadows says that Botanic “builds a design spec that determines personality, appearance, and what the bot says and how it says it.” To make certain the bot looks the way it acts and sounds, Botanic builds a personality matrix, drafts a bio, writes a backstory for the bot and then puts out a casting call for an actor to provide the voice for the bot.
When it comes to visual and character design, the process is similar to creating a character for an animated film. Character concepts are illustrations that show the bot in a series of positions, reacting to a variety of emotions from the user. “We can change skin tone, texture and movement, and tune shading and lighting design,” Meadows says. Then, using .ACTR, a 3-D presentation standard for avatars, developers at Botanic generate real-time animations from natural language generation models that connect gesture, emotion, action and speech into an on-screen representation of a personal digital assistant.
Meadows says visual design for a bot doesn’t have to be complex. “In visualizing a bot, if you have two dots on the face of a bot, you are 90 percent of the way there. If you want to establish trust, design a face for your bot. The visual appearance of a bot not only demystifies the AI; it also provides a visual component of the personality. We spend a tremendous amount of time designing the personality for a bot because no one wants to speak with a robotic robot.”
Despite the billions of dollars Apple has spent on Siri, and the thousands of hours spent programming it, it’s still hard to have a decent conversation with it. (Sorely lacking, too, is a general code of ethics to try to prevent such issues as breaches of user privacy and racist bots.) But, as breakthroughs such as Google Duplex, a human-sounding phone bot, show, machines learn really fast. Today, single-purpose bots can understand our focused, transactional intents. It’s only a matter of time before bots understand them all.
Until then, bots will continue to listen and suggest actions from our phones, wearables, appliances and cars. Serving as the front end for powerful AI systems, bots listen, speak and recommend. And we speak right back in an interaction so seamless that marketers are actually using the word delightful to describe the relationship between us and bots. For once, they might be using the word accurately.
HOW BOTS WORK
A simple explanation
Ask a simple question, get a simple answer. That’s the promise of bots like Siri, Cortana, Bixby and Google Assistant. But there’s nothing simple about it. Ask “What’s the weather in San Francisco?” and behind the scenes, some extremely sophisticated technology goes to work. For the bot to understand you, it invokes an API service call to perform automated speech recognition. (API stands for application programming interface, a set of protocols used to build and interact with software.) The bot listens to what you say and transcribes it into the text “What’s the weather in San Francisco?” Then it uses a natural language processing API to look for intent. In other words, the bot queries an AI system, “What did the user just ask me?” Next, it connects to a weather service API and finds the data to answer that question. In this case: “Partly cloudy.” To transform the data into speech, the bot invokes a text-to-speech API service call. Finally, you hear the answer: “It’s partly cloudy in San Francisco.” All told, the bot sits between the user and at least four separate API service calls.
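The chain of service calls described above can be sketched as a short pipeline. Everything here is a stand-in: the function names are invented for illustration and the services are stubbed out, since each step would really be a network call to a speech, language, weather or speech-synthesis API.

```python
# Hypothetical sketch of the API call chain behind "What's the weather
# in San Francisco?" -- the function names are stand-ins, not any
# vendor's real API.

def answer(audio):
    text = speech_to_text(audio)             # 1. automated speech recognition
    intent = parse_intent(text)              # 2. natural language processing
    data = weather_service(intent["city"])   # 3. domain API (a weather service)
    reply = f"It's {data} in {intent['city']}."
    return text_to_speech(reply)             # 4. text-to-speech synthesis

# Stub implementations so the pipeline runs end to end.
def speech_to_text(audio):
    return audio  # pretend the audio has been transcribed to text

def parse_intent(text):
    return {"intent": "get_weather", "city": "San Francisco"}

def weather_service(city):
    return "partly cloudy"

def text_to_speech(reply):
    return reply  # pretend this synthesized spoken audio

print(answer("What's the weather in San Francisco?"))
# prints: It's partly cloudy in San Francisco.
```

Each stub stands in for a separate service call, which is why even a one-line question puts the bot in the middle of several round-trips before you hear “It’s partly cloudy in San Francisco.”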