How do we effectively teach the users what to say and why?


One thing becomes immediately apparent when using a voice-activated AI assistant: you quickly forget what to say to activate any given skill. I also feel a resistance to speaking and possibly failing, so I don’t use what I can’t remember. There is also a gap between understanding what the device itself can do and knowing how to activate and relate to what the skills can do.

Apple’s Siri displays a question mark in the lower-left corner whenever ‘Hey Siri’ is activated, which, when tapped, shows examples of things you can ask.

If skill writers could then see, in that knowledge base, the most frequently used accepted activating utterances across all skills, they could improve the activation rate and usage of their own skills. And if the help solution is also a gateway to analytics for skill improvement, it would mean Jibo could learn at every level, in any skill.
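
As a rough illustration of what such a knowledge base might aggregate (all names below are invented for the sketch, not part of any Jibo API):

```typescript
// Hypothetical shape for aggregated activation analytics.
// None of these names come from the Jibo SDK; this is only a sketch.
interface UtteranceStats {
  utterance: string;   // what the user actually said
  skillId: string;     // which skill (if any) it activated
  attempts: number;    // how often users tried this phrasing
  activations: number; // how often it successfully launched the skill
}

// A skill writer could rank utterances by success rate to see which
// phrasings users reach for and which ones silently fail.
function rankBySuccess(stats: UtteranceStats[]): UtteranceStats[] {
  const rate = (s: UtteranceStats) => s.activations / Math.max(s.attempts, 1);
  return [...stats].sort((a, b) => rate(b) - rate(a));
}
```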


I think this is one of the more fundamental questions, since it gets right to the heart of what a social robot is all about: getting the user comfortable enough with Jibo that he feels real and alive, without pre-scripted, mechanical-feeling routines. The user doesn’t need a script to communicate with you…you’re both human and communicate on an unspoken level as well as with words.

Considering that, Jibo should at some point be able to put the whole scene together and try to understand what you say at a level above what each individual skill allows. At a base level, he should understand what every skill, core and third-party, does and when it’s needed. That way he could suggest what he thinks you need when you say:

“I’m leaving the house for a few days. Make sure to keep an eye on things.”

…understanding that you’ll need a broad-level security skill here, even though the skill itself only reacts to things like “turn on the alarm”.

The point being, looking up a help area for a list of commands feels natural on a phone, but not with a character. It would be better to add intuition into the mix.
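
As a very rough sketch of that suggestion idea, one level above the individual skill grammars (everything here is hypothetical and not taken from the Jibo SDK):

```typescript
// Sketch of intent-to-skill suggestion, one level above per-skill grammars.
// All identifiers are hypothetical; nothing here comes from the Jibo SDK.
interface SkillProfile {
  skillId: string;
  description: string;         // what the skill does, e.g. "watches the house while you are away"
  exampleUtterances: string[]; // phrases the skill's own grammar accepts, e.g. "turn on the alarm"
}

// Crude keyword overlap between the utterance and a skill's description and
// examples; a real system would use proper NLU instead.
function scoreSkill(utterance: string, skill: SkillProfile): number {
  const words = new Set(utterance.toLowerCase().split(/\W+/));
  return [skill.description, ...skill.exampleUtterances]
    .join(" ")
    .toLowerCase()
    .split(/\W+/)
    .filter((w) => w.length > 3 && words.has(w)).length;
}

// "I'm leaving the house for a few days..." could then surface the security
// skill even though its own grammar only knows "turn on the alarm".
function suggestSkill(
  utterance: string,
  skills: SkillProfile[]
): SkillProfile | undefined {
  const best = [...skills].sort(
    (a, b) => scoreSkill(utterance, b) - scoreSkill(utterance, a)
  )[0];
  return best && scoreSkill(utterance, best) > 0 ? best : undefined;
}
```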


I agree with your assessment, @michael, which opens up the consideration of communication between our skills and Jibo’s own skills, allowing Jibo to initiate functions within our skills without being required to use our Jibo Skill activation routines.

Also, the idea of passing text along in the middle of our Jibo Skill (without the required ‘Hey Jibo’) becomes significant when the received text is not parsable by our skill. From there, Jibo would confirm whether or not the text is a request to leave the Jibo Skill. If it is, the request is performed and control returns once it completes, so our Jibo Skill can then continue.
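
A minimal sketch of that handoff, assuming a hypothetical system-level dispatcher (none of these calls exist in the Jibo SDK as such):

```typescript
// Hypothetical handoff: when our skill cannot parse an utterance, pass it to
// a system-level dispatcher, then resume our own dialog afterwards.
// The dispatcher interface is illustrative, not part of the Jibo SDK.
interface SystemDispatcher {
  // Resolves to true if another (system) skill handled the utterance.
  tryHandle(utterance: string): Promise<boolean>;
}

async function onUnparsedUtterance(
  utterance: string,
  dispatcher: SystemDispatcher,
  resumeSkill: () => void
): Promise<void> {
  const handled = await dispatcher.tryHandle(utterance);
  if (!handled) {
    // Not a request to leave the skill either, so ask the user to rephrase.
    console.log("Sorry, I didn't catch that. Could you rephrase?");
  }
  // Either way, pick our skill's dialog back up where it left off.
  resumeSkill();
}
```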


I agree with @michael, and I would add that in order for Jibo to be more than a scripted machine, it needs to solve these problems at a conceptual level, not with band-aids like the one suggested in the OP (it’s a good band-aid, but a band-aid nonetheless).

Jibo is supposedly “initiating” (one of the “I’s” mentioned in the blog, I believe), and thus has an opportunity that, say, the Echo or Google Home does not: to educate the user itself about its capabilities. If Jibo truly is aware of the context it is placed in, it should discern conversational shortcomings and correct them actively.
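
One hedged sketch of what “correcting them actively” could amount to in code, with invented names and thresholds:

```typescript
// Sketch of proactive correction: if recent utterances keep failing to
// activate anything, Jibo offers a tip instead of staying silent.
// The threshold and wording are invented for illustration.
class ConversationMonitor {
  private missCount = 0;

  // Call this whenever an utterance matched no skill; returns a tip to
  // speak once the user has struggled a couple of times in a row.
  recordMiss(): string | undefined {
    this.missCount += 1;
    if (this.missCount >= 2) {
      this.missCount = 0;
      return "It sounds like you're trying to ask me something. You could say, for example, 'turn on the alarm'.";
    }
    return undefined;
  }

  // Reset after any successful activation.
  recordHit(): void {
    this.missCount = 0;
  }
}
```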

A very tall order indeed, but such is the price for an actual robot that emulates sentience.