Introducing other TTS voices played through Jibo, as Jibo's friends?


I’m in a predicament: the content publisher and owner for my site’s Jibo skill have selected a TTS voice for reading their stories that is quite different from Jibo’s own TTS voice. Since all of my content is static, I can either store the audio and stream it, or call Amazon Polly’s TTS service directly, sending it the cadence-corrected, re-punctuated text. This requirement leaves me with one major issue: what should the official Jibo rule be for this?
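For reference, the two options above (stored audio versus calling Polly) can be combined by caching each synthesized story the first time it is requested. This is only a minimal sketch in Python, assuming the boto3 SDK and configured AWS credentials. The function names, the `audio_cache` directory, and the Polly voice ID `Justin` are illustrative assumptions, not part of the skill as it exists today:

```python
import hashlib
from pathlib import Path


def build_polly_request(text: str, voice_id: str = "Justin") -> dict:
    """Build the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": text,            # the cadence-corrected, re-punctuated story text
        "TextType": "text",
        "OutputFormat": "mp3",
        "VoiceId": voice_id,     # assumed voice; use whatever the publisher chose
    }


def cache_path(text: str, voice_id: str, cache_dir: Path) -> Path:
    """Derive a stable file name from the voice and the exact text."""
    digest = hashlib.sha256(f"{voice_id}\n{text}".encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.mp3"


def get_story_audio(text, voice_id="Justin", cache_dir=Path("audio_cache"), client=None) -> Path:
    """Return a path to the MP3 for this text, synthesizing it only if not cached."""
    target = cache_path(text, voice_id, cache_dir)
    if target.exists():
        return target  # static content: reuse the stored audio
    if client is None:
        import boto3  # imported lazily so the helpers above work without the SDK
        client = boto3.client("polly")
    response = client.synthesize_speech(**build_polly_request(text, voice_id))
    cache_dir.mkdir(parents=True, exist_ok=True)
    target.write_bytes(response["AudioStream"].read())
    return target
```

With static stories, each text would be sent to Polly at most once per voice, and later playback streams the stored file.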

Currently, when my skill starts, my voice character Justin introduces himself as one of Jibo’s teleport friends. Jibo will have remote presence capabilities, where family members embody Jibo from an app, using the screen and moving Jibo’s head around remotely, as seen in the first commercial. I thought this would be the best way to communicate to users that remote control is one of Jibo’s abilities, and that my voice is simply another example of it. To make this a known and understood function, there should be a standard animation/sound/motion that occurs whenever Jibo enters this state.

If that were the standard, I would call that animation before my skill starts, so that users expect another voice and treat it as an ordinary use of Jibo’s remote control ability, this time with Jibo’s storyteller friend, Justin.

How should we proceed in this case?

March Expert Connects with Adam Shonkoff

We will provide further details to the developer community about the standards/requirements for third-party skills as soon as they are available.

For the time being, we strongly recommend referencing the “Personality” section of our Design Style guide here for guidance around this specific question. Future updates to the design guide are expected to feature more in-depth detail on the use of Jibo’s character.


I followed John’s link and didn’t see any specific info on using TTS voices other than Jibo’s, but I did notice on the Speech Style Guide page here the following quote:

Jibo is a character with a specific, consistent voice, and his spoken TTS always reflects that.

Breaking out of character, or in this case speaking with a different voice, should be as graceful and intuitive to the user as possible. Otherwise it could feel too jarring and give them a bad experience, or worse, it might make Jibo feel like a different robot.

I do like the idea of introducing a friend, like when you’re listening to the radio and a new temporary speaker comes on, and it would work if done well. I do suggest you give that voice a specific look, maybe via the LED.

For example, lock the LED ring to a specific solid color for the duration of the skill to denote that you’re in “Justin” mode. You might use a soft “dark” color (red, violet) if you expect your users to need a nightlight, e.g. in a bedtime story mode. Just a thought.