I’m in a predicament: the content publisher and owner for my storytime.exchange site’s Jibo skill have selected a TTS voice for reading their stories that differs considerably from Jibo’s own TTS voice. Since all of my content is static, I can either store the audio and stream it, or call Amazon Polly’s TTS service directly with the cadence-corrected, re-punctuated text. Either way, this requirement leaves me with one major issue: what should the official Jibo rule be for a skill that speaks in a voice other than Jibo’s?
Currently, when my skill starts, my voice character Justin introduces himself as one of Jibo’s teleport friends. Jibo will have remote presence capabilities, where family members embody Jibo from an app, using the screen and moving Jibo’s head around remotely, as seen in the first commercial. I thought this would be the best way to communicate to users that remote control is part of Jibo’s abilities, and that my voice is another example of it. To make this a known and understood function, there should be a standard animation, sound, or motion that occurs whenever Jibo enters this state.
If that were the standard, I would call that animation before my skill starts, so users would expect another voice and treat it as an ordinary instance of Jibo’s remote control ability, this time with Jibo’s storyteller friend, Justin.
How should we proceed in this case?