Jibo's Voice simulator in the SDK


#1

When can we expect a simulated or actual Jibo voice to be integrated into the SDK? With a storytelling skill, it becomes impossible to get an idea of what it’s going to be like for Jibo to speak. Since the SDK is designed so you don’t need the robot to build skills, voice audio should be part of the SDK; do we have a better idea as to when?


#2

We will continue to make sure the team is aware that there is a desire for genuine TTS output in the SDK’s simulator, for consideration in a future SDK release. When we have further details about when a feature like that will be available in the SDK, we will definitely inform the developer community.


#3

A follow-up question: what’s the plan for how skill developers are expected to manipulate the voice? As an example, consider these three interactions:

User: My girlfriend just broke up with me.
Jibo: Oh, I see.

User: I think I might get a promotion next week.
Jibo: Oh, I see.

User: I think Max just ripped me off.
Jibo: Oh, I see.

The text is exactly the same in all three cases, but the intonation should be vastly different for the response to feel personal.
Will there be a way to manipulate that?


#4

There will certainly be ways for a developer to differentiate how Jibo speaks the same utterance in different contexts.

We have APIs that allow a developer to adjust the pitch, duration, and pitch bandwidth of Jibo’s voice.
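As a rough sketch of what that could look like in a skill (the option names `pitch`, `duration`, and `pitchBandwidth` are assumptions derived from the sentence above, and `jibo.speak` is a stand-in namespace, not a confirmed API):

```js
// Hypothetical sketch: these option names are guesses based on the
// prosody controls described above, not confirmed SDK parameters.
const options = {
  pitch: 1.2,          // raise the base pitch slightly
  duration: 0.9,       // speak a bit faster (shorter overall duration)
  pitchBandwidth: 1.5  // widen pitch variation for a livelier delivery
};

jibo.speak("Oh, I see.", options, function (err) {
  if (err) {
    console.error("TTS failed:", err); // error callback shape is assumed
    return;
  }
  console.log("Utterance finished.");
});
```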

Also, as mentioned in this blog post summary of the new SDK features being developed, the SDK will include Embodied Speech, a way for developers to easily add pre-defined animations and Semi-Speech Audio (SSAs) to Jibo’s Text-to-Speech responses, so that “Oh, I see” can be presented differently in one context vs. another.
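To make that concrete with the three interactions above, here is a purely hypothetical sketch; the actual Embodied Speech tag names and attributes haven’t been published here, so treat every tag below as a placeholder:

```js
// Illustrative only: the tag names and attributes are guesses at what
// Embodied Speech markup might look like, not the SDK's actual syntax.
const sympathetic = "<anim cat='sad'/> Oh, I see.";      // breakup
const excited     = "<anim cat='happy'/> Oh, I see!";    // promotion
const concerned   = "<ssa cat='confused'/> Oh, I see.";  // ripped off

jibo.speak(sympathetic); // same text, different embodied delivery
```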


#5

@john.w, is there a way to test the API command speak(text [, options] [, callback]) with audible output?

For my skill, I’ll need to convert text using the ‘speak’ command to fine-tune the results for storytelling.
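Concretely, this is the kind of thing I want to be able to run and hear (a sketch only; `jibo.speak` stands in for wherever the documented speak(text [, options] [, callback]) actually lives, and the option names are guesses):

```js
// Sketch of the storytelling fine-tuning loop I have in mind.
// The namespace and option names below are assumptions.
const lines = [
  { text: "Once upon a time,", options: { duration: 1.2 } },      // slow opener
  { text: "there was a very curious robot.", options: { pitch: 1.1 } }
];

function tellStory(i) {
  if (i >= lines.length) return;
  jibo.speak(lines[i].text, lines[i].options, function () {
    tellStory(i + 1); // speak the next line once this one finishes
  });
}

tellStory(0);
```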


#6

At the moment there is not a way to hear/test the TTS output in the SDK in an audible form (as opposed to the text form presented in the simulator).

I have absolutely made sure the team is aware that there is a desire for the TTS to be hearable as part of the SDK and simulator in the future.


#7

Therefore it is important that all developers understand that if you want to do ANYTHING with Jibo’s voice control, you MUST wait for hardware to be delivered. Is this correct, @john.w?


#8

@alfarmer, you can absolutely utilize those APIs in your skill development presently, but actually hearing the TTS output would require hardware or for the TTS to be implemented in the SDK.


#9

The issue with developing without hearing the output is that the nuances of the voice cannot be predicted. When scripting TTS on any platform, you need to manipulate how you do things to correct for timing, emphasis, or pronunciation issues.
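For example, a typical round of fixes looks like this, and none of it can be verified without hearing the output (the particular respellings are illustrative):

```js
// Typical TTS workarounds that can only be validated by listening:
// insert punctuation for pauses, respell words the engine mangles.
// These specific fixes are illustrative guesses.
const raw   = "Dr. Nguyen read 2 books in 2016.";
const tuned = "Doctor Nwin read two books, in twenty sixteen.";

jibo.speak(tuned); // without audio output there's no way to check this
```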


#10

Obviously, the same goes for ASR. How can you design an interactive device if you don’t know how accurate or fast it will be?


#11

I just want Jibo to have an auto-tune function so he can have a career in pop music.

