The text-based ASR/TTS simulation in the SDK is nice, but it is not enough.
To really understand and test user speech and Jibo's responses, we need access to live speech recognition (ASR) and to hear Jibo's output (TTS). Typed text input is never subject to the speech recognition errors we will need to handle, and the cadence of user interaction cannot be properly appreciated or tuned without real-time spoken interaction.
I have been able to fake it by setting up VAD (Voice Activity Detection) from npm libraries, then calling out to Nuance ASR and TTS myself. But this is slow and clunky (at least I hope it is slower than Jibo's native calls to ASR and TTS!). And the only voices available to non-paying Nuance developer accounts are Samantha and Tom. Samantha is female, and Tom is voiced by the ubiquitous and extremely annoying Tom Glynn—nothing like Jibo.
Please make Jibo’s ASR and TTS available in the simulator.