June Expert Connects with Dr. Jon Bloom

Our next Expert Connects will be on Tuesday, June 27 at 12:00pm EDT/9:00am PDT.
Earlier in the year you heard from Roberto Pieraccini on the topic of speech technologies.

Our next guest expert will be Dr. Jon Bloom, who will expand on this topic with what you need to know to design voice user interfaces. There is a whole social science behind ensuring that the words you choose will invite the right user action.

Here is his bio:

Jonathan Bloom is a Voice User Interface Designer at Jibo. Before that, Jon was a Senior User Interface Manager for Nuance’s Enterprise Division. He joined Nuance’s team in 1999 as part of Dragon Systems, where he was the company’s first usability engineer. He has designed both graphic and speech interfaces for IVRs, dictation software, automotive, mobile applications, and now robots! Jon took a detour for some time to work for a startup called SpeechCycle (now part of Synchronoss), where he contributed to the creation of an infrastructure for generating completely data-driven user interfaces. Jon holds a Ph.D. in Cognitive Psychology from the New School Graduate Faculty and a Bachelor’s Degree in Psychology from the University of Vermont. He is also a husband, father of two, self-published fiction author, and lousy bass player.

We invite you to reply to this thread to post your questions for him in advance, and we will work to address them as part of the session.

Because we want to get to all of your questions on the topic, we ask that you send any general questions about Jibo or your Jibo account to support@jibo.com. This will allow our speaker and moderators to focus in on the topic at hand during the live stream.

Hope you can join us! :jibo:

3 Likes

I’m excited to hear Jon’s presentation about Jibo’s speech abilities. How difficult will it be to make Jibo speak different languages / localize Jibo?
Can you share something about the process / roadmap needed for this to happen?

3 Likes

The goal of using speech in computing is to closely resemble normal human-to-human communication. The wake word, while it offers security for the user, inadvertently breaks a protocol of human communication: we don’t repeat the name of the listener each time we speak to them. This is solvable in part with facial recognition of family members. Will this be solved with Jibo, and if so, how? If the user chooses to have Jibo listen on eye contact, is that possible with Jibo today?

5 Likes

That is a great question, @alfarmer; I would love to see that addressed. In human-human interaction, leading each statement with the name of the addressee usually indicates a distance between the two parties, whatever its cause (social hierarchy, unfamiliarity, etc.). As Jibo’s key differentiator is familiarity, it would be great to hear how that can be achieved with a wake word.

3 Likes

This concept raises the question, “What other ‘wake events’ should Jibo include to differentiate itself as a social robot?”

2 Likes

Hi everyone, here is the link to today’s Expert Connects. See you all soon.

1 Like

Ideally, if a user says something that Jibo doesn’t understand, Jibo should ask, “Did you mean X?” and, if the answer is yes, that word should be added to the skill. Is this something that Jibo has thought about?

1 Like

Thank you, everyone, for attending our Expert Connects yesterday.

For those who missed it, here is the recording, and here are the slides: What is VUI.pdf (1.2 MB)

2 Likes

Hi everyone! I hope you had a good July 4th. I have to admit I took a short break, attended a BBQ, and ate way too much. Jibo is so lucky never to have the problem of overeating, although someday I hope he can remind me: “Justin, stop eating so much; it’s time for you to go for a run instead!”

Thank you also for attending the presentation and posting these great questions. We’ve spoken to the team and gathered some answers. Thank you for being such great supporters; we really appreciate it.


I’m excited to hear Jon’s presentation about Jibo’s speech abilities. How difficult will it be to make Jibo speak different languages / localize Jibo? Can you share something about the process / roadmap needed for this to happen?

We have focused resources on English first for a variety of reasons, including that English represented the greatest concentration of campaign supporters. Adding new languages to Jibo will involve building out those language models as well as working through character differences that may be present in a given culture. It is our intention to bring Jibo to other parts of the world, both directly and with partners, and we will share that roadmap as it becomes available.


The goal of using speech in computing is to closely resemble normal human-to-human communication. The wake word, while it offers security for the user, inadvertently breaks a protocol of human communication: we don’t repeat the name of the listener each time we speak to them. This is solvable in part with facial recognition of family members. Will this be solved with Jibo, and if so, how? If the user chooses to have Jibo listen on eye contact, is that possible with Jibo today?

In Jibo’s initial release, we are relying on a single wake word (“Hey, Jibo”) to initiate interaction with Jibo. You are correct that in human conversation one wouldn’t repeat the listener’s name each time. That said, Jibo is human-like rather than human, so there is a careful balance to strike. If he is too human in his interactions, it may create the wrong perception of Jibo’s role in the household, and it can seem scary or like an invasion of privacy. There is an interesting body of reading on the uncanny valley hypothesis that speaks to this balance.

One can’t select to have Jibo interact only via eye contact, but he will be proactive with people when he recognizes them via person ID.


This concept raises the question, “What other ‘wake events’ should Jibo include to differentiate itself as a social robot?”

Because this is a new product category, we are keeping interaction simple for users. This paradigm of interacting with a social robot will have a learning curve. Think about the iPhone: at the start it was about touching, and now we have swiping and more. If you have ideas on other wake events you feel would be interesting, we always appreciate the input.


Ideally, if a user says something that Jibo doesn’t understand, Jibo should ask, “Did you mean X?” and, if the answer is yes, that word should be added to the skill. Is this something that Jibo has thought about?

Wow, great idea! I really like this. It’s something we haven’t actively explored for the first version, but I would definitely like to see it in future versions. In fact, I’d love to see a third-party skill built using this.
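
For anyone tempted to prototype that third-party skill, here is a minimal sketch of what such a confirm-and-learn loop could look like. To be clear, this is not the Jibo SDK: it is plain TypeScript, and the vocabulary map, the `askYesNo` prompt, and the edit-distance threshold are all hypothetical stand-ins for whatever speech and prompt primitives a real skill would have.

```typescript
// Hypothetical sketch only: none of these names come from the Jibo SDK.

type Vocabulary = Map<string, Set<string>>; // intent name -> known phrases

// Classic Levenshtein edit distance, used to find the closest known phrase.
function editDistance(a: string, b: string): number {
  const dp: number[][] = Array.from({ length: a.length + 1 }, (_, i) =>
    Array.from({ length: b.length + 1 }, (_, j) =>
      i === 0 ? j : j === 0 ? i : 0
    )
  );
  for (let i = 1; i <= a.length; i++) {
    for (let j = 1; j <= b.length; j++) {
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + (a[i - 1] === b[j - 1] ? 0 : 1) // substitution
      );
    }
  }
  return dp[a.length][b.length];
}

// Scan every known phrase and return the nearest one, if it is close enough.
function closestMatch(
  utterance: string,
  vocab: Vocabulary
): { intent: string; phrase: string } | null {
  let best: { intent: string; phrase: string; dist: number } | null = null;
  for (const [intent, phrases] of vocab) {
    for (const phrase of phrases) {
      const dist = editDistance(utterance.toLowerCase(), phrase.toLowerCase());
      if (best === null || dist < best.dist) best = { intent, phrase, dist };
    }
  }
  // A threshold of 3 edits is arbitrary: too loose suggests nonsense,
  // too strict never offers a correction.
  return best !== null && best.dist <= 3
    ? { intent: best.intent, phrase: best.phrase }
    : null;
}

// Ask "Did you mean X?" and, on a yes, remember the user's wording so it
// is recognized directly next time. askYesNo stands in for the skill's
// actual prompt-and-listen primitive.
async function confirmAndLearn(
  utterance: string,
  vocab: Vocabulary,
  askYesNo: (prompt: string) => Promise<boolean>
): Promise<string | null> {
  const match = closestMatch(utterance, vocab);
  if (match === null) return null; // nothing close enough to suggest

  if (await askYesNo(`Did you mean "${match.phrase}"?`)) {
    vocab.get(match.intent)!.add(utterance.toLowerCase());
    return match.intent;
  }
  return null;
}

// Example: with this vocabulary, hearing "play a son" would prompt
// `Did you mean "play a song"?` and learn the new phrase on a yes.
const vocab: Vocabulary = new Map([
  ["play_music", new Set(["play some music", "play a song"])],
]);
```

The interesting design question is where the learned phrases live: in this sketch they only exist in memory, so a real skill would want to persist them to per-user storage so the expanded vocabulary survives a restart.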


Thank you, everyone, and I look forward to seeing you at the next Expert Connects.

4 Likes