How-to: Connect Jibo to IBM's Watson for visual recognition and language sentiment

Objective

This is a quick walkthrough of how to connect the Jibo SDK to the AlchemyAPI and Visual Recognition APIs provided for IBM’s Watson artificial intelligence via their Bluemix service.

We’ll connect Jibo to both APIs, then take a photo and run it through the Visual Recognition API to see if we can determine what’s in the photo. We’ll also use the AlchemyAPI service to test the language sentiment (positive or negative) of any text.

These services could be a great way for Jibo to understand his surroundings more intelligently.

Prerequisites

You’ll need a working version of the Jibo SDK installed. This tutorial uses the Basic Dialogue Starter skill template as a quick starting point to speed up development.

Steps

1) If you haven’t already, create a new IBM Bluemix account. You can begin this process here. This signup has a few steps which you’ll need to walk through: signup > email confirmation > environment setup.

2) Log in to your IBM Bluemix account and access the Services area. As soon as you log in, you’ll be directed to the Dashboard. Click “Services & APIs” to access the Services area.

3) Create a new API key for the AlchemyAPI service. From the Services page under Watson, click AlchemyAPI. Choose the Free plan and leave the rest of the default service settings, then click the “Create” button. Once created, click the “Service Credentials” link within the AlchemyAPI service to see your new API key. Copy this for later use.

4) Create another new API key, this time for the Visual Recognition API service. From the Services page under Watson, click Visual Recognition. Choose the Free plan and leave the rest of the default service settings, then click the “Create” button. Once created, click the “Service Credentials” link within the Visual Recognition API service to see your new API key. Copy this for later use.

5) Create a new project in your Jibo SDK. Don’t forget to run npm install after creation if needed. Copy over the two files from the Basic Dialogue Starter skill template into this new skill. Make sure to install any required npm modules for the template (e.g. nedb).

6) Add the following two npm modules to your project: ‘watson-developer-cloud’ and ‘request’. To do this, just navigate your console to your project’s folder and run npm install watson-developer-cloud and npm install request. The ‘watson-developer-cloud’ module allows connections to the Bluemix service APIs. The ‘request’ module will let us save the photo Jibo takes to our skill directory before we send it to Watson.
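
If you want to confirm that both modules resolve before wiring anything up, a quick sanity check from your project folder might look like this (a minimal sketch; the file name check.js is just for illustration, and the two factory functions are the same ones we call later in this tutorial):

check.js

// Run with: node check.js
var watson = require('watson-developer-cloud');
var request = require('request');

// Both service factories should be functions if the install succeeded
console.log(typeof watson.visual_recognition);  // "function"
console.log(typeof watson.alchemy_language);    // "function"
console.log(typeof request);                    // "function"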

7) Open main.rule and make the following changes. These changes will allow Jibo to understand two new statements, “Take a photo and analyze it” and “Analyze a random phrase”. Just replace the code within the file with this:

main.rule

# Allow Jibo to understand the following phrases and combinations of...
#
# "Take a photo and analyze it"
# "Analyze a random phrase"
# "Say goodbye"
#
# Returns:
# "NLParse": {
#   "action": "playSound", // e.g. analyzePhoto, endDialogue
# },
#
TopRule = $* (
    # Add your rules and responses here
    ( ($take a $photo and $analyze){action='analyzePhoto'} ) |
    ( ($analyze a random phrase){action='analyzePhrase'} ) |
    ( (say $goodbye){action='endDialogue'} )
) $*;

# Simple words and phrases
take = (take | snap);
analyze = (analyze | study | review);
photo = (photo | picture | pic);
goodbye = (good bye) | goodbye | bye;
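
For reference, the comment block at the top of the file mirrors what the parser hands back: the matched rule sets an action value on the NLParse result. The Case decorators you’ll configure in the next two steps branch on that value; conceptually (an illustrative sketch only, not actual Jibo SDK API), they’re doing the equivalent of:

Illustrative sketch

// Hypothetical: how the parsed action drives the behavior tree
switch (nlParse.action) {
  case 'analyzePhoto':  /* take a photo, send it to Visual Recognition */ break;
  case 'analyzePhrase': /* send a random phrase to AlchemyAPI */          break;
  case 'endDialogue':   /* say goodbye and end the skill */               break;
}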

8) Add in the ‘analyzePhoto’ capability. Here we’ll add a TakePhoto behavior with some custom code which will take a photo, send it to Watson for analysis, and output the results to the console. On line 10 in your main.bt file, edit the Case decorator conditional by replacing “playSound” with “analyzePhoto”. Remove the child Switch behavior completely on line 11 by right-clicking and choosing Delete. Right-click the Sequence on line 10 and choose Add Child, then choose the TakePhoto behavior. Your current setup should look like this:

Replace the onPhoto argument in the TakePhoto behavior with the following code. Be sure to change the [API_KEY] in the code below to the Visual Recognition API key you recorded in Step 4:

TakePhoto > onPhoto

(error, imageUrl) => {
  // Bail out if the photo capture failed
  if (error) {
    console.log(error);
    return;
  }

  var watson = require('watson-developer-cloud');
  var fs = require('fs');
  var request = require('request');
  
  // Create new temporary local image for this photo
  var newImage = 'temp.jpg';
  var stream = request(imageUrl).pipe(fs.createWriteStream(newImage));
  
  // When image has saved, send it to Watson and output the results to the console
  stream.on('finish', function () {
    console.log("newImage:",newImage);
    
    var visual_recognition = watson.visual_recognition({
      api_key: '[API_KEY]',
      version: 'v3',
      version_date: '2016-05-19'
    });

    if(newImage){
      var params = {
        images_file: fs.createReadStream(newImage)
      };

      visual_recognition.classify(params, function(err, res) {
        if (err){
          console.log(err);
        } else {
          console.log(JSON.stringify(res, null, 2));
        }
        fs.unlink(newImage, function () {}); // delete temp image
      });
    }
  });
}
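
If you’d rather log just the labels instead of the raw JSON, the v3 classify response nests its results under images > classifiers > classes. A minimal sketch of pulling them out, assuming the response contains at least one image and one classifier:

Illustrative sketch

// Inside the classify callback, in place of the JSON.stringify call:
var classes = res.images[0].classifiers[0].classes;
classes.forEach(function (c) {
  // e.g. "people (0.9)" — scores here are illustrative
  console.log(c.class + ' (' + c.score + ')');
});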

9) Add in the ‘analyzePhrase’ capability. Here we’ll have Jibo choose a random phrase, send it to Watson for language sentiment analysis, and output the results to the console. On line 13 in your main.bt file, edit the Case decorator conditional by replacing “sayHello” with “analyzePhrase”. Remove the child TextToSpeechJS behavior completely on line 15 by right-clicking and choosing Delete. Right-click and Swap the PlayAnimation behavior on line 14 with an ExecuteScriptAsync. Your current setup should look like this:

Replace the exec argument in the ExecuteScriptAsync behavior with the following code. Be sure to change the [API_KEY] in the code below to the AlchemyAPI key you recorded in Step 3:

ExecuteScriptAsync > exec

(succeed, fail) => {
  var watson = require('watson-developer-cloud');

  var alchemy_language = watson.alchemy_language({
    api_key: '[API_KEY]'
  });

  // Select a random phrase
  var nPhraseARRAY = [
    "IBM Watson won the Jeopardy television show hosted by Alex Trebek",
    "IBM Watson is a cognitive system enabling a new partnership between people and computers",
    "IBM Watson can sometimes makes bad assumptions or errors with certain text"
  ];
  var nRandPhrase = nPhraseARRAY[Math.floor(Math.random()*nPhraseARRAY.length)];
  console.log("Phrase to analyze: ",nRandPhrase);
  var params = {
    text: nRandPhrase
  };
  
  // Send phrase to Watson and output the results to the console
  alchemy_language.sentiment(params, function (err, response) {
    if (err) {
      console.log('error:', err);
    } else {
      console.log(JSON.stringify(response, null, 2));
    }
    succeed(); // succeed either way so the dialogue can continue
  });
}
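
As with the photo step, you can log just the verdict instead of the raw JSON: AlchemyAPI returns the result under a docSentiment object with a type (positive, negative, or neutral) and, for non-neutral text, a score. A minimal sketch, inside the sentiment callback:

Illustrative sketch

// Inside the sentiment callback, in place of the JSON.stringify call:
var sentiment = response.docSentiment;
// e.g. "Sentiment: positive 0.6" — the score value is illustrative
console.log('Sentiment:', sentiment.type, sentiment.score || '');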

10) Use your brand new creation. You can test it out with two commands:

(a) To use the visual recognition aspect of your new skill, run your skill in the Jibo simulator and type “hey jibo” then “take a photo and analyze it”. Since we’re in the simulator, Jibo will use an on-board photo (instead of actually taking one) and return an analysis of it in the form of classifiers. In your console you’ll see a JSON-encoded nested array with class results of “crowd” and “people”, indicating that the photo is of a group of people (the Jibo team). A sample of this output follows below.

(b) To use the language sentiment aspect of your new skill, run your skill in the Jibo simulator and type “hey jibo” then “analyze a random phrase”. Jibo will pick a phrase from our script and return an analysis. In this case, the result will match the sentiment of the chosen phrase, with a type of positive or negative and a score indicating its strength based on the words in that phrase. A sample of this output also follows below.
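
For reference, here’s roughly what each result looks like in the console (both abridged, with illustrative scores; your exact values and any extra fields will vary):

Example classify output (abridged)

{
  "images": [
    {
      "classifiers": [
        {
          "classes": [
            { "class": "people", "score": 0.9 },
            { "class": "crowd", "score": 0.8 }
          ]
        }
      ]
    }
  ]
}

Example sentiment output (abridged)

{
  "status": "OK",
  "language": "english",
  "docSentiment": {
    "type": "positive",
    "score": "0.6"
  }
}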

Final Notes

As you may already know, IBM Bluemix is a free service for hobbyists and startups, but if you use any of their APIs in production-level skills for Jibo, you’ll likely need to upgrade your account and pay monthly for the services you use. You can find out more about their pricing here.


@michael This is AMAZING! Thank you so much for sharing this!

Thanks, hopefully it will be useful to somebody down the road who needs one or more of these capabilities. There are also many other APIs available within Watson and IBM Bluemix for other query types, and this tutorial can be used as a primer for any one of them, but I thought the visual recognition part was definitely the coolest.
