IBM Watson Speech-to-Text: sending microphone data closes the connection

I am working on a tutorial for IBM Watson Speech-to-Text, using WebSockets for real-time transcription. I am using Angular.

The first 25 lines of code are copied from the API reference. This code successfully connects and initiates a recognition request, and Watson sends me the message { "state": "listening" }.

I wrote a function onClose() that logs when the connection is closed.

I made a button that triggers a handler, $scope.startSpeechRecognition. The handler uses getUserMedia() to stream audio from the microphone and websocket.send() to transfer the data to Watson. This does not work: clicking the button closes the connection. I assume I am sending the wrong data type and Watson is closing the connection?

I moved websocket.send(blob); from onOpen to the $scope.startSpeechRecognition handler, and I changed websocket.send(blob); to websocket.send(mediaStream);. The content type may also be wrong: 'content-type': 'audio/l16;rate=22050'. How do I know what baud rate is coming from the microphone?

Is there a tutorial for JavaScript? When I google for an IBM Watson Speech-to-Text tutorial, the top result is an 8000-line SDK. Is an SDK required, or can I write a simple program to see how the service works?

Here's my controller:

'use strict';
app.controller('WatsonController', ['$scope', 'watsonToken',  function($scope, watsonToken) {
  console.log("Watson controller.");

  var token = watsonToken;
  var wsURI = "wss://stream.watsonplatform.net/speech-to-text/api/v1/recognize"
    + "?watson-token=" + token + '&model=en-US_BroadbandModel';

  var websocket = new WebSocket(wsURI); // opens connection to Watson
  websocket.onopen = function(evt) { onOpen(evt) }; // executes when a connection opens
  websocket.onclose = function(evt) { onClose(evt) }; // executes when a connection closes
  websocket.onmessage = function(evt) { onMessage(evt) }; // logs messages from Watson to the console
  websocket.onerror = function(evt) { onError(evt) }; // logs errors to the console

  function onOpen(evt) {
    var message = {
      action: 'start',
      'content-type': 'audio/flac',
      'interim_results': true,
      'max-alternatives': 3,
      keywords: ['colorado', 'tornado', 'tornadoes'],
      'keywords_threshold': 0.5
    };
    websocket.send(JSON.stringify(message));

    // Prepare and send the audio file.
    // websocket.send(blob);

    // websocket.send(JSON.stringify({action: 'stop'}));
  }

  function onClose() {
    console.log("Connection closed.");
  };

  function onMessage(evt) {
    console.log(evt.data); // log the message to the console
  }

  function onError(evt) {
    console.log(evt); // log errors to the console
  }

  $scope.startSpeechRecognition = () => {
    console.log("Starting speech recognition.");
    var constraints = { audio: true, video: false };
    navigator.mediaDevices.getUserMedia(constraints)
    .then(function(mediaStream) {
      console.log("Streaming audio.");
      websocket.send(mediaStream);
    })
    .catch(function(err) { console.log(err.name + ": " + err.message); }); // log errors
  };

  $scope.stopSpeechRecognition = () => { // handler for button
    console.log("Stopping speech recognition.");
    websocket.send(JSON.stringify({action: 'stop'}));
  };

  $scope.closeWatsonSpeechToText = () => { // handler for button
    console.log("Closing connection to Watson.");
    websocket.close(); // closes connection to Watson?
  };

}]);

And here is my template:

<div class="row">
  <div class="col-sm-2 col-md-2 col-lg-2">
    <p>Watson test.</p>
  </div>
</div>

<div class="row">
  <div class="col-sm-2 col-md-2 col-lg-2">
    <button type="button" class="btn btn-primary" ng-click="startSpeechRecognition()">Start</button>
  </div>

  <div class="col-sm-2 col-md-2 col-lg-2">
    <button type="button" class="btn btn-warning" ng-click="stopSpeechRecognition()">Stop</button>
  </div>

  <div class="col-sm-2 col-md-2 col-lg-2">
    <button type="button" class="btn btn-danger" ng-click="closeWatsonSpeechToText()">Close</button>
  </div>
</div>



1 answer


No SDK is required, but, as German Attanasio said, it makes your life much easier.

In your code, however, this line will definitely not work:

websocket.send(mediaStream);

The MediaStream object from getUserMedia() cannot be sent directly over the WebSocket; WebSockets only accept text and binary data (the blob in the original example). You have to extract the audio and then send just that.

But even that is not enough, because the Web Audio API provides the audio as 32-bit floats, which is not a format the Watson API natively understands. The SDK automatically extracts the audio and converts it to audio/l16;rate=16000 (16-bit integers).
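For reference, here is a minimal, hypothetical sketch (not the SDK's implementation, and untested against the live service) of how the extraction and conversion could look with a ScriptProcessorNode. It assumes the websocket variable from the question's controller is already open, and that the 'start' message declared 'content-type': 'audio/l16;rate=' + audioContext.sampleRate instead of audio/flac:

// Sketch: capture microphone audio, convert 32-bit floats to 16-bit PCM,
// and send binary frames over an already-open WebSocket.
// Error handling and cleanup are omitted.
var audioContext = new (window.AudioContext || window.webkitAudioContext)();

navigator.mediaDevices.getUserMedia({ audio: true, video: false })
  .then(function(mediaStream) {
    var source = audioContext.createMediaStreamSource(mediaStream);
    var processor = audioContext.createScriptProcessor(4096, 1, 1); // buffer size, 1 input channel, 1 output channel

    processor.onaudioprocess = function(event) {
      var float32 = event.inputBuffer.getChannelData(0); // 32-bit floats in [-1, 1]
      var int16 = new Int16Array(float32.length);
      for (var i = 0; i < float32.length; i++) {
        var s = Math.max(-1, Math.min(1, float32[i])); // clamp, then scale to a 16-bit int
        int16[i] = s < 0 ? s * 0x8000 : s * 0x7FFF;
      }
      websocket.send(int16.buffer); // send binary data, not the MediaStream itself
    };

    source.connect(processor);
    processor.connect(audioContext.destination); // keeps the node processing in most browsers
  })
  .catch(function(err) { console.log(err.name + ": " + err.message); });

Note that the AudioContext's sample rate (typically 44100 or 48000) must match the rate= value you declared, unless you downsample the audio yourself the way the SDK does.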

How do I know what baud rate is coming from the microphone?



The sample rate is available on the AudioContext, and if you add a ScriptProcessorNode it is passed AudioBuffers that include both the audio data and the sample rate. Multiply the sample rate by the size of each sample (32 bits before conversion to l16, 16 bits after) by the number of channels (usually 1) to get the bit rate.

BUT note that the number you put into the content type after rate= is the sample rate, not the bit rate, so you can just copy it from the AudioContext or AudioBuffer without any multiplication. (Unless you downsample the audio, as the SDK does, in which case it should be set to the target sample rate, not the input rate.)
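For example, a sketch of copying the rate straight from an AudioContext when building the 'start' message (again assuming the websocket variable from the question's controller):

var audioContext = new (window.AudioContext || window.webkitAudioContext)();
console.log(audioContext.sampleRate); // the input sample rate, typically 44100 or 48000

websocket.send(JSON.stringify({
  action: 'start',
  'content-type': 'audio/l16;rate=' + audioContext.sampleRate, // sample rate, not bit rate
  'interim_results': true
}));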

If you want to see how it all works, the entire SDK is open source. Familiarity with the Node.js streams standard is helpful when reading its source files.

FWIW, if you're using a bundling system like Browserify or webpack, you can pull in only the parts of the SDK you need and get a much smaller file size. You can also set it up to load after the page has loaded and rendered, since the SDK won't be part of your initial render.
