Web Speech API for a Node.js/Socket.io chat app

Older Article

This article was published 13 years ago. Some information may be outdated or no longer applicable.

HTML5 ships with the Web Speech API, and I thought it’d be fun to bolt speech recognition onto my chat application. Instead of typing messages, people could just talk.

I’ve pushed the latest changes to the GitHub repository. Let’s walk through them.

On the HTML side, I’ve added a ‘Record’ button with a startButton(event) function wired up to the onclick event. Simple.

The interesting bits live in the JavaScript.

First thing: check whether the browser supports Web Speech. At the time of writing, Google Chrome is the only browser with full support. Mozilla’s been making progress on bringing it to Firefox. For now, this check does the job:

if ('webkitSpeechRecognition' in window) {
  console.log('webkitSpeechRecognition is available');
}
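If you want the check to survive the day another browser ships this unprefixed, you can resolve the constructor through a tiny helper. A minimal sketch — `getRecognitionCtor` is a hypothetical name, not something from the repo:

```javascript
// Resolve the speech recognition constructor from a global object,
// preferring the unprefixed name, falling back to the webkit prefix.
function getRecognitionCtor(global) {
  return global.SpeechRecognition || global.webkitSpeechRecognition || null;
}

// In the browser this would be used as:
// var Ctor = getRecognitionCtor(window);
// if (!Ctor) {
//   console.log('Speech recognition is not supported in this browser.');
// }
```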

If it’s available, we initialise it and set continuous and interimResults to true. When continuous is on, the user agent keeps recognising and returns zero or more final results (think dictation). When interimResults is on, we get partial results as the user speaks.

var recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

For the full list of configurable parameters and methods, run console.log(recognition); and poke around.
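Two of the knobs you'll find in there are defined by the Web Speech spec itself; browser support varies, so treat this as a sketch rather than gospel:

```javascript
recognition.lang = 'en-GB';      // BCP 47 language tag to recognise against
recognition.maxAlternatives = 1; // candidate transcripts returned per result
```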

I’ve implemented two methods so far, onstart() and onresult():

var recognizing = false;   // tracks whether we are currently recording
var final_transcript = ''; // accumulates confirmed results

recognition.onstart = function() {
  recognizing = true;
};

recognition.onresult = function(event) {
  console.log(event);
  var interim_transcript = '';
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;
      $('#msg').addClass('final').removeClass('interim');
    } else {
      interim_transcript += event.results[i][0].transcript;
      $('#msg').addClass('interim').removeClass('final');
    }
  }
  // Show everything confirmed so far, plus whatever is still tentative.
  $('#msg').val(final_transcript + interim_transcript);
};
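The loop above is easier to reason about (and to test) when factored into a pure function. A sketch with a hypothetical name, `collectTranscripts` — it mirrors the accumulation logic without touching the DOM:

```javascript
// Given a SpeechRecognitionResultList-shaped array and a start index,
// split the text into confirmed (final) and tentative (interim) parts.
function collectTranscripts(results, startIndex) {
  var finalText = '';
  var interimText = '';
  for (var i = startIndex; i < results.length; ++i) {
    var transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

The handler then only has to concatenate the two parts into the message box, which keeps the DOM work in one place.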

The event handler on the ‘Record’ button picks the language and fires start():

function startButton(event) {
  if (recognizing) {
    console.log('stopping');
    recognition.stop();
    recognizing = false;
    $('#start_button').prop('value', 'Record');
    return;
  }
  final_transcript = '';
  recognition.lang = 'en-GB';
  recognition.start();
  $('#start_button').prop('value', 'Recording ... Click to stop.');
  ignore_onend = false;
  $('#msg').val(''); // clear any previous message
}

When you test the application, Chrome will ask for microphone access before recording starts. Allow it.

There’s still plenty to do here. I want to catch the event when someone denies microphone access (accidentally or on purpose). Adding more language options would be good too. Those improvements will come with time.
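For what it's worth, the spec already defines error codes for the denied-microphone case. A rough sketch of how that might look — `describeRecognitionError` is a hypothetical helper, and the commented-out wiring is an assumption, not code from the repo:

```javascript
// Map Web Speech error codes (from SpeechRecognitionErrorEvent.error)
// to user-facing messages. 'not-allowed' fires when mic access is denied.
function describeRecognitionError(code) {
  var messages = {
    'not-allowed': 'Microphone access was denied.',
    'no-speech': 'No speech was detected. Try again.',
    'audio-capture': 'No microphone was found.'
  };
  return messages[code] || 'Recognition error: ' + code;
}

// In the browser, the handler would plug in roughly like this:
// recognition.onerror = function(event) {
//   console.log(describeRecognitionError(event.error));
//   recognizing = false;
//   $('#start_button').prop('value', 'Record');
// };
```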