Web Speech API for a Node.js/Socket.io chat app

Older Article

This article was published 13 years ago. Some information may be outdated or no longer applicable.

HTML5 ships with the Web Speech API, and I thought it’d be fun to bolt speech recognition onto my chat application. Instead of typing messages, people could just talk.

I’ve pushed the latest changes to the GitHub repository. Let’s walk through them.

On the HTML side, I’ve added a ‘Record’ button with a startButton(event) function wired up to the onclick event. Simple.

The interesting bits live in the JavaScript.

First thing: check whether the browser supports Web Speech. At the time of writing, Google Chrome is the only browser with full support. Mozilla’s been making progress on bringing it to Firefox. For now, this check does the job:

if ('webkitSpeechRecognition' in window) {
  console.log('webkitSpeechRecognition is available');
}
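If you want the check to survive the day another browser ships this unprefixed, you can resolve the constructor through a tiny helper. A minimal sketch — `getRecognitionCtor` is a hypothetical name, not something from the repo:

```javascript
// Resolve the speech recognition constructor from a global object,
// preferring the unprefixed name, falling back to the webkit prefix.
function getRecognitionCtor(global) {
  return global.SpeechRecognition || global.webkitSpeechRecognition || null;
}

// In the browser this would be used as:
// var Ctor = getRecognitionCtor(window);
// if (!Ctor) {
//   console.log('Speech recognition is not supported in this browser.');
// }
```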

If it’s available, we initialise it and set continuous and interimResults to true. When continuous is on, the user agent keeps recognising and returns zero or more final results (think dictation). When interimResults is on, we get partial results as the user speaks.

var recognition = new webkitSpeechRecognition();
recognition.continuous = true;
recognition.interimResults = true;

For the full list of configurable parameters and methods, run console.log(recognition); and poke around.
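Two of the knobs you'll find in there are defined by the Web Speech spec itself; browser support varies, so treat this as a sketch rather than gospel:

```javascript
recognition.lang = 'en-GB';      // BCP 47 language tag to recognise against
recognition.maxAlternatives = 1; // candidate transcripts returned per result
```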

I’ve implemented two methods so far, onstart() and onresult():

var recognizing = false;   // tracks whether we are currently recording
var final_transcript = ''; // accumulates confirmed results

recognition.onstart = function() {
  recognizing = true;
};

recognition.onresult = function(event) {
  console.log(event);
  var interim_transcript = '';
  for (var i = event.resultIndex; i < event.results.length; ++i) {
    if (event.results[i].isFinal) {
      final_transcript += event.results[i][0].transcript;
      $('#msg').addClass('final').removeClass('interim');
    } else {
      interim_transcript += event.results[i][0].transcript;
      $('#msg').addClass('interim').removeClass('final');
    }
  }
  // Show everything confirmed so far, plus whatever is still tentative.
  $('#msg').val(final_transcript + interim_transcript);
};
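The loop above is easier to reason about (and to test) when factored into a pure function. A sketch with a hypothetical name, `collectTranscripts` — it mirrors the accumulation logic without touching the DOM:

```javascript
// Given a SpeechRecognitionResultList-shaped array and a start index,
// split the text into confirmed (final) and tentative (interim) parts.
function collectTranscripts(results, startIndex) {
  var finalText = '';
  var interimText = '';
  for (var i = startIndex; i < results.length; ++i) {
    var transcript = results[i][0].transcript;
    if (results[i].isFinal) {
      finalText += transcript;
    } else {
      interimText += transcript;
    }
  }
  return { finalText: finalText, interimText: interimText };
}
```

The handler then only has to concatenate the two parts into the message box, which keeps the DOM work in one place.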

The event handler on the ‘Record’ button picks the language and fires start():

function startButton(event) {
  if (recognizing) {
    console.log('stopping');
    recognition.stop();
    recognizing = false;
    $('#start_button').prop('value', 'Record');
    return;
  }
  final_transcript = '';
  recognition.lang = 'en-GB';
  recognition.start();
  $('#start_button').prop('value', 'Recording ... Click to stop.');
  ignore_onend = false;
  $('#msg').val(''); // clear any previous message
}

When you test the application, Chrome will ask for microphone access before recording starts. Allow it.

There’s still plenty to do here. I want to catch the event when someone denies microphone access (accidentally or on purpose). Adding more language options would be good too. Those improvements will come with time.
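For what it's worth, the spec already defines error codes for the denied-microphone case. A rough sketch of how that might look — `describeRecognitionError` is a hypothetical helper, and the commented-out wiring is an assumption, not code from the repo:

```javascript
// Map Web Speech error codes (from SpeechRecognitionErrorEvent.error)
// to user-facing messages. 'not-allowed' fires when mic access is denied.
function describeRecognitionError(code) {
  var messages = {
    'not-allowed': 'Microphone access was denied.',
    'no-speech': 'No speech was detected. Try again.',
    'audio-capture': 'No microphone was found.'
  };
  return messages[code] || 'Recognition error: ' + code;
}

// In the browser, the handler would plug in roughly like this:
// recognition.onerror = function(event) {
//   console.log(describeRecognitionError(event.error));
//   recognizing = false;
//   $('#start_button').prop('value', 'Record');
// };
```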