Advanced chat using node.js and socket.io – Episode 2
Older Article
This article was published 13 years ago. Some information may be outdated or no longer applicable.
A basic introduction to WebRTC, plus a code example enabling video calling between two clients using a custom signalling server built with socket.io. The end goal: merge the signalling server with the chat server from the previous article to build a combined video call / chat solution.
WebRTC is an open project that gives web browsers Real-Time Communications (RTC) capabilities via simple JavaScript APIs. Google, Mozilla and Opera are collaborating on it. (Microsoft? Nowhere to be seen.) Browser support is limited to Chrome 23+, Firefox 22+ and Opera 12+ on desktop, and Chrome 28+, Firefox 24+, Opera Mobile 12+ on Android.
Three major components make up WebRTC:
- getUserMedia
- PeerConnection
- DataChannels
The examples on this page are either the work of Sam Dutton or based on his examples. Check his article and repositories for a deeper look at WebRTC.
getUserMedia and PeerConnection
getUserMedia lets the browser access the user’s camera and microphone. A basic JavaScript example (it assumes an HTML page containing a <video> element):
navigator.getUserMedia =
  navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia;

var constraints = { video: true };

function successCallback(localMediaStream) {
  window.stream = localMediaStream; // keep a reference for inspection in the console
  var video = document.querySelector('video');
  video.src = window.URL.createObjectURL(localMediaStream);
  video.play();
}

function errorCallback(error) {
  console.log('navigator.getUserMedia error: ', error);
}

navigator.getUserMedia(constraints, successCallback, errorCallback);
Open this on localhost, allow browser access to the camera, and you should see yourself. Done.
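Constraints can be richer than `{ video: true }`. As a sketch, here is a constraints object requesting a minimum resolution using the mandatory/optional syntax Chrome supported at the time — treat the exact keys as illustrative, since they varied between browser versions:

```javascript
// Illustrative constraints: ask for audio plus video of at least 640x480.
// The mandatory/optional split is the old Chrome syntax; exact key names
// varied between browser versions, so treat this as a sketch.
var constraints = {
  audio: true,
  video: {
    mandatory: { minWidth: 640, minHeight: 480 },
    optional: []
  }
};

// In the browser you would pass it to getUserMedia exactly as before:
// navigator.getUserMedia(constraints, successCallback, errorCallback);
```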
Let’s push further. We’ll create a local video and a remote video feed on the same page. Not hugely practical on its own, but it’s a good way to demonstrate RTCPeerConnection. The HTML template:
<video id="localVideo" autoplay></video>
<video id="remoteVideo" autoplay></video>
<button id="callButton">Make a call</button>
The JavaScript gets more involved:
var localStream, localPeerConnection, remotePeerConnection;

var localVideo = document.getElementById('localVideo');
var remoteVideo = document.getElementById('remoteVideo');
var callButton = document.getElementById('callButton');

callButton.disabled = true;
callButton.onclick = call;

navigator.getUserMedia =
  navigator.getUserMedia ||
  navigator.webkitGetUserMedia ||
  navigator.mozGetUserMedia;

navigator.getUserMedia(
  { audio: true, video: true }, // note that we are requesting both audio and video
  gotStream,
  function (error) {
    console.log(error);
  }
);

// Everything above this line should be familiar from the previous example

function gotStream(stream) {
  localVideo.src = URL.createObjectURL(stream);
  localStream = stream;
  callButton.disabled = false;
}

function call() {
  callButton.disabled = true;
  if (localStream.getVideoTracks().length > 0) {
    console.log('Using video device: ' + localStream.getVideoTracks()[0].label);
  }
  if (localStream.getAudioTracks().length > 0) {
    console.log('Using audio device: ' + localStream.getAudioTracks()[0].label);
  }

  var servers = null; // no STUN/TURN servers needed: both peers live on this page
  // webkitRTCPeerConnection is the prefixed constructor Chrome exposes;
  // Firefox uses mozRTCPeerConnection instead
  localPeerConnection = new webkitRTCPeerConnection(servers);
  console.log(localPeerConnection);
  console.log('Created local peer connection object localPeerConnection');
  localPeerConnection.onicecandidate = gotLocalIceCandidate;

  remotePeerConnection = new webkitRTCPeerConnection(servers);
  console.log('Created remote peer connection object remotePeerConnection');
  remotePeerConnection.onicecandidate = gotRemoteIceCandidate;
  remotePeerConnection.onaddstream = gotRemoteStream;

  localPeerConnection.addStream(localStream);
  console.log('Added localStream to localPeerConnection');
  localPeerConnection.createOffer(gotLocalDescription);
}

function gotLocalDescription(description) {
  localPeerConnection.setLocalDescription(description);
  console.log('Offer from localPeerConnection: \n' + description.sdp);
  remotePeerConnection.setRemoteDescription(description);
  remotePeerConnection.createAnswer(gotRemoteDescription);
}

function gotRemoteDescription(description) {
  remotePeerConnection.setLocalDescription(description);
  console.log('Answer from remotePeerConnection: \n' + description.sdp);
  localPeerConnection.setRemoteDescription(description);
}

function gotRemoteStream(event) {
  remoteVideo.src = URL.createObjectURL(event.stream);
  console.log('Received remote stream');
}

function gotLocalIceCandidate(event) {
  if (event.candidate) {
    remotePeerConnection.addIceCandidate(new RTCIceCandidate(event.candidate));
    console.log('Local ICE candidate: \n' + event.candidate.candidate);
  }
}

function gotRemoteIceCandidate(event) {
  if (event.candidate) {
    localPeerConnection.addIceCandidate(new RTCIceCandidate(event.candidate));
    console.log('Remote ICE candidate: \n' + event.candidate.candidate);
  }
}
What does this code do? The two peer connections exchange local and remote descriptions in SDP (Session Description Protocol) format, describing each side’s media capabilities. SDP is a format for describing streaming media initialisation parameters. An SDP message starts with a session description, followed by one or more time descriptions and zero or more media descriptions:
**Session description**
v= (protocol version number, currently only 0)
o= (originator and session identifier : username, id, version number, network address)
s= (session name : mandatory with at least one UTF-8-encoded character)
i=* (session title or short information)
u=* (URI of description)
e=* (zero or more email addresses with optional names of contacts)
p=* (zero or more phone numbers with optional names of contacts)
c=* (connection information—not required if included in all media)
b=* (zero or more bandwidth information lines)
One or more Time descriptions ("t=" and "r=" lines; see below)
z=* (time zone adjustments)
k=* (encryption key)
a=* (zero or more session attribute lines)
Zero or more Media descriptions (each one starting by an "m=" line; see below)
**Time description (mandatory)**
t= (time the session is active)
r=* (zero or more repeat times)
**Media description (if present)**
m= (media name and transport address)
i=* (media title or information field)
c=* (connection information — optional if included at session level)
b=* (zero or more bandwidth information lines)
k=* (encryption key)
a=* (zero or more media attribute lines — overriding the Session attribute lines)
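To make the structure above concrete, here is a tiny helper that pulls the media descriptions (the "m=" lines) out of a raw SDP string. It is plain string handling, no WebRTC API involved, and the function name is my own:

```javascript
// Extract the media description lines ("m=...") from a raw SDP string.
// SDP separates lines with CRLF.
function mediaSections(sdp) {
  return sdp.split('\r\n').filter(function (line) {
    return line.indexOf('m=') === 0;
  });
}

// A stripped-down SDP blob with one audio and one video section:
var sdp = 'v=0\r\no=- 1 2 IN IP4 127.0.0.1\r\ns=-\r\nt=0 0\r\n' +
          'm=audio 1 RTP/SAVPF 111\r\nm=video 1 RTP/SAVPF 100\r\n';
var media = mediaSections(sdp);
// media → ['m=audio 1 RTP/SAVPF 111', 'm=video 1 RTP/SAVPF 100']
```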
Check your browser log (Chrome DevTools or Firebug) after pressing the ‘call’ button. You should see the SDP messages:
Offer from localPeerConnection:
v=0
o=- 5053101937256588725 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:a5/AbngUmbkkAspQ
a=ice-pwd:tYWWF1vcjr062ZXPZQ4eaeVN
a=ice-options:google-ice
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=mid:audio
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qC+pZ4UVB+ySS4pBLbynScoRJZ084pzQV0VJsvtm
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:107 CN/48000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
a=ssrc:838296445 cname:Kk/fTwoZrkPHQips
a=ssrc:838296445 msid:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAa0
a=ssrc:838296445 mslabel:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
a=ssrc:838296445 label:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAa0
m=video 1 RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:a5/AbngUmbkkAspQ
a=ice-pwd:tYWWF1vcjr062ZXPZQ4eaeVN
a=ice-options:google-ice
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=sendrecv
a=mid:video
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qC+pZ4UVB+ySS4pBLbynScoRJZ084pzQV0VJsvtm
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack
a=rtcp-fb:100 goog-remb
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=ssrc:2160303907 cname:Kk/fTwoZrkPHQips
a=ssrc:2160303907 msid:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAv0
a=ssrc:2160303907 mslabel:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
a=ssrc:2160303907 label:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAv0
If you want to dig deeper, Chrome has a WebRTC Internals page at chrome://webrtc-internals. It exposes stats about the PeerConnection. Some interesting ones: googTransmitBitrate and audioInputLevel (try making noises and watch the value spike; a genuine time sink, sitting in front of your computer whistling the main theme of Bridge on the River Kwai). There are also charts for monitoring video call performance.
So we’ve covered SDP. The other critical piece in the code above is Interactive Connectivity Establishment (ICE). ICE is the framework WebRTC uses to find a working network path between peers sitting behind different NATs and firewalls; it gathers candidate addresses with the help of STUN servers, and falls back to TURN relays when no direct route exists. For a deeper understanding, check this Google I/O presentation.
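In the example above, `servers` was null, which only works because both peers live in the same page. For a real call across networks you hand STUN (and usually TURN) servers to the RTCPeerConnection constructor. A sketch of such a configuration — stun.l.google.com:19302 is a well-known public Google STUN server, the TURN entry is a hypothetical placeholder, and the `url` key is the spelling Chrome accepted at the time:

```javascript
// ICE server configuration handed to the RTCPeerConnection constructor.
// The STUN address is Google's public server; the TURN entry is a
// hypothetical placeholder you would replace with your own relay.
var servers = {
  iceServers: [
    { url: 'stun:stun.l.google.com:19302' },
    { url: 'turn:turn.example.org:3478', username: 'user', credential: 'secret' }
  ]
};

// In the browser:
// localPeerConnection = new webkitRTCPeerConnection(servers);
```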
Once the connection is established, the remote video element receives the local stream through the onaddstream handler, gotRemoteStream().
Now the tricky part: setting up a signalling server that relays messages between connected clients, allowing two separate browser instances to share a stream. In this post I’ll explain how a standalone signalling server works. In the next post, I’ll cover how to merge the chat server functionality with the signalling server into one self-contained solution.
The signalling server needs to emit the right messages to people who’ve joined the same room. If you recall from Episode 1, I covered room creation and joining. That code will need extending, and there’ll be major additions on the client side. For reference, check out this repository. I’ll use it as the base for my modifications and explain it all in the next post. The goal: extend the chat app so it handles video calls (not just text messages) between people in the same room, with the ability to drop out of a call. Not a trivial task, and it’ll take some time.
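To make the relaying concrete before the next post, here is the core routing rule a signalling server implements: every SDP offer/answer and every ICE candidate a client sends gets forwarded, untouched, to the other members of that client's room. The function below is my own illustrative sketch of that rule as a pure function; with socket.io the same effect is roughly `socket.broadcast.to(room).emit('message', message)`:

```javascript
// Illustrative sketch of signalling-server routing: given the room
// membership, decide who should receive a message sent by one client.
// These names are invented for illustration, not a real API.
function routeSignal(rooms, roomName, senderId, message) {
  var members = rooms[roomName] || [];
  // Forward to everyone in the room except the sender.
  return members
    .filter(function (id) { return id !== senderId; })
    .map(function (id) { return { to: id, message: message }; });
}

// Example: alice sends an SDP offer to the room 'lobby'.
var rooms = { lobby: ['alice', 'bob', 'carol'] };
var deliveries = routeSignal(rooms, 'lobby', 'alice', { type: 'offer', sdp: 'v=0...' });
// deliveries → addressed to 'bob' and 'carol'
```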
WebRTC can also carry plain chat messages over its data channels. I don’t plan on implementing that; I’ll stick with the socket.io/node.js setup.
DataChannels
We’ve covered getUserMedia and RTCPeerConnection, so let’s touch on DataChannels. The chat app from the previous article could be rewritten using RTCDataChannel. The setup is similar to video calling: the same ICE and SDP methodology applies. For a solid example, see RTCDataChannel in action with the source code.
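A data channel carries strings (or binary), so a chat app needs its own tiny message framing on top. A minimal sketch of that layer — the helper functions and field names here are my own invention for illustration, not part of any WebRTC API; in the browser you would pass the encoded string to channel.send() and decode it in the onmessage handler:

```javascript
// Illustrative framing for chat messages sent over an RTCDataChannel.
// The channel itself just moves strings, so we serialise to JSON on the
// way out and parse on the way in. Field names are hypothetical.
function encodeChatMessage(nick, text) {
  return JSON.stringify({ nick: nick, text: text, sent: Date.now() });
}

function decodeChatMessage(raw) {
  return JSON.parse(raw);
}

// In the browser: channel.send(encodeChatMessage('alice', 'hi'));
// and in channel.onmessage: var msg = decodeChatMessage(event.data);
var decoded = decodeChatMessage(encodeChatMessage('alice', 'hi'));
// decoded.nick === 'alice', decoded.text === 'hi'
```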
WebRTC has a strong future. It’ll let developers build peer-to-peer applications using only the web browser. There are already impressive apps out there, like this file sharing app built purely on HTML5 and WebRTC. Having one centralised communications platform (perhaps internal to a company) that handles chat, video conferencing, file sharing and screen sharing would be a powerful thing.
That’s it on WebRTC for now. Massive thanks to Sam Dutton, whose posts were invaluable for understanding WebRTC. I’ll be working on merging the chat server into a WebRTC signalling server. Stay tuned.