Advanced chat using node.js and socket.io – Episode 2

In this post I will give a very basic introduction to WebRTC and show a small code example that enables video calling between two clients using a custom signalling server built on socket.io. The ultimate goal is to combine the signalling server and the chat server written in the previous article to form a video call / chat solution.

WebRTC is an open project that gives web browsers Real-Time Communications (RTC) capabilities via simple JavaScript APIs. It is currently being developed in collaboration between Google, Mozilla and Opera. (Why would Microsoft support any of this, right?) Because of this, browser support is limited to Chrome 23+, Firefox 22+ and Opera 12+ on the desktop, and Chrome 28+, Firefox 24+ and Opera Mobile 12+ on Android.

There are three major components that make up WebRTC:

  • getUserMedia
  • PeerConnection
  • DataChannels

The examples on this page are either the work of Sam Dutton or they are based on his examples - please have a look at his article and repositories for a more in-depth overview of WebRTC.

getUserMedia and PeerConnection
As the name suggests, getUserMedia allows the browser to access the user's camera and microphone. A very basic example written in JavaScript could look similar to this (this assumes that your HTML page contains a <video> element):

navigator.getUserMedia = navigator.getUserMedia || navigator.webkitGetUserMedia || navigator.mozGetUserMedia;

var constraints = {video: true};

function successCallback(localMediaStream) {
  window.stream = localMediaStream;
  var video = document.querySelector("video");
  video.src = window.URL.createObjectURL(localMediaStream);
  video.play();
}

function errorCallback(error){
  console.log("navigator.getUserMedia error: ", error);
}

navigator.getUserMedia(constraints, successCallback, errorCallback);

Open this page on localhost and after allowing your browser access to your camera, you should see yourself. Well done.
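Newer browsers also expose a promise-based navigator.mediaDevices.getUserMedia, and the prefixed callback versions used above may be missing there. Below is a minimal sketch of a wrapper that prefers the promise API and falls back to the prefixed functions. The name getMedia is a hypothetical helper, and the navigator object is passed in as a parameter so the function can be tried outside a browser.

```javascript
// Hypothetical helper: returns a Promise for a media stream.
// `nav` is a navigator-like object (pass window.navigator in the browser).
function getMedia(nav, constraints) {
  // Prefer the promise-based API where it exists.
  if (nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function") {
    return nav.mediaDevices.getUserMedia(constraints);
  }
  // Fall back to the callback-based, possibly prefixed, versions.
  var legacy = nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia;
  if (!legacy) {
    return Promise.reject(new Error("getUserMedia is not supported"));
  }
  return new Promise(function (resolve, reject) {
    legacy.call(nav, constraints, resolve, reject);
  });
}

// Browser usage (assumes the page contains a <video> element):
// getMedia(navigator, { video: true }).then(function (stream) {
//   var video = document.querySelector("video");
//   video.srcObject = stream;
//   return video.play();
// }).catch(function (error) {
//   console.log("getUserMedia error: ", error);
// });
```

Passing the navigator in, rather than reading the global, is only a convenience for experimenting; in a page you would simply call getMedia(navigator, ...).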

Now let's take this a bit further. We are going to create a local video and a remote video feed on the same page. Admittedly, this doesn't have much of a use but it's a good way to demo the RTCPeerConnection. Let's create the following HTML template:

<video id="localVideo" autoplay></video>
<video id="remoteVideo" autoplay></video>
<button id="callButton">Make a call</button>

The JavaScript code will be slightly more complicated than the previous example:

var localStream, localPeerConnection, remotePeerConnection;

var localVideo = document.getElementById("localVideo");
var remoteVideo = document.getElementById("remoteVideo");

var callButton = document.getElementById("callButton");

callButton.disabled = true;
callButton.onclick = call;

navigator.getUserMedia = navigator.getUserMedia ||
    navigator.webkitGetUserMedia || navigator.mozGetUserMedia;

// Note that we are requesting both audio and video this time
navigator.getUserMedia({audio: true, video: true}, gotStream,
  function(error) {
    console.log(error);
  });

// Everything above this line should be familiar from the previous example
function gotStream(stream){
  localVideo.src = URL.createObjectURL(stream);
  localStream = stream;
  callButton.disabled = false;
}

function call() {
  callButton.disabled = true;

  if (localStream.getVideoTracks().length > 0) {
    console.log('Using video device: ' + localStream.getVideoTracks()[0].label);
  }
  if (localStream.getAudioTracks().length > 0) {
    console.log('Using audio device: ' + localStream.getAudioTracks()[0].label);
  }

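  var servers = null;

  // Use whichever RTCPeerConnection constructor this browser provides
  var PeerConnection = window.RTCPeerConnection ||
      window.webkitRTCPeerConnection || window.mozRTCPeerConnection;

  localPeerConnection = new PeerConnection(servers);
  console.log(localPeerConnection);
  console.log("Created local peer connection object localPeerConnection");
  localPeerConnection.onicecandidate = gotLocalIceCandidate;

  remotePeerConnection = new PeerConnection(servers);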
  console.log("Created remote peer connection object remotePeerConnection");
  remotePeerConnection.onicecandidate = gotRemoteIceCandidate;
  remotePeerConnection.onaddstream = gotRemoteStream;

  localPeerConnection.addStream(localStream);
  console.log("Added localStream to localPeerConnection");
  localPeerConnection.createOffer(gotLocalDescription);
}

function gotLocalDescription(description){
  localPeerConnection.setLocalDescription(description);
  console.log("Offer from localPeerConnection: \n" + description.sdp);
  remotePeerConnection.setRemoteDescription(description);
  remotePeerConnection.createAnswer(gotRemoteDescription);
}

function gotRemoteDescription(description){
  remotePeerConnection.setLocalDescription(description);
  console.log("Answer from remotePeerConnection: \n" + description.sdp);
  localPeerConnection.setRemoteDescription(description);
}

function gotRemoteStream(event){
  remoteVideo.src = URL.createObjectURL(event.stream);
  console.log("Received remote stream");
}

function gotLocalIceCandidate(event){
  if (event.candidate) {
    remotePeerConnection.addIceCandidate(new RTCIceCandidate(event.candidate));
    console.log("Local ICE candidate: \n" + event.candidate.candidate);
  }
}

function gotRemoteIceCandidate(event){
  if (event.candidate) {
    localPeerConnection.addIceCandidate(new RTCIceCandidate(event.candidate));
    console.log("Remote ICE candidate: \n " + event.candidate.candidate);
  }
}

So what does this code do exactly? First, the two peer connections exchange their local and remote descriptions, which describe each side's media configuration in SDP (Session Description Protocol) format. Simply put, SDP is a format for describing the initialisation parameters of streaming media. An SDP message has three main kinds of section: a session description, one or more time descriptions, and zero or more media descriptions:

Session description
    v=  (protocol version number, currently only 0)
    o=  (originator and session identifier : username, id, version number, network address)
    s=  (session name : mandatory with at least one UTF-8-encoded character)
    i=* (session title or short information)
    u=* (URI of description)
    e=* (zero or more email address with optional name of contacts)
    p=* (zero or more phone number with optional name of contacts)
    c=* (connection information—not required if included in all media)
    b=* (zero or more bandwidth information lines)
    One or more Time descriptions ("t=" and "r=" lines; see below)
    z=* (time zone adjustments)
    k=* (encryption key)
    a=* (zero or more session attribute lines)
    Zero or more Media descriptions (each one starting by an "m=" line; see below)
Time description (mandatory)
    t=  (time the session is active)
    r=* (zero or more repeat times)
Media description (if present)
    m=  (media name and transport address)
    i=* (media title or information field)
    c=* (connection information — optional if included at session level)
    b=* (zero or more bandwidth information lines)
    k=* (encryption key)
    a=* (zero or more media attribute lines — overriding the Session attribute lines)
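To make the layout above more concrete, here is a small sketch that splits an SDP string into its session-level lines and its media descriptions. It is only an illustration of the structure: splitSdp is a hypothetical helper, not part of WebRTC, and the sample input is a shortened, made-up message in the same shape as a real offer.

```javascript
// Hypothetical helper: splits an SDP string into its session-level lines
// and one entry per media description (each starting at an "m=" line).
function splitSdp(sdp) {
  var result = { session: [], media: [] };
  var current = result.session;
  sdp.split(/\r?\n/).forEach(function (line) {
    if (line === "") return; // skip blank lines
    if (line.indexOf("m=") === 0) {
      // "m=<media> <port> <proto> <fmt> ..." opens a new media section
      var section = { kind: line.slice(2).split(" ")[0], lines: [] };
      result.media.push(section);
      current = section.lines;
    }
    current.push(line);
  });
  return result;
}

var example = [
  "v=0",
  "o=- 5053101937256588725 2 IN IP4 127.0.0.1",
  "s=-",
  "t=0 0",
  "m=audio 1 RTP/SAVPF 111",
  "a=mid:audio",
  "m=video 1 RTP/SAVPF 100",
  "a=mid:video"
].join("\r\n");

var parts = splitSdp(example);
console.log(parts.session.length); // 4
console.log(parts.media.map(function (m) { return m.kind; })); // [ 'audio', 'video' ]
```

Everything before the first "m=" line belongs to the session (including the "t=" time description); each "m=" line starts a new media description that runs until the next one.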

If you examine your browser log (either via the Chrome Dev Tools or Firebug) after making the call (i.e. pressing the 'Make a call' button), you should see the SDP messages:

Offer from localPeerConnection: 
v=0
o=- 5053101937256588725 2 IN IP4 127.0.0.1
s=-
t=0 0
a=group:BUNDLE audio video
a=msid-semantic: WMS ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
m=audio 1 RTP/SAVPF 111 103 104 0 8 107 106 105 13 126
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:a5/AbngUmbkkAspQ
a=ice-pwd:tYWWF1vcjr062ZXPZQ4eaeVN
a=ice-options:google-ice
a=extmap:1 urn:ietf:params:rtp-hdrext:ssrc-audio-level
a=sendrecv
a=mid:audio
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qC+pZ4UVB+ySS4pBLbynScoRJZ084pzQV0VJsvtm
a=rtpmap:111 opus/48000/2
a=fmtp:111 minptime=10
a=rtpmap:103 ISAC/16000
a=rtpmap:104 ISAC/32000
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
a=rtpmap:107 CN/48000
a=rtpmap:106 CN/32000
a=rtpmap:105 CN/16000
a=rtpmap:13 CN/8000
a=rtpmap:126 telephone-event/8000
a=maxptime:60
a=ssrc:838296445 cname:Kk/fTwoZrkPHQips
a=ssrc:838296445 msid:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAa0
a=ssrc:838296445 mslabel:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
a=ssrc:838296445 label:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAa0
m=video 1 RTP/SAVPF 100 116 117
c=IN IP4 0.0.0.0
a=rtcp:1 IN IP4 0.0.0.0
a=ice-ufrag:a5/AbngUmbkkAspQ
a=ice-pwd:tYWWF1vcjr062ZXPZQ4eaeVN
a=ice-options:google-ice
a=extmap:2 urn:ietf:params:rtp-hdrext:toffset
a=sendrecv
a=mid:video
a=rtcp-mux
a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:qC+pZ4UVB+ySS4pBLbynScoRJZ084pzQV0VJsvtm
a=rtpmap:100 VP8/90000
a=rtcp-fb:100 ccm fir
a=rtcp-fb:100 nack 
a=rtcp-fb:100 goog-remb 
a=rtpmap:116 red/90000
a=rtpmap:117 ulpfec/90000
a=ssrc:2160303907 cname:Kk/fTwoZrkPHQips
a=ssrc:2160303907 msid:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAv0
a=ssrc:2160303907 mslabel:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBA
a=ssrc:2160303907 label:ZnmeTGwKdfmIroYzED3XNP5j57oqkZ1OfvBAv0
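The "a=rtpmap" lines in the offer map the payload type numbers listed on each "m=" line to codec names. As a rough sketch, they can be collected like this; listCodecs is a hypothetical helper and, for simplicity, it ignores which media section each line belongs to.

```javascript
// Hypothetical helper: maps RTP payload types to codec names using the
// "a=rtpmap:<payload type> <codec>/<clock rate>[/<channels>]" lines.
function listCodecs(sdp) {
  var codecs = {};
  sdp.split(/\r?\n/).forEach(function (line) {
    var match = /^a=rtpmap:(\d+) ([^\/\s]+)\/(\d+)/.exec(line);
    if (match) {
      codecs[match[1]] = { name: match[2], clockRate: Number(match[3]) };
    }
  });
  return codecs;
}

// A shortened excerpt of the offer shown above
var offerLines = [
  "m=audio 1 RTP/SAVPF 111 103",
  "a=rtpmap:111 opus/48000/2",
  "a=rtpmap:103 ISAC/16000",
  "m=video 1 RTP/SAVPF 100",
  "a=rtpmap:100 VP8/90000"
].join("\r\n");

var codecs = listCodecs(offerLines);
console.log(codecs["111"].name, codecs["111"].clockRate); // opus 48000
console.log(codecs["100"].name); // VP8
```

So in the offer above, payload type 111 is Opus audio at 48 kHz and payload type 100 is VP8 video.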

If you'd like to investigate further what is going on behind the scenes, Chrome comes with a WebRTC Internals page. Point your browser to chrome://webrtc-internals to access statistical information about the PeerConnection. I found a few of the values interesting, such as googTransmitBitrate and audioInputLevel (with the latter, try making some noise and watch the value go up; sitting in front of your computer whistling the main theme of Bridge on the River Kwai is a true time waster). You can also access various charts that help you monitor the performance of your video call, although I haven't looked into these in depth.
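A similar figure can also be computed in code. Below is a rough sketch, assuming the promise-based getStats() of current browsers, whose "outbound-rtp" entries carry bytesSent and timestamp fields. This is not how chrome://webrtc-internals itself produces googTransmitBitrate, and bitrateKbps is a hypothetical helper.

```javascript
// Hypothetical helper: average send bitrate in kbit/s between two samples.
// Each sample needs `bytesSent` (bytes) and `timestamp` (milliseconds).
function bitrateKbps(previous, current) {
  var elapsedMs = current.timestamp - previous.timestamp;
  if (elapsedMs <= 0) return 0;
  var bits = (current.bytesSent - previous.bytesSent) * 8;
  return bits / elapsedMs; // bits per millisecond equals kbit/s
}

console.log(bitrateKbps(
  { bytesSent: 100000, timestamp: 1000 },
  { bytesSent: 225000, timestamp: 2000 }
)); // 1000

// Browser usage (assumes `pc` is a connected RTCPeerConnection):
// var last = null;
// setInterval(function () {
//   pc.getStats().then(function (report) {
//     report.forEach(function (stat) {
//       if (stat.type === "outbound-rtp" && stat.kind === "video") {
//         if (last) console.log(bitrateKbps(last, stat) + " kbit/s");
//         last = stat;
//       }
//     });
//   });
// }, 1000);
```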

Having discussed SDP, the other important part of the code above is Interactive Connectivity Establishment (ICE). ICE is the framework WebRTC uses to establish connections across various types of networks; it relies on STUN servers to discover a client's public address and on TURN servers to relay traffic when a direct connection is not possible. To understand more about these technologies please check out this excellent Google I/O presentation on the topic.
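In the example above the servers variable is null, which is fine when both peer connections live on the same page. Between two real machines you would normally pass a configuration listing STUN (and possibly TURN) servers. Here is a minimal sketch; buildIceConfig and candidateType are hypothetical helpers, and the server address is a placeholder, not a real service.

```javascript
// Hypothetical helper: builds the configuration object that
// RTCPeerConnection accepts as its first argument.
function buildIceConfig(stunUrl, turn) {
  var iceServers = [{ urls: stunUrl }];
  if (turn) {
    iceServers.push({
      urls: turn.url,
      username: turn.username,
      credential: turn.credential
    });
  }
  return { iceServers: iceServers };
}

// Hypothetical helper: reads the candidate type (the word after "typ").
// "host" is a local address, "srflx" an address discovered via STUN,
// "relay" an address on a TURN server.
function candidateType(candidateLine) {
  var match = / typ (\w+)/.exec(candidateLine);
  return match ? match[1] : null;
}

// Placeholder address; replace with a STUN server you are allowed to use.
var config = buildIceConfig("stun:stun.example.org:3478");
console.log(config.iceServers.length); // 1

var line = "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 " +
    "typ srflx raddr 192.168.1.2 rport 46154";
console.log(candidateType(line)); // srflx
```

The candidate strings logged by gotLocalIceCandidate() and gotRemoteIceCandidate() have this shape, so candidateType() can be used to see which kind of address each one represents.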

Once the connection is established, the remote video element receives the stream sent by the local peer connection; the gotRemoteStream() function attaches it to the element.
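In current browsers the onaddstream handler and URL.createObjectURL(stream) used by gotRemoteStream() have been superseded by the ontrack event and the srcObject property. A minimal sketch of the equivalent step, assuming a browser that supports both; attachRemoteMedia is a hypothetical helper name.

```javascript
// Hypothetical helper: shows incoming media in the given <video> element.
function attachRemoteMedia(peerConnection, videoElement) {
  peerConnection.ontrack = function (event) {
    // event.streams lists the streams the incoming track belongs to
    if (event.streams && event.streams[0]) {
      videoElement.srcObject = event.streams[0];
    }
  };
}

// Browser usage with the names from the example above:
// attachRemoteMedia(remotePeerConnection, remoteVideo);
```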

Now comes probably the most complicated bit: setting up a signalling server that relays messages between connected clients, allowing two separate browser instances to share a stream. In this post I'm going to explain how a standalone signalling server would work, and in the next post I will try my best to explain how I have merged the chat server functionality with the signalling server's functionality into one self-contained solution.
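On the client side, the messages arriving from the signalling server replace the direct function calls between localPeerConnection and remotePeerConnection in the single-page example. Here is a rough sketch of such a handler, assuming the promise-based RTCPeerConnection methods of current browsers. The message shapes and the handleSignal and send names are assumptions made for illustration, not an existing protocol.

```javascript
// Hypothetical message shapes:
//   { type: "offer" | "answer", sdp: "..." }
//   { type: "candidate", candidate: { ... } }
// `send` forwards a message to the other peer via the signalling server.
function handleSignal(peerConnection, message, send) {
  switch (message.type) {
    case "offer":
      // Callee: store the offer, create an answer and send it back.
      return peerConnection.setRemoteDescription(message)
        .then(function () { return peerConnection.createAnswer(); })
        .then(function (answer) {
          return peerConnection.setLocalDescription(answer).then(function () {
            send({ type: "answer", sdp: answer.sdp });
          });
        });
    case "answer":
      // Caller: the callee accepted, store its description.
      return peerConnection.setRemoteDescription(message);
    case "candidate":
      // Either side: add the ICE candidate found by the other peer.
      return peerConnection.addIceCandidate(message.candidate);
    default:
      return Promise.reject(new Error("Unknown signal type: " + message.type));
  }
}
```

This mirrors gotLocalDescription(), gotRemoteDescription() and the two ICE candidate callbacks above, except that each side only ever touches its own peer connection.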

The signalling server has to make sure that it emits the right messages to the people who have joined the same room. If you recall, in episode 1 of this series I discussed how to create rooms and how to join connected people to them. That piece of code will have to be extended, and some major additions will also have to be made on the client side. To give you an idea, please check out this repository. I will use it as the base for my modifications and explain everything in the upcoming post. Essentially, I'd like to extend my chat app so that it not only lets you send chat messages but also lets you make a video call, though only to people who have joined the same room, and people must be able to drop out of a call. This task won't be easy and it will probably take me some time, but please bear with me.
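The core of the room requirement can be sketched without any network code: keep a list of clients per room and forward each message to everyone in that room except the sender. The sketch below is not the final server; createSignallingHub and the client shape (an object with a send method) are assumptions made so the logic can be run on its own.

```javascript
// Transport-agnostic sketch of the relay logic. A "client" here is any
// object with a `send(message)` method (hypothetical shape).
function createSignallingHub() {
  var rooms = {}; // room name -> array of clients

  return {
    join: function (room, client) {
      rooms[room] = rooms[room] || [];
      rooms[room].push(client);
    },
    // Dropping out of a call: remove the client from the room.
    leave: function (room, client) {
      rooms[room] = (rooms[room] || []).filter(function (c) {
        return c !== client;
      });
    },
    // Forward a message to everyone in the room except the sender.
    relay: function (room, sender, message) {
      (rooms[room] || []).forEach(function (client) {
        if (client !== sender) client.send(message);
      });
    }
  };
}

// Possible socket.io wiring (the event names are assumptions):
// io.sockets.on("connection", function (socket) {
//   socket.on("join", function (room) { socket.join(room); });
//   socket.on("signal", function (room, message) {
//     socket.broadcast.to(room).emit("signal", message);
//   });
// });
```

With socket.io the room bookkeeping is handled by socket.join() and socket.broadcast.to(), so the hub above mostly serves to show what those calls do for you.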

Incidentally, WebRTC also comes with a feature that can be used to send chat messages only. At this time I do not plan on implementing it; I will stick to the current socket.io/node.js implementation.

DataChannel
I have talked a lot about getUserMedia and RTCPeerConnection, so let's change the topic to the DataChannel. This post focuses on the video chat element, but for the sake of completeness I also need to talk a bit about RTCDataChannel. The chat app that I introduced in the previous article can (could?) easily be rewritten using RTCDataChannel. Setting it up is very similar to setting up a video call: pretty much the same methodology, ICE and SDP, is applied. For a very good example please see the RTCDataChannel in action with the source code.
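As a minimal sketch of the sending side, the createDataChannel method of RTCPeerConnection returns a channel object with send() and an onmessage handler. The openChatChannel helper and the "chat" label are hypothetical; the offer/answer and ICE exchange still has to happen exactly as in the video example, and the channel should be created before the offer so that it is included in the SDP.

```javascript
// Hypothetical helper: opens a data channel labelled "chat" on an existing
// RTCPeerConnection and wires up a message callback.
function openChatChannel(peerConnection, onMessage) {
  var channel = peerConnection.createDataChannel("chat");
  channel.onopen = function () {
    console.log("Data channel open");
  };
  channel.onmessage = function (event) {
    onMessage(event.data);
  };
  return {
    // Returns false instead of throwing when the channel is not open yet.
    send: function (text) {
      if (channel.readyState !== "open") return false;
      channel.send(text);
      return true;
    }
  };
}

// Browser usage (assumes `localPeerConnection` from the example above):
// var chat = openChatChannel(localPeerConnection, function (text) {
//   console.log("Peer says: " + text);
// });
// The other peer receives its end of the channel in an
// `ondatachannel` handler, as `event.channel`.
```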

I really hope that WebRTC development will speed up, as I believe it has a great future and will allow developers to create great peer-to-peer applications using only the web browser. The apps out there are truly amazing, such as this file sharing app based only on HTML5 and WebRTC. Wouldn't it be great to get rid of all the separate communication tools such as Skype and Google Hangouts and have one centralised communications platform (maybe internal to a company) that allows people to chat, have video conferences and share files and their screens?

Okay, that is it about WebRTC. Massive thanks to Sam Dutton for all the posts he has written; they helped me a lot in understanding WebRTC. I will be working hard on transforming my chat server into a WebRTC signalling server. Stay tuned and we shall all see what happens.
