Building a scalable production grade WebRTC video app

They say the customer is always right. But that is not always true when it comes to building a production grade WebRTC video calling app that can also scale.

Why? Because customers often want to build a world class video app like Zoom or Google Meet within 3 months, on a budget of $$$$ / 1$$$$.

How do they come to such conclusions about time and money?

They reach such conclusions because they read on the internet that WebRTC is free and open source, and that one can build an app like Zoom or Google Meet by downloading a random open source package tagged with the WebRTC keyword from GitHub. They then conclude that everything needed for an app like Google Meet is already available, either in WebRTC itself or in the open source package. All they need to do is build a new UI layer and a dashboard around WebRTC to challenge Zoom and Google Meet!

The notion that WebRTC is free and open source, and that something like Google Meet can therefore be built with it without much effort, slowly takes hold in the customer’s mind. This notion creates a lot of confusion between companies like ours and prospective customers. It takes us time to help prospective customers understand the reality of WebRTC and the effort it takes to build production grade video applications with it.

Once the customer understands that building a live video application takes more than plain WebRTC, another issue arises. This time the issue is about building video rooms large enough to cater to maybe a million users. A million users in one room!

Again we need to put in effort to understand the customer’s thought process by asking the right set of questions. It turns out that the customer currently has a teaching and learning application where one teacher teaches one student at a time. They started by using Google Meet for free to conduct such sessions, but later realized that they need more control, along with deep integration of the calling feature into their dashboard. That’s why they are looking for a cost effective alternative that can provide this deep integration. But they anticipate that their product will see explosive growth and reach a million students within a couple of years. That’s why they want to build a video conferencing application large enough to scale to a million users when that happens.

Here again we need to put in effort to help our customers understand how a WebRTC application really works. We need to discuss and explain the various kinds of WebRTC architectures such as p2p, full mesh multiparty, conferencing, live streaming, webinar etc. Though we prefer not to use much WebRTC jargon in these discussions, it sometimes becomes unavoidable when we need to explain things like SFU, MCU, ICE/STUN/TURN, media server, recording server, FFmpeg, GStreamer etc. After one or two rounds of discussion, they realize for themselves that their current need can be satisfied very well by a p2p call, occasionally relayed through a TURN server. After all these discussions, they understand that building a production grade scalable p2p video call takes much less time and resources than building a scalable production grade video conferencing application. It is also a good starting point to test the market and the product before investing more resources in a scalable video conferencing application. Whether they choose a p2p app or a conferencing app is immaterial; within a couple of rounds of discussion, they equip themselves with the knowledge needed to understand the reality of WebRTC. From here onwards, it becomes a rewarding experience to help the customer achieve their business objective.

After going through the above situation a couple of times, we decided to build a scalable production grade WebRTC p2p video calling application with a loosely coupled UI. This way the UI can be modified according to individual needs, whereas the architecture, the front end functionality and the back end stay the same. Though building a simple p2p app seems easy, building a scalable, production grade p2p video calling app with a certain level of tolerance for bad networks is not so straightforward. Why? Because one needs to take care of the things listed below in a production grade app, which are generally not present in a simple p2p app available on GitHub.

Audio / Video management: This covers everything a user may need while using the app, such as muting / un-muting the mic, switching the camera on / off, changing the current mic / camera to a different one for the rest of the call, and moderator controls over remote media input (so that a teacher can mute / un-mute the mic or switch on / off the camera of a student when needed).
Capturing images / statistics: This feature helps capture an image from the real time video stream for purposes like vKYC (video Know Your Customer). With it, a bank agent can capture an image while the bank’s customer shows his / her photo identity card during the call, for the bank’s verification purposes. Collecting real time call and network statistics is also important for quality control and monitoring. In addition, real time network monitoring can raise timely alerts to users when their network quality degrades.
Auto re-connection: This is primarily important for keeping the call alive when the network fails. A network typically fails when a user’s device changes its network connection during a call, for example when it switches between WiFi and a mobile network such as 3G / 4G / LTE. The network fails temporarily while the switch happens and comes back once the switch is over. To reconnect automatically, an application needs to detect the network failure, wait until the network is back, and restart the media communication as quickly as possible. In WebRTC terminology, the mechanism used to restore the media communication is called an ICE restart.
Integrations: Integrations with other applications, such as whiteboards for collaboration, text chat, file sharing etc., also play an important role for some users. Either these features should be built in, or there should be a provision to integrate them when needed.
Recording: Recording is another important feature in any WebRTC application. Though it may not be needed in every kind of WebRTC application, it becomes necessary for applications like video call centers, video health etc. Recording can happen either on the client side or on the server side. Server side recording is usually preferred because it allows the raw recording to be post processed for multiple purposes. As an example, an hour long video recorded in WebM format, a common recording format for WebRTC, can consume 800 MB – 1000 MB of storage, which is a lot. On the server side, one can use a tool called FFmpeg to compress it, watermark it and convert it to MP4, which can reduce the size of the same video to < 100 MB. If server side recording is not possible (as in a P2P call), an alternative strategy is to record on the client side and send the file to the server once the recording is done.
In call media manipulation: There may be a need to mask some portion of the video stream during a call, for either security or convenience. A widely used feature of this kind is background removal, where the background of a user sharing a camera is either blurred or replaced with an image of a coffee shop / office desk / meeting room etc. There may be other such use cases as well.
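
To make the auto re-connection item more concrete, here is a minimal sketch of an ICE restart triggered by a failed connection. It is an illustration under assumptions, not our production code: watchConnection and sendOffer are hypothetical names, sendOffer stands for whatever your signaling channel uses to deliver an offer to the remote peer, and the peer connection is passed in as a parameter so the logic can be tried with a stand-in object outside the browser.

```javascript
// Restart ICE when the connection fails, e.g. after a WiFi <-> mobile switch.
// `pc` is an RTCPeerConnection; `sendOffer` is a hypothetical signaling helper.
function watchConnection(pc, sendOffer) {
  let restarting = false;

  pc.oniceconnectionstatechange = async () => {
    const state = pc.iceConnectionState;

    if (state === "connected" || state === "completed") {
      restarting = false; // the media path is healthy again
      return;
    }
    if (state !== "failed" || restarting) {
      return; // "disconnected" often recovers by itself, so wait for "failed"
    }

    restarting = true;
    // iceRestart: true asks the browser to gather fresh candidates
    const offer = await pc.createOffer({ iceRestart: true });
    await pc.setLocalDescription(offer);
    sendOffer(pc.localDescription); // the remote peer answers as in a new call
  };
}
```

A real application would probably also wait for the device to report that it is back online before sending the new offer, and would retry if the signaling channel itself has dropped.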

All of the above features, together with a robust architecture that is ready for scale, make a production grade application. Building such an application takes a lot of time and effort from an excellent team with a deep understanding of WebRTC and related technologies, frontend, backend, and many other such things.

If you are a customer looking to build a production grade scalable video calling app, then by now you know that you need a rock star team with a solid understanding of WebRTC and related technologies, along with sufficient time and resources, to venture on such an adventure. If you don’t have a rock star team or the time, then we are here to help in any of the ways below.

CP2P is our scalable production grade P2P video calling app, ready for production deployment as a managed service. It comes with a very minimal UI that is ready for retrofitting. You can either share your UI design and have us build it for you, or share the fully built UI for integration. We can integrate it, deploy it on your servers or ours, and manage it for you in a cost effective manner. The link to view details and see it in action is at the bottom of this page.

CVR is our scalable production grade video conferencing / live streaming / webinar application, ready for production deployment as a managed service. It uses CWLB, our WebRTC load balancer built in house from scratch, to distribute and balance load in real time with a utilization efficiency of 75%. It also uses CR, an advanced recording engine developed in house from scratch, to record meetings. It uses Mediasoup as its core media server. The link to learn more about it and schedule a demo is at the bottom of this page.

If you don’t want to use either of these products and would rather develop your own from scratch, we as a consulting company can help you build it. In this case, you need more resources and time than with the previously mentioned options. If you have them at your disposal, this can be a path to tread. The only thing to make sure of in this case is that you have a dependable rock star team who can work with us on building the product.

In case you don’t have a rock star team, there is a reason to worry. But why worry when we are here? We have an instructor led, full time training program, online or offline, where we can turn any fullstack / frontend / backend developer with sound JavaScript knowledge into a rock star WebRTC developer. The timelines for the training program are as below.

  • 5 – 7 days (for the WebRTC fundamentals program)
  • 10 – 14 days (for the WebRTC fundamentals and advanced WebRTC with Mediasoup program)

I hope I was able to share enough information about building a scalable production grade video conferencing app. If you still have doubts or questions, you can reach out to me at either sp@centedge.io or hello@centedge.io.

The link to details on CP2P is here.

The link to details on CVR is here.

The link to schedule a free 30 mins discussion with one of our experts to resolve all your WebRTC related queries is here.

If you are a student / developer looking to learn more about the basics of WebRTC by yourself through working examples, here is a GitHub repo with working examples on successful MediaStream acquisition, building a basic signalling server, and building a working P2P call app.

Demystifying a WebRTC video calling application for the beginner

So you are a developer (frontend / backend / full-stack) curious about developing applications using WebRTC. You have been searching the internet for the last couple of days, or even months, to learn the basics and to build a basic WebRTC video calling app. Though there are a few GitHub repositories with the code for a very basic p2p video calling app, none of them explain the inner workings of the code. The code in those repositories just runs on your localhost with a command or two, letting you connect 2 tabs of your browser with a video call. When you try to read through the code, you find a bunch of API calls that are very unfamiliar and sometimes seem illogical.

In order to demystify the inner workings of a basic video calling app using WebRTC, we need to follow a 3 step beginner-friendly approach which also is commonsensical. Let’s start.

The very first step of building a video calling app is to understand how to acquire the camera and/or mic of the device you want to use for calling, using any of these browsers: Chrome / Firefox / Edge / Safari. It can be a desktop / laptop / mobile device, as long as one of these browsers is present. Without the camera and mic, a video call has no meaning at all. There can be a use case where you use WebRTC in p2p mode with data channels for file sharing only, but we are not going to discuss that in this blog post. The way to acquire the camera and mic from the browser is to use an API called getUserMedia. The line of code below acquires the camera and mic from the browser.

const stream = await navigator.mediaDevices.getUserMedia({audio:true,video:true});

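With the above line of code, we will be able to acquire the camera and mic, with some preconditions. It only works in a secure context: the page must be served over HTTPS, or from localhost during development. On a plain HTTP page it fails, because browsers typically do not expose navigator.mediaDevices there. The user also has to grant permission to use the camera and mic.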
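
Here is a minimal sketch of how these preconditions could be checked around the getUserMedia call. The function name acquireMedia and the error messages are assumptions made for illustration; navigator.mediaDevices is passed in as a parameter so that the function can also be tried with a stand-in object outside the browser.

```javascript
// Acquire camera and mic, with basic checks around the call.
// `mediaDevices` is normally navigator.mediaDevices.
async function acquireMedia(mediaDevices, constraints = { audio: true, video: true }) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    // Typical on plain HTTP pages, where browsers hide navigator.mediaDevices
    throw new Error("getUserMedia is unavailable; serve the page over HTTPS or localhost");
  }
  try {
    return await mediaDevices.getUserMedia(constraints);
  } catch (error) {
    if (error.name === "NotAllowedError") {
      throw new Error("Camera/mic permission was denied");
    }
    if (error.name === "NotFoundError") {
      throw new Error("No camera or mic was found on this device");
    }
    throw error; // anything else is passed through unchanged
  }
}
```

In the browser it would be called as const stream = await acquireMedia(navigator.mediaDevices); the returned stream is the same object the one-line version produces.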

If you have successfully acquired the camera and mic, you are ready for step 2 of building a WebRTC video calling app. In this step, you need to build a simple signaling server so that messages can be exchanged between the caller and the callee. This step is all about creating a server-side application that connects to both the caller and the callee, and lets them exchange messages with each other whenever needed. In this example, Node.js is the server-side runtime used for the signaling server, and WebSocket is the connectivity mechanism that connects both users.

Here is the sample code.

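const fs = require('fs');
const https = require('https');
const WebSocket = require('ws');
const WebSocketServer = WebSocket.Server;

// Assumed values: the port and the certificate file names are placeholders
const HTTPS_PORT = 8443;
const serverConfig = { key: fs.readFileSync('key.pem'), cert: fs.readFileSync('cert.pem') };
// Plain HTTPS requests get a short reply; signaling itself runs over WebSocket
const handleRequest = (request, response) => response.end('Signaling server is running');

const httpsServer = https.createServer(serverConfig, handleRequest);
httpsServer.listen(HTTPS_PORT, '0.0.0.0');
const wss = new WebSocketServer({ server: httpsServer });

wss.on('connection', function (ws) {
    ws.on('message', function (message) {
        // Signaling messages from the caller/callee arrive here
    });
});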

With the above lines of code, one has a basic signaling server ready to listen to messages from the caller/callee.
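
The message handler above is left empty. One simple way to fill it in, sketched here under the assumption that only two peers are connected at a time, is to forward every message to all connected clients except the sender. The function name relay is hypothetical; the clients argument can be any iterable of sockets, such as the wss.clients set provided by the ws package.

```javascript
// Forward a signaling message to every connected client except the sender.
// `clients` is any iterable of ws sockets, for example wss.clients.
const OPEN = 1; // value of WebSocket.OPEN in the ws package and in browsers

function relay(clients, sender, message) {
  let delivered = 0;
  for (const client of clients) {
    if (client !== sender && client.readyState === OPEN) {
      client.send(message);
      delivered += 1;
    }
  }
  return delivered; // number of clients the message was forwarded to
}
```

Inside the handler this could be called as relay(wss.clients, ws, message.toString()). The toString() call is there because newer versions of ws deliver messages as Buffers; with more than two users, you would need rooms or explicit recipient ids instead of a plain broadcast.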

Now you are ready for building the real video calling application using the work we did in the last 2 steps. Here are the steps that are needed to establish the call.

  • Peer A connects to the signaling server and waits for peer B
  • Peer B connects to the signaling server and informs peer A that he / she is available for a call
  • Peer B clicks the call button and boom! The call is connected, and both peer A and peer B can see and hear each other.

Here are the real steps that happen behind the scenes to establish the call.

  • Peer B first creates a new PeerConnection object while passing the available ICE servers as a parameter, which helps in sending and receiving the media streams.
const pc = new RTCPeerConnection({iceServers});
  • Then it acquires the local camera and mic and adds those camera and mic tracks to the PeerConnection. This makes the PeerConnection ready to send the media feeds as soon as the connection is established (i.e. when both users agree to use a common network configuration acceptable to both).
stream.getTracks()
      .forEach((track) =>
        pc.addTrack(track, stream)
      );
  • Then it creates an offer, which generates an offer SDP (Session Description Protocol) containing a large amount of information (approx. 80 – 100 lines) in plain text format. It contains information like network settings, available media streams (audio / video / screen share / anything else), the codecs currently available to encode and decode media data packets, and many other things.
const offer = await pc
      .createOffer()
      .catch(function (error) {
        alert("Error when creating an offer: " + error);
      });
  • Once the SDP is generated, the local description of the PeerConnection is set using the offer. In simple terms, it asks the browser for the final confirmation of the validity of all the options available in the SDP. Once the local description is set, the SDP aka settings can’t be changed anymore and the SDP is then sent to remote peer A to let its browser do all the things that peer B’s browser just did.
await pc.setLocalDescription(offer);
//send the offer to peer A using the signalling channel
  • As soon as the local description is set, it starts generating ice candidates (in simple terms, the current network configurations of peer B) and sends them to peer A to check if the network parameters are acceptable to his / her device to receive media streams. The handler below should be registered before the local description is set, so that no candidate is missed.
pc.onicecandidate = function (event) {
      if (event.candidate) {
       //send the ice candidate to the other peer using the   
       //signalling channel 
      }
    };
  • Once the offer SDP, sent via the signaling server, is received by peer A’s browser, peer A first creates a PeerConnection object while passing the ICE servers as a parameter for the same purpose. As soon as the PeerConnection is created, it uses the offer SDP provided by peer B to set its remote description. This needs to be done to let the browser know the other peer’s details, so that the browser can create an answer SDP in response to the offer at a later stage.
const pc = new RTCPeerConnection({iceServers});
await pc.setRemoteDescription(new RTCSessionDescription(offer));
  • It is a repeat of step 2 for peer A, where it acquires his / her own media streams and adds them to the PeerConnection to be ready to send once the connection is established.
stream.getTracks()
      .forEach((track) =>
        pc.addTrack(track, stream)
      );
  • Then it creates an answer by calling the create answer API on the PeerConnection object, which generates the answer SDP. Once the answer SDP is generated, the local description is set on peer A’s side to ask the browser for one final confirmation. Once confirmed, the answer is sent to peer B via the signaling channel for acceptance by peer B’s browser.
const answer = await pc
      .createAnswer()
      .catch(function (error) {
        alert("Error when creating an answer: " + error);
      });
await pc.setLocalDescription(answer);
//send the answer to peer B using the signalling channel
  • Once the answer SDP is received on peer B’s side, it calls the set remote description API to ask the browser for acceptance of the other user’s SDP. Once the browser confirms, the connection for media transport is established.
await pc.setRemoteDescription(
      new RTCSessionDescription(answer)
    );
  • Step 5 is repeated by peer A’s browser for the exact same purpose. Both the browsers have knowledge of each other’s network configuration by now. After this, both of the peers agree to use one network configuration among all the possible network configuration options given by both of their browsers. The mutually selected network configuration aka ice candidate is then used for the actual media transport between both of the users.
pc.onicecandidate = function (event) {
      if (event.candidate) {
       //send the ice candidate to the other peer using the   
       //signalling channel 
      }
    };
  • Once the connection is established, each of the PeerConnection objects starts sending its media streams to the remote user. As each remote track becomes available, a track event is fired on the PeerConnection object (handled with ontrack) to let the application know that the other peer’s media is ready to be consumed. The local browser then takes the media from the event and displays it in a video element.
pc.ontrack = (event) => { 
    if(event.streams && event.streams[0]){
    //The remote stream is now available at event.streams[0]. It     
    //can be attached to the srcObject of a video element to 
    //display the remote stream to the peer.
    } 
}
  • Now the call is successfully established where both peer A and peer B can communicate with each other in real-time with their respective camera and mic.
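
The steps above show what each peer sends. On the receiving side, every incoming signaling message has to be applied to the PeerConnection, including the ice candidates sent by the other peer, which are added with addIceCandidate. The following is a minimal sketch of such a handler. The function name handleSignal, the send helper and the message shapes are assumptions made for illustration and are not taken from the repo; the sketch also assumes that the local tracks were already added with addTrack before the offer arrives.

```javascript
// Apply one incoming signaling message to a peer connection.
// Assumed message shapes (hypothetical, choose your own):
//   { type: "offer", sdp }, { type: "answer", sdp }, { type: "candidate", candidate }
// `send` is a hypothetical helper that pushes a message to the other peer.
async function handleSignal(pc, message, send) {
  switch (message.type) {
    case "offer": {
      // Answering side: store the offer, then reply with an answer
      await pc.setRemoteDescription({ type: "offer", sdp: message.sdp });
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      send({ type: "answer", sdp: answer.sdp });
      break;
    }
    case "answer":
      // Offering side: the answer completes the offer/answer exchange
      await pc.setRemoteDescription({ type: "answer", sdp: message.sdp });
      break;
    case "candidate":
      // Either side: add an ice candidate gathered by the remote peer
      await pc.addIceCandidate(message.candidate);
      break;
    default:
      throw new Error("Unknown signaling message type: " + message.type);
  }
}
```

With a WebSocket based signaling channel, this function would be called from the socket’s message handler after parsing the JSON payload.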

Once all the above-mentioned steps are done correctly, a WebRTC video call can be established successfully. Here is the link to the Github repo where all the above steps are created in separate folders along with working code for your reference.

Do keep in mind that this is for learning and understanding the inner workings of how a simple WebRTC p2p call works. If you want to build a production grade p2p call which you can deploy to a cloud and use it for a commercial venture, you need to check this out.

If you want to build a production grade video calling app by yourself as an extension to this project, you need to check this post to learn more about all the necessary features of a production grade app.

Also keep in mind that you need a robust architecture to build a production grade app. The code in the GitHub repository created for this example is for learning purposes only and is not fit for production usage. If you are interested in a discussion with a principal consultant at Centedge to design the right architecture for you, you can schedule a free 30 mins consultation call using this link

How to test Co-Turn TURN server configurations?

A TURN server is needed by any WebRTC application whose users may be behind a strict firewall. Co-Turn is a very popular open-source TURN server that can be used with any WebRTC application for bypassing firewalls. It can be installed on a Linux server (on-premise / cloud) and needs to be properly configured to work with your WebRTC application. In this post, we focus only on how to test your TURN server to find out whether it is configured properly, as the configuration of a TURN server deserves a separate post.

The most popular tool available today to test a TURN server is known as the WebRTC trickle ICE testing tool. The link to the site is at the bottom of this post. Here is what it looks like.

[Image: the WebRTC trickle ICE testing tool]

As shown, we can add our STUN/TURN server credentials using the add server button. Once the server is added, we can select it and gather all possible ICE candidates using the gather candidates button. Here is what it looks like when candidates are gathered for a working TURN server.

[Image: ICE candidates gathered for a working TURN server]

As shown, there are 3 kinds of candidates being generated from one candidate gathering request. They are

  • host candidates
  • srflx candidates
  • relay candidates

host candidates: Host candidates are those candidates which can be used to connect a WebRTC call within the same network, i.e. both the peers joining the call are connected to one network router / switch.

srflx candidates: srflx (server reflexive) candidates are those candidates which can be used to connect a WebRTC call when the users are not in the same network but on different networks, maybe even on 2 different continents, connected by the Internet. They carry the public address of a user as discovered through a STUN server.

relay candidates: relay candidates are those candidates which can be used to connect a WebRTC call when the users are not in the same network and one or both of them are behind a firewall. A user behind a firewall is not reachable directly from the Internet. That is the purpose of a firewall: it isolates a network from the Internet, so that interaction with the Internet can be controlled by a network administrator in the desired manner. The primary reason is to keep the evil eyes out of the network. But this network arrangement poses a problem for a WebRTC app, as it can’t reach the user. To solve this, a TURN server is put in place and both users relay their media through it rather than connecting directly with each other, which bypasses the firewall. If this sounds a bit confusing, you can read more about firewalls and trickle ICE to gain a better understanding.
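
To see which of these types a gathered candidate belongs to, one can read the candidate line itself, since the type appears after the keyword typ. Here is a minimal sketch of a parser for the leading fields of such a line. The function name parseCandidate is hypothetical, and the sample line is made up for illustration using a documentation-only IP address.

```javascript
// Parse the main fields of an ICE candidate line, e.g. event.candidate.candidate.
function parseCandidate(line) {
  const parts = line.trim().replace(/^a=/, "").replace(/^candidate:/, "").split(/\s+/);
  // Layout: foundation component transport priority address port "typ" type ...
  if (parts.length < 8 || parts[6] !== "typ") {
    throw new Error("Not a valid ICE candidate line: " + line);
  }
  return {
    foundation: parts[0],
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // "host", "srflx", "prflx" or "relay"
  };
}

// Made-up sample line for illustration
const sample =
  "candidate:842163049 1 udp 1677729535 203.0.113.7 46154 typ srflx raddr 192.168.1.5 rport 46154";
console.log(parseCandidate(sample).type); // prints "srflx"
```

Note that the transport field here is the transport between the user and the TURN server’s allocated address, so by itself it does not say whether a relay candidate is relay-udp, relay-tcp or relay-tls.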

There can be 3 kinds of relay candidates. Any one of these candidate types can be chosen to establish the connection.

  • relay-udp
  • relay-tcp
  • relay-tls

relay-udp: The most common kind of relay candidate and the easiest one to connect. This type is chosen when the firewall is not very strict and allows udp connections.

relay-tcp: This is the next best available option when the relay-udp candidate type is not possible because udp ports are blocked by the firewall. In this case the relay-tcp candidate type is used.

relay-tls: This is the next best available option when the firewall has blocked not only the udp ports but also unsecured connections over tcp. Only a secured connection with a valid authentication mechanism like SSL certificates can pass through this firewall. In this case, the relay-tls candidate type is used, which has the SSL certificate based authentication mechanism already enabled at the TURN server end.

To test which kind of relay candidate a TURN server is using, we can use the Firefox browser with a special setting flag switched on. Here is what it looks like.

[Image: the relay-only flag in Firefox, set to false by default]

As shown, there is a flag in Firefox (the media.peerconnection.ice.relay_only preference in about:config) that tells the browser to use only relay type connections for any WebRTC application used in the browser. By default, it is set to false, which means relay-only connections are not enforced for WebRTC calls. The flag can be switched on to enforce relay-only candidate generation by clicking the button at the right most side of the flag. This is what it looks like after switching it on.

[Image: the relay-only flag in Firefox after being switched on]

Any WebRTC application used in Firefox after this flag is switched on will use relay type candidates only. Here is what the candidate logs of Firefox look like while a WebRTC call is running after this flag is switched on.
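
The Firefox flag applies to every WebRTC application in that browser. If you control the application code, a different technique with a similar effect is the standard iceTransportPolicy option of RTCPeerConnection, which restricts a single connection to relay candidates in any browser. The sketch below is an illustration of that alternative, not the method used in this post; the function name relayOnlyConfig is hypothetical.

```javascript
// Build an RTCPeerConnection configuration that only allows relay candidates.
// `iceServers` must contain at least one TURN entry, otherwise no candidate
// can be gathered and the call will not connect.
function relayOnlyConfig(iceServers) {
  const hasTurn = iceServers.some((server) =>
    [].concat(server.urls).some((url) => /^turns?:/.test(url))
  );
  if (!hasTurn) {
    throw new Error("relay-only mode needs at least one turn: or turns: URL");
  }
  return { iceServers, iceTransportPolicy: "relay" };
}
```

In the browser it would be used as const pc = new RTCPeerConnection(relayOnlyConfig(iceServers)); remember to remove it again after testing, since relaying every call through the TURN server adds data transfer costs.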

[Image: Firefox candidate logs during a relay-only WebRTC call]

As can be seen in the above image, Firefox has chosen the relay-tls candidate type in this case, because the only TURN server URL provided for this example, turns:turn.centedge.io:443, is enabled with SSL certificate based authentication. More than one TURN server entry can also be provided, so that the correct relay candidate type can be chosen based on the user’s firewall configuration. An ideal iceServers parameter for establishing a WebRTC connection should have a STUN server and multiple TURN server entries for different purposes. It should look something like this.

[Image: an iceServers parameter with one STUN entry and three TURN entries]

As it can be seen there are 4 entries in this case. The explanation for each entry is as below.

Entry 1: For generating srflx candidates using a STUN server when neither user is behind a firewall. This is the most common and most used method to connect a WebRTC call. A STUN server is a must for connecting a WebRTC call. Also, apart from the cost of the server itself, there are no data transfer costs related to maintaining a STUN server.

Entry 2: For generating relay candidates when there is a firewall in place for one or both users of the WebRTC application. Here the TURN server is reachable on port 80, because most firewalls allow access on port 80, which is also used by http.

Entry 3: For generating relay candidates when there is a firewall in place that blocks all access except authentication based access. SSL certificate based authentication is used to pass through this firewall restriction. This firewall is stricter than the previous one. Hence port 443 is used, which is also used by https.

Entry 4: For generating relay candidates of type tcp when the strictest possible firewall is in place, one which not only needs authentication based access but has also blocked udp ports. This is an even stricter firewall than the previous one. This type of candidate should be able to pass through all kinds of firewalls, as almost all firewalls allow access on port 443 for normal internet access to websites, which won’t work without port 443 being open!
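
As a rough illustration of the four entries described above, here is a sketch that builds such an iceServers list. The host name, ports, user name and password are placeholders, and the exact URLs are an assumption based on the descriptions of the entries, so your own configuration may differ. The function name buildIceServers is hypothetical.

```javascript
// Build an iceServers list with one STUN entry and three TURN entries.
// Host, ports, user name and password are placeholders (assumptions).
function buildIceServers(host, username, credential) {
  const turn = (urls) => ({ urls, username, credential });
  return [
    { urls: `stun:${host}:3478` },           // entry 1: srflx candidates via STUN
    turn(`turn:${host}:80`),                 // entry 2: relay via port 80
    turn(`turns:${host}:443`),               // entry 3: relay over TLS on port 443
    turn(`turns:${host}:443?transport=tcp`), // entry 4: relay over TLS, tcp stated explicitly
  ];
}
```

It would be used as new RTCPeerConnection({ iceServers: buildIceServers("turn.example.com", "user", "secret") }). The turns: scheme already defaults to TLS over tcp, so the explicit ?transport=tcp in the last entry mainly documents the intent.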

While working with one of our customers, we realized that testing the TURN server settings one by one for all possible candidate types is critical for any application that wants to reduce call failures caused by strict firewall policies. Apart from trickle ICE, there are no reliable tools available today to test a TURN server from the perspective of which kinds of relay candidates (udp / tcp / tls) it supports. Therefore, you can use our open-source p2p application to test it in your local network, or deploy the application to test it over the internet. We will try to provide a hosted version of the same p2p application in some time.

How does it work?

Step 1: Download this repo to your local machine and follow the instructions in the README to set it up. Don’t forget to provide the credentials of the TURN server you wish to test, as mentioned in the README.

Step 2: Once the setup is successful, open Firefox and switch on the relay-only flag as described above.

Step 3: Copy and paste the generated link either into a different browser on the same device or into a browser on a different device in the same network. If you choose to host the application with a cloud provider like DigitalOcean, you can also share the link with a friend and hang out with them while testing.

Step 4: Once the other user joins the link, press the call button to start the video chat.

Step 5: Open about:webrtc in a different tab in Firefox and clear the history using the Clear History button, so that the statistics shown are for the current call only. Then click on the show details link to view the detailed call statistics, including the ice candidate details. Here you should be able to view the relay candidate type (udp / tcp / tls) along with all other relevant details.
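
The same information can also be read programmatically from the statistics of a running call. The following is a minimal sketch, assuming that the browser exposes the standard candidate-pair and local-candidate statistics, including the relayProtocol field, which may not be available in every browser version. The function name describeSelectedCandidate is hypothetical.

```javascript
// Find the candidate pair in use and describe its local candidate.
// `report` is the Map-like result of `await pc.getStats()`.
function describeSelectedCandidate(report) {
  let pair = null;
  for (const stat of report.values()) {
    if (stat.type === "transport" && stat.selectedCandidatePairId) {
      pair = report.get(stat.selectedCandidatePairId);
      break;
    }
  }
  if (!pair) {
    // Fallback for browsers that do not report transport stats
    for (const stat of report.values()) {
      if (stat.type === "candidate-pair" && stat.state === "succeeded" && stat.nominated) {
        pair = stat;
        break;
      }
    }
  }
  if (!pair) return null;

  const local = report.get(pair.localCandidateId);
  if (!local) return null;
  return {
    candidateType: local.candidateType,         // "host", "srflx", "prflx" or "relay"
    relayProtocol: local.relayProtocol || null, // "udp", "tcp" or "tls" for relay candidates
  };
}
```

During a call it could be used as console.log(describeSelectedCandidate(await pc.getStats())); for a relay-only call, candidateType is expected to be relay.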

Hope this helps you in testing your turn server for misconfigurations. Feel free to drop us an email at hello@centedge.io for any questions/doubts/concerns related to the above post or overall TURN servers/ WebRTC.