Autoscaling WebRTC apps is not at all easy. A lot of discussions about building large-scale WebRTC apps get stuck on the question of how to scale, and there are no straightforward answers to it yet. For this reason, we at CentEdge have developed CWLB, a general-purpose WebRTC load balancer with mediasoup as the media server at its core.
When a customer approaches us to help them build WebRTC apps, the conversation goes something like this:
Customer: We want to integrate video conferencing capabilities in our existing web app.
Us: Sure. We can help you with that.
Customer: Our requirement is to have conferencing rooms of up to 15 people, with recording capabilities.
Us: Sure. We can help you with that as well.
Customer: We want the solution to be super scalable, so that even a million rooms can be started at the same time. We have our own data centre, and you can run Kubernetes clusters there. We hope this will meet the scaling requirements.
Us: No. Kubernetes clusters alone may not be sufficient to scale WebRTC apps, given their stateful nature. Also, memory and CPU usage may not be the right indicators of server load in this case.
Customer: What is this stateful nature? Why are memory and CPU usage not the right indicators?
Us:….
The conversation continues until our customers fully understand the nature of WebRTC calls and the media server parameters that correctly indicate the current load. Towards the end of the conversation, the question of how the scaling problem should be solved used to remain open for further discussion, as we did not have a ready-made answer.
After going through similar conversations several times, we decided to do something about it. On 1st June 2021, we started working on a general-purpose load balancer to autoscale WebRTC apps, and after a year of considerable effort, we have successfully built it. We call it CWLB, which stands for Centedge WebRTC Load Balancer. CWLB supports both horizontal and vertical autoscaling. Mediasoup is the media server behind the load balancer, and CWLB currently supports AWS as the cloud provider for creating/deleting mediasoup media servers on demand.
Before discussing CWLB further, let us elaborate on some of the keywords mentioned in the paragraph above, for a better understanding of the context.
Why Autoscaling?
The first important question is: why does one need autoscaling? Because one may need to run more simultaneous video rooms than a single media server can handle. Let’s look at an example.
A c5.2xlarge instance on AWS (8 vCPU & 16 GB RAM) can handle either one large 50-person conference room or 10 small 5-person conference rooms. Once the server is fully loaded, it can’t serve any new room creation requests until the rooms running on it are closed. One option is to keep multiple servers running at all times to handle more load, irrespective of whether new room requests are coming in or not. This is a huge waste of resources, as one has to pay the server bills even though the servers are idle most of the time.
There may also be situations where extra servers are required only at specific times, not all the time. In this case, one needs to manually create new servers just before they are needed and build a mechanism to route new room creation requests to them. Once the need is over, the servers again have to be shut down manually. This is still manageable if the demand for video rooms is predictable, as there is time to create the new servers in advance, but it is nearly impossible if the room creation times are highly unpredictable. An example of a predictable load is a church prayer service that happens every day at the same time, or a board meeting scheduled every week / every month on a specific date and time. Such services give one ample time to create new servers for the prescheduled demand. An example of an unpredictable load is a teaching-learning app where any teacher can log in at any time to start a room. Here the room creation requests are so random that there is no time to create new servers. Therefore, it is impossible to scale manually in the case of an unpredictable load.
To solve the above-mentioned problems, a load balancer is placed in front of the media servers. Its job is to distribute the incoming load among the available servers based on a predefined algorithm. If none of the existing media servers has spare capacity, it creates new ones. If some of the media servers are idle, it deletes them so that valuable resources are saved. A load balancer is a must for unpredictable load scenarios. It is also good to have for predictable load scenarios, because it saves a lot of manual effort while minimizing the chance of human error.
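The placement and scale-up / scale-down decisions described above can be sketched in a few lines. This is a hypothetical illustration and not CWLB’s algorithm, which this post does not describe. It assumes each media server reports a single `load` figure against a fixed `capacity`, measured in some abstract unit such as estimated media streams, because CPU and memory alone are poor load indicators as mentioned earlier. It only returns decisions and leaves the actual creation or deletion of cloud instances to the caller.

```javascript
// Hypothetical sketch of a media-server autoscaling decision.
// Each server is { id, load, capacity, rooms }, where load/capacity are in an
// abstract unit (e.g. estimated media streams), NOT CPU or memory usage.

// First-fit: return the first server with enough spare capacity for the room,
// or null when no existing server can take it (i.e. a scale-up is needed).
function pickServer(servers, roomCost) {
  for (const server of servers) {
    if (server.capacity - server.load >= roomCost) {
      return server;
    }
  }
  return null;
}

// Decide what to do with a new room creation request.
function placeRoom(servers, roomCost) {
  const server = pickServer(servers, roomCost);
  if (server) {
    return { action: "use", serverId: server.id };
  }
  return { action: "create" }; // caller launches a new media server instance
}

// Servers running no rooms are candidates for deletion, but keep `minIdle`
// of them warm so a sudden burst of new rooms can start immediately.
function serversToDelete(servers, minIdle = 1) {
  const idle = servers.filter((s) => s.rooms === 0);
  return idle.slice(minIdle).map((s) => s.id);
}
```

First-fit placement keeps rooms packed onto as few servers as possible, so servers become idle sooner and can be deleted. Keeping one idle server warm (`minIdle`) is an assumed policy to absorb bursts, since a freshly created cloud instance takes time to boot.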
Is Autoscaling mandatory?
No. It is not mandatory for all kinds of WebRTC applications. When an application has a finite and predictable amount of load, autoscaling may not be needed.
Example:
Suppose you have a small school with 100 students in 5 grades, i.e. approximately 20 students per grade. In this case, a single 8 vCPU / 16 GB RAM server running 24x7 should be economical as well as sufficient to handle the peak load of all 5 grades running their classrooms simultaneously. For this kind of use case, adding a load balancer will add a lot of complexity and cost rather than saving it.
Why mediasoup?
Because mediasoup is one of the most capable, high-performance media servers available today. It has many cutting-edge features, such as:
Simulcast & SVC
Congestion control
Multi-stream (ability to send multiple streams over a single peer connection)
Sender & Receiver side bandwidth estimation
A tiny Node.js module for easy integration with existing large Node.js applications
Super low-level APIs that provide fine-grained control over media stream flows
Features like ICE restarts and prioritization that provide application flexibility
We have used the majority of the capabilities provided by mediasoup in our load balancer, to give customers who build their super-scalable applications on top of it enough flexibility.
Why AWS?
Because AWS is the leading cloud provider today, used by many enterprises and startups as well as by individuals for hobby projects. It also has best-in-class uptime and the trust of its users. Its documentation is detailed and easy to follow, which helps developer adoption. Also, the critical APIs our load balancer uses to scale media servers are stable and change infrequently. For all of the above reasons, we chose AWS as the first cloud provider for CWLB. We eventually plan to support all leading cloud providers, including Microsoft Azure, Google Cloud, Oracle Cloud, DigitalOcean, OVHcloud, etc., once our AWS offering is complete and stable.
There are 4 possible strategies one can use to autoscale a WebRTC application:
Horizontal scaling
Vertical scaling
Hybrid scaling
Hybrid+ scaling
Horizontal Scaling
This is the right mode of scaling if your use case needs small meeting rooms of 2-5 users each, but a lot of such rooms simultaneously.
A good example is a video contact center where 100+ customer support agents attend calls from customers every day. It is primarily a one-to-one call between the agent and the customer, until the agent’s supervisor and/or manager decides to join the call. In this case, there will be a maximum of 4 users in a conferencing room at any point in time, but there will be 100+ / 500+ such rooms running at any point in time.
Here, a horizontal load balancer can be used to shift new load from the first media server to a second media server as soon as the load on the first server reaches its peak. The load balancer keeps track of the real-time usage and releases resources whenever the load drops again. This way, the load balancer can upscale / downscale media server resources based on the real-time load.
Vertical Scaling
This is the right mode of scaling if your use case needs larger meeting rooms of 20-60 users each, but a smaller number of such rooms simultaneously.
A good example is a school / educational institution where only 10 teachers conduct daily sessions for their respective classes. In this case, though there will be relatively more students in each session, a maximum of 10 rooms (one per teacher) needs to run at any point in time.
Here, a vertical load balancer can be used to distribute the load from the first core of the media server to the other available cores as soon as the load on the first core reaches its peak. Though a single media server may be sufficient for the whole school, effectively distributing the load across all the available cores of that server is key to getting the desired output from it. The load balancer’s job here is to keep track of the real-time usage and release resources whenever the load on each individual core of the media server drops.
The two load balancing strategies mentioned here are the two basic forms of media server load balancing in WebRTC. The other two approaches are advanced use cases that need more advanced load balancing with fine-grained control. They are described in the second part of this blog series. The link to the 2nd part of the post is here.
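mediasoup runs each worker as a separate single-threaded process. The usual way to use every core of one machine is therefore to create one worker per CPU core, for example by calling `mediasoup.createWorker()` once per entry in `os.cpus()`. The sketch below shows only a hypothetical selection step on top of that: each new room goes to the least-loaded worker. Here `load` is an assumed counter that the application maintains, such as the number of consumers on that worker. It is an illustration, not CWLB’s vertical-scaling algorithm.

```javascript
// Hypothetical sketch: choose which core (mediasoup worker) hosts a new room.
// `workers` is an array of { index, load } kept up to date by the application,
// e.g. load = number of consumers currently handled by that worker.

function pickLeastLoadedWorker(workers) {
  if (workers.length === 0) {
    throw new Error("no workers available");
  }
  let best = workers[0];
  for (const w of workers) {
    if (w.load < best.load) {
      best = w; // strictly lower load wins; ties keep the earlier worker
    }
  }
  return best;
}

// Add the room's estimated cost to the chosen worker and return its index.
function assignRoom(workers, roomCost) {
  const worker = pickLeastLoadedWorker(workers);
  worker.load += roomCost;
  return worker.index;
}

// Subtract the room's cost when it closes, so that core can take new rooms.
function releaseRoom(workers, index, roomCost) {
  const worker = workers.find((w) => w.index === index);
  if (worker) {
    worker.load = Math.max(0, worker.load - roomCost);
  }
}
```

This sketch uses least-loaded selection instead of the fill-first approach that suits horizontal scaling. Within one server, the goal is to keep all cores evenly busy rather than to free up whole machines.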
CWLB
Introducing CWLB (Centedge WebRTC Load Balancer), a general-purpose WebRTC load balancer designed with mediasoup as the media server. It has been designed from scratch for enterprises that, for their own reasons, don’t want to use a video API vendor, but want dependable managed video infrastructure with a dedicated support team, along with the possibility of customizing even the core media flows.
Features
Mediasoup as the media server
AWS/DigitalOcean as the cloud provider
Hybrid+ scaling
Highly flexible yet resource-efficient
An advanced load distribution algorithm with 85% efficiency (approx.)
Note: With the current CWLB v2 release, the efficiency of CWLB is approx. 85%. Our goal is to reach >90% efficiency by the v3 release of CWLB.
We now also have a production-grade, scalable in-house video conferencing solution named Meetnow, built on top of CWLB. It has been designed to truly unify your organization’s external and internal communication in today’s remote-first world. Some of its unique features are rooms of 2-100 users with one-to-one, conferencing and event modes, complete meeting and attendance analytics, and, last but not least, paying only for real usage without any monthly / yearly commitment until you are sure about switching to our Enterprise plan.
If you have a mediasoup-based open-source project like mediasoup-demo or Edumeet which currently works great but does not autoscale, then this is for you. If you have a BBB (BigBlueButton) / Jitsi implementation in production which does not autoscale, then this is for you. If you have any other open-source/custom-built video implementation in production which doesn’t autoscale, then this is for you. Even if your current production video setup is working fine but you may need something like this in the near future, or you are just curious to know more about CWLB, feel free to drop us a note at hello@centedge.io / sp@centedge.io to learn how we can help you. If you wish to schedule a free 30-minute discussion of your use case with one of our senior/principal consultants, feel free to do so using this link.
“How to become a WebRTC developer?” is a question many developers who want to learn this niche technology ask these days. The situation was quite different just 2 years ago, when WebRTC was the technology of hobbyists and a very limited set of professionals working at video conferencing companies. Then COVID happened, and the world started to run on video as strict social distancing measures and lockdowns were enforced throughout the world. Everything from shopping, learning and health checkups to even marriages and funerals started to happen over online video. WebRTC is the technology that made it possible to create video applications for all of these unique use cases. The demand for WebRTC developers has skyrocketed since then, whereas there are only a handful of skilled WebRTC developers out there even today. Many enterprises / startups are going through a lot of pain to find good WebRTC developers.
As an early adopter and pioneer in the field of WebRTC, we face a very similar situation when trying to hire good WebRTC developers. To help solve this problem, we will start organising weekly / monthly events for students / experienced professionals, where they will be briefed about the benefits of WebRTC as a technology stack and how it can help them start / restart their career if they understand and master it.
If you are a student / experienced professional with < 5 years of experience, then this online event is for you. The details are as below.
Please join the above Slack link and then join the #webrtc-developers-den channel to get notified about the exact timing and the meeting link to join the session. There we will also post learning materials, open source projects, and developer/tester requirements from time to time.
PS: At Centedge we are working on a cutting-edge virtual event platform which is currently in testing. If you are interested in helping us test the platform thoroughly, don’t forget to join the #testing-captains Slack channel as well.
We are looking forward to hosting you this coming Saturday to help you become a good WebRTC developer. Feel free to ask us any questions / doubts on Slack once you join #webrtc-developers-den.
They say the customer is always right. But this is not always true when it comes to building a production-grade WebRTC video calling app that can also scale.
Why? Because the customer often wants to build a world-class video app like Zoom / Google Meet within 3 months, on a budget of $$$$ / 1$$$$.
How do they come to such conclusions about time and money?
They come to such conclusions because they read on the internet that WebRTC is free and open source, and that one can build an app like Zoom / Google Meet with WebRTC by downloading a random open source package with WebRTC as a keyword from GitHub. They then conclude that everything needed for an app like gmeet is available, either in WebRTC or in the open source package, and that they just need to build a new UI layer and a dashboard around it to challenge Zoom / gmeet!
Slowly, a notion builds up in the customer’s mind: WebRTC is free and open source, so things like gmeet can easily be built with it without much effort. This notion creates a lot of confusion between companies like us and prospective customers. It takes time to make prospective customers understand the reality of WebRTC and the effort it takes to build production-grade video applications with it.
Once the initial understanding is built that a live video application is more than plain WebRTC, another issue arises. This time, the issue is about building video rooms large enough to cater to maybe a million users. A million users in one room!
Again, we need to put in effort to understand the customer’s thought process by asking the right set of questions. It turns out that the customer currently has a teaching-learning application where one teacher teaches one student at a time. They started by using gmeet for free to conduct these sessions, but later realized that they need more control, along with deep integration of the calling feature into their dashboard. That’s why they are now looking for an alternative that can provide deep integration with their dashboard while being cost-effective. But they anticipate explosive growth and expect to reach a million students within a couple of years. That’s why they want to build a video conferencing application large enough to scale to a million users when that happens.
Here again, we need to put in effort to make our customers understand how a WebRTC application really works. We need to discuss and explain the various kinds of WebRTC architectures, like p2p, full-mesh multiparty, conferencing, live streaming, webinar, etc. Though we prefer not to use much WebRTC jargon in the discussion, sometimes it becomes unavoidable when we need to explain things like SFU, MCU, ICE/STUN/TURN, media server, recording server, FFMPEG, GStreamer, etc. After one or two rounds of discussion, they themselves realize that their current need can be very well satisfied by a p2p call, occasionally relayed through a TURN server. After all these discussions, they understand that building a production-grade scalable p2p video call takes much less time and resources than building a scalable production-grade video conferencing application. It is also a good starting point to test the market and the product before investing more resources in a scalable video conferencing application. Whether they choose a P2P app or a conferencing app, within a couple of rounds of discussion they equip themselves with all the knowledge necessary to understand the reality of WebRTC. From here onwards, it becomes a rewarding experience to help the customer achieve his/her business objective.
After going through the above-mentioned situation a couple of times, we decided to build a scalable, production-grade WebRTC p2p video calling application with a loosely coupled UI. This way, the UI can be modified according to individual needs, whereas the architecture, the front-end functionality and the back end stay the same. Though building a simple p2p app seems easy, building a scalable, production-grade p2p video calling app with a certain level of tolerance for bad networks is not so straightforward. Why? Because one needs to take care of the below-mentioned things in a production-grade app, which are generally not present in the simple p2p apps available on GitHub.
Audio / Video management: This includes everything a user may need while using the app, like muting / unmuting the mic, switching the camera on / off, switching from the current mic / camera to a different one for the rest of the call, and moderator controls for changing remote media inputs (so that a teacher can mute / unmute a student’s mic or switch the student’s camera on / off when needed), etc.
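The local half of these controls maps onto standard WebRTC APIs, sketched minimally below. The helper names are hypothetical. Muting uses `MediaStreamTrack.enabled`, and switching devices uses `RTCRtpSender.replaceTrack()`, which swaps what an existing sender transmits without a new offer/answer. Moderator controls are assumed to work by sending a signaling message that tells the remote client to call the same helpers on itself.

```javascript
// Minimal sketch of local audio/video controls with standard WebRTC APIs.

// Mute or unmute the mic without renegotiation: a disabled audio track
// stays attached to the connection but transmits silence.
function setMicMuted(stream, muted) {
  for (const track of stream.getAudioTracks()) {
    track.enabled = !muted;
  }
}

// Switch the camera off / on the same way (a disabled video track sends black frames).
function setCameraOff(stream, off) {
  for (const track of stream.getVideoTracks()) {
    track.enabled = !off;
  }
}

// Swap in a different camera (or mic) mid-call. replaceTrack() changes what
// the existing sender transmits, so no new offer/answer exchange is needed.
async function replaceLocalTrack(pc, newTrack) {
  const sender = pc
    .getSenders()
    .find((s) => s.track && s.track.kind === newTrack.kind);
  if (!sender) {
    throw new Error(`no ${newTrack.kind} sender on this connection`);
  }
  await sender.replaceTrack(newTrack);
  return sender;
}
```

In a browser, the new track would come from `getUserMedia({ video: { deviceId: { exact: id } } })`, with `id` taken from `navigator.mediaDevices.enumerateDevices()`. The old track should then be stopped so the previous camera is released.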
Capturing images / statistics: This feature helps capture an image from the real-time video stream for purposes like vKYC (video Know Your Customer). With it, a bank agent can capture an image while the bank’s customer shows his / her photo identity card during the call, for the bank’s verification purposes. Collecting real-time call and network statistics is also important for quality control and monitoring. Real-time network monitoring can also raise timely alerts to users when their network quality degrades.
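For the statistics part, browsers expose `pc.getStats()`, which resolves to a map-like `RTCStatsReport`. Its `"inbound-rtp"` entries carry counters such as `packetsReceived`, `packetsLost` and `jitter` (in seconds). The minimal sketch below turns them into a per-media-kind summary. The helper names and the 5% alert threshold are illustrative assumptions, not standard values.

```javascript
// Minimal sketch: summarize receive-side quality from an RTCStatsReport.
// In a browser: const report = await pc.getStats();
// The report is map-like; "inbound-rtp" entries describe incoming media.
function summarizeInbound(report) {
  const summary = {};
  report.forEach((stat) => {
    if (stat.type !== "inbound-rtp") return;
    const received = stat.packetsReceived || 0;
    const lost = stat.packetsLost || 0;
    const total = received + lost;
    summary[stat.kind] = {
      lossPercent: total > 0 ? (lost / total) * 100 : 0,
      jitterMs: (stat.jitter || 0) * 1000, // jitter is reported in seconds
    };
  });
  return summary;
}

// Hypothetical alert rule: warn when packet loss exceeds a chosen threshold
// (5% here is an illustrative assumption).
function networkWarnings(summary, maxLossPercent = 5) {
  return Object.entries(summary)
    .filter(([, s]) => s.lossPercent > maxLossPercent)
    .map(([kind, s]) => `${kind} packet loss ${s.lossPercent.toFixed(1)}%`);
}
```

These counters are cumulative from the start of the call. A live monitor would poll `getStats()` every few seconds and compare the deltas between polls rather than the lifetime totals.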
Auto re-connection: This is primarily important for keeping the call alive even when the network fails. A network typically fails when a device changes its network connection during a call, for example when it switches between WiFi and mobile networks like 3G / 4G / LTE. The network temporarily fails when the switch happens and comes back once the switch is over. To auto-reconnect, an application needs to detect the network failure, wait until the network reconnects, and restart the media communication as quickly as possible after the successful re-connection. In WebRTC terminology, re-establishing the media path this way is called an ICE restart.
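Detecting the failure and triggering the ICE restart can be sketched with standard APIs. The helper watches `iceconnectionstatechange` and, on failure, creates a new offer with `createOffer({ iceRestart: true })`. Modern browsers also offer `pc.restartIce()` for the same purpose. `sendToPeer` is a hypothetical signaling function, and the 3-second grace period for the often-temporary `"disconnected"` state is an assumed value. The other peer is assumed to answer this offer exactly like the first one, and this side then applies that answer with `setRemoteDescription()`.

```javascript
// Minimal sketch of automatic ICE restart on one side of a call.
// Assumptions: `pc` is an RTCPeerConnection; `sendToPeer` is a hypothetical
// function that sends a message over your signaling channel.
function enableAutoIceRestart(pc, sendToPeer, graceMs = 3000) {
  let timer = null;

  async function restart() {
    // iceRestart: true requests fresh ICE credentials, so candidates from the
    // new network (e.g. the 4G interface after leaving WiFi) are gathered
    const offer = await pc.createOffer({ iceRestart: true });
    await pc.setLocalDescription(offer);
    sendToPeer({ type: "offer", sdp: pc.localDescription });
  }

  function safeRestart() {
    restart().catch((err) => console.error("ICE restart failed", err));
  }

  pc.addEventListener("iceconnectionstatechange", () => {
    clearTimeout(timer);
    const state = pc.iceConnectionState;
    if (state === "failed") {
      safeRestart();
    } else if (state === "disconnected") {
      // "disconnected" is often temporary (e.g. during a WiFi/4G switch),
      // so give the connection a moment to recover on its own first
      timer = setTimeout(() => {
        if (pc.iceConnectionState === "disconnected") safeRestart();
      }, graceMs);
    }
  });
}
```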
Integrations: Integrations with other applications, like whiteboards for collaboration, text chat, file sharing, etc., also play an important role for some users. Either these features should be built in, or there should be a provision to integrate them when needed.
Recording: Recording is another important feature in many WebRTC applications. Though it may not be needed in all kinds of WebRTC applications, it becomes necessary for applications like video call centers, video health, etc. Recording can happen either on the client side or on the server side. Server-side recording is usually preferred, as it allows the raw recording to be post-processed for multiple purposes. As an example, a video recorded in WebM format, the default recording format for WebRTC, consumes 800 MB - 1000 MB of space for an hour-long recording, which is a lot. On the server side, one can use a tool called FFMPEG to compress it, watermark it and convert it to MP4 format, which can reduce the size to < 100 MB for the same video. If server-side recording is not possible (like in a P2P call), one can use client-side recording as an alternative strategy and send the file to the server once the recording is done.
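The server-side compression and MP4 conversion step can be sketched by calling FFmpeg from Node.js. This is a minimal sketch that assumes the `ffmpeg` binary is installed and on the PATH. The CRF value of 28 and the audio bitrate are illustrative choices, and the actual size reduction depends on resolution and content. Watermarking would add an overlay filter and is not part of this sketch.

```javascript
// Minimal sketch: convert a WebM recording to a smaller H.264/AAC MP4 with
// FFmpeg (assumes the `ffmpeg` binary is installed and on PATH).
const { spawn } = require("node:child_process");

// CRF trades quality for size in libx264: higher = smaller file, lower quality.
function buildMp4Args(inputPath, outputPath, crf = 28) {
  return [
    "-y",               // overwrite the output file if it already exists
    "-i", inputPath,    // source WebM (typically VP8/VP9 video + Opus audio)
    "-c:v", "libx264",  // re-encode video to H.264
    "-preset", "medium",
    "-crf", String(crf),
    "-c:a", "aac",      // re-encode audio to AAC for broad MP4 player support
    "-b:a", "128k",
    "-movflags", "+faststart", // move metadata to the front for quicker playback
    outputPath,
  ];
}

function convertToMp4(inputPath, outputPath, crf) {
  return new Promise((resolve, reject) => {
    const proc = spawn("ffmpeg", buildMp4Args(inputPath, outputPath, crf));
    proc.on("error", reject); // e.g. ffmpeg is not installed
    proc.on("close", (code) =>
      code === 0 ? resolve(outputPath) : reject(new Error(`ffmpeg exited with code ${code}`))
    );
  });
}
```

Usage on a recording server might look like `await convertToMp4("room-123.webm", "room-123.mp4")`, where the file names are hypothetical.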
In-call media manipulation: There may be a need to mask some portion of the video stream during a call, either for security or for convenience. A widely used feature for this these days is background removal, where the background of a user sharing a camera is either blurred or replaced with another image, such as a coffee shop / office desk / meeting room. There may be other such use cases as well.
All of the above-mentioned features, along with a robust architecture ready for scale, make a production-grade application. Building such an application takes a lot of effort and time, and an excellent team with a deep understanding of WebRTC and related technologies, frontend, backend, and much more.
If you are a customer looking to build a production-grade, scalable video calling app, then by now you know that you need a rock star team with a solid understanding of WebRTC and related technologies, along with sufficient time and resources at your disposal, to venture on such an adventure. If you don’t have a rock star team or time at your disposal, we are here to help in any of the ways mentioned below.
CP2P is our scalable, production-grade P2P video calling app, ready for production deployment as a managed service. It comes with a very minimal UI ready for retrofitting. You can either share your UI design and we build it for you, or share a fully designed UI for integration. We can integrate it, deploy it on your servers / our servers and manage it for you in a cost-effective manner. The link to view the details and see it in action is at the bottom of this page.
CVR is our scalable, production-grade video conferencing / live streaming / webinar application, ready for production deployment as a managed service. It uses CWLB, our in-house WebRTC load balancer built from scratch, to distribute and balance load in real time with a utilization efficiency of 75%. It also uses CR, an advanced in-house recording engine developed from scratch, to record meetings. It uses mediasoup as its core media server. The link to learn more about it and schedule a demo is at the bottom of this page.
If you don’t want to use any of these products but want to develop your own from scratch, we as a consulting company can help you build it. In this case, you need more resources and time than with the previously mentioned options. If you have more resources and time at your disposal, this can be a path to tread. The only thing to make sure of in this case is that you have a dependable rock star team who can work with us to build the product.
In case you don’t have a rock star team, there is reason to worry. But why worry when we are here? We have an instructor-led, online / offline, full-time training program where we can convert any full-stack / frontend / backend developer with sound JavaScript knowledge into a rock star WebRTC developer. The timelines for the training program are as below.
5-7 days (for the WebRTC fundamentals program)
10-14 days (for the WebRTC fundamentals and advanced WebRTC with mediasoup program)
I hope I was able to share enough information about building a scalable production grade video conferencing app. If you still have doubts or questions, you can reach out to me either on sp@centedge.io or on hello@centedge.io.
The link to schedule a free 30 mins discussion with one of our experts to resolve all your WebRTC related queries is here.
If you are a student / developer looking to learn the basics of WebRTC by yourself through working examples, here is a GitHub repo with working examples of successful MediaStream acquisition, building a basic signalling server, and building a working P2P call app.
So you are a developer (frontend/backend/full-stack) curious about developing applications with WebRTC. You have been searching the internet for days or even months to learn the basics and to build a basic WebRTC video calling app while understanding how it works. Though there are a few GitHub repositories with code for a very basic p2p video calling app, none of them explain the inner workings of the code. The code in those repositories just runs on your localhost with a command or two, letting you connect 2 tabs of your browser in a video call. When you try to read through the code, you find a bunch of API calls that are very unfamiliar and sometimes even seem illogical.
To demystify the inner workings of a basic WebRTC video calling app, we will follow a beginner-friendly, common-sense 3-step approach. Let’s start.
The very first step of building a video calling app is to understand how to acquire the camera and/or mic of the device you want to use for calling, using any of these browsers: Chrome/Firefox/Edge/Safari. It can be a desktop / laptop / mobile device, as long as one of these browsers is present. Without the camera and mic, a video call has no meaning at all. There can be use cases where you use WebRTC in p2p mode with data channels only, for file sharing, but we are not going to discuss those in this blog post. The way to acquire the camera and mic from the browser is an API called getUserMedia. The below line of code will acquire the camera and mic from the browser.
With the above line of code, we will be able to acquire the camera and mic, with some preconditions: the user has to grant permission, and the page has to be served over HTTPS. If you try to use it on a page served over plain HTTP, it will fail. (Browsers treat http://localhost as secure, which is why local demos work without HTTPS.)
If you have successfully acquired the camera and mic, you are ready for step 2 of building a WebRTC video calling app. In this step, you need to build a simple signaling server so that messages can be exchanged between the caller and the callee. This step is all about creating a server-side application that connects to both the caller and the callee and lets them pass messages (such as the offer/answer SDPs and ICE candidates described below) to each other whenever needed. Node.js is used as the signaling server runtime in this example, with WebSocket as the mechanism to connect both users.
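A minimal sketch of that call is `navigator.mediaDevices.getUserMedia({ audio: true, video: true })`, wrapped here in a small hypothetical helper so that the insecure-page failure is easier to spot. `mediaDevices` is passed in as a parameter only so the helper can also be exercised outside a browser. The `localVideo` element id is an assumption.

```javascript
// Minimal sketch: acquire the camera and microphone with getUserMedia().
// In a page, pass navigator.mediaDevices as the argument.
async function getLocalStream(mediaDevices) {
  if (!mediaDevices || typeof mediaDevices.getUserMedia !== "function") {
    // navigator.mediaDevices is undefined on pages that are not a secure context
    throw new Error("getUserMedia is unavailable: serve the page over HTTPS");
  }
  // Prompts the user for permission; rejects (e.g. NotAllowedError) if denied
  return mediaDevices.getUserMedia({ audio: true, video: true });
}

// Browser usage ("localVideo" is a hypothetical <video> element id):
// const localStream = await getLocalStream(navigator.mediaDevices);
// document.getElementById("localVideo").srcObject = localStream;
```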
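The core of such a signaling server is just message relaying, sketched below as a transport-agnostic class so the logic can be read and tested on its own. The class, the room concept and the message shapes (`join`, `peer-joined`, `from`) are hypothetical choices for illustration. The commented wiring shows how it could be attached to WebSockets with the `ws` npm package, which is not part of Node.js itself. Port 8080 is an assumption.

```javascript
// Minimal sketch of signaling-server logic: it only relays messages
// (offers, answers, ICE candidates) between peers in the same room.
// Each peer is registered with a `send` function, so any transport works.
class SignalingHub {
  constructor() {
    this.rooms = new Map(); // roomId -> Map(peerId -> send)
  }

  join(roomId, peerId, send) {
    if (!this.rooms.has(roomId)) this.rooms.set(roomId, new Map());
    const room = this.rooms.get(roomId);
    // Tell peers already waiting in the room that someone new is available
    for (const otherSend of room.values()) {
      otherSend({ type: "peer-joined", peerId });
    }
    room.set(peerId, send);
  }

  // Forward a message from one peer to every other peer in the room.
  relay(roomId, fromPeerId, message) {
    const room = this.rooms.get(roomId);
    if (!room) return 0;
    let delivered = 0;
    for (const [peerId, send] of room) {
      if (peerId !== fromPeerId) {
        send({ ...message, from: fromPeerId });
        delivered++;
      }
    }
    return delivered;
  }

  leave(roomId, peerId) {
    const room = this.rooms.get(roomId);
    if (!room) return;
    room.delete(peerId);
    if (room.size === 0) this.rooms.delete(roomId);
  }
}

// Possible wiring with the `ws` package (npm install ws), assuming clients
// send JSON such as {type: "join", roomId, peerId} followed by offers etc.:
//
// const { WebSocketServer } = require("ws");
// const hub = new SignalingHub();
// const wss = new WebSocketServer({ port: 8080 });
// wss.on("connection", (socket) => {
//   let joined = null;
//   socket.on("message", (data) => {
//     const msg = JSON.parse(data.toString());
//     if (msg.type === "join") {
//       joined = msg;
//       hub.join(msg.roomId, msg.peerId, (m) => socket.send(JSON.stringify(m)));
//     } else if (joined) {
//       hub.relay(joined.roomId, joined.peerId, msg);
//     }
//   });
//   socket.on("close", () => joined && hub.leave(joined.roomId, joined.peerId));
// });
```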
With the above lines of code, one has a basic signaling server ready to listen to messages from the caller/callee.
Now you are ready to build the real video calling application using the work we did in the last 2 steps. Here are the steps needed to establish the call.
The caller (peer A) connects to the signaling server and waits for the callee (peer B)
The callee (peer B) connects to the signaling server and also informs peer A that he/she is available for a call
Peer B clicks the call button and boom! the call is connected where both peer A and peer B can see and listen to each other.
Here are the real steps that happen behind the scenes to establish the call.
Peer B first creates a new PeerConnection object while passing the available ICE servers as a parameter, which helps in sending and receiving the media streams.
const pc = new RTCPeerConnection({iceServers});
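The `iceServers` value used in this line is not defined in the snippet. A typical shape is sketched below with assumed values. A STUN server lets each browser discover its public address, and a TURN server relays the media when no direct path can be found, for example behind strict firewalls. `stun.l.google.com:19302` is Google’s public STUN server. The TURN host and credentials are hypothetical placeholders for a server you would run yourself.

```javascript
// Example value for the `iceServers` option of new RTCPeerConnection({ iceServers }).
// The TURN entry uses a hypothetical host and placeholder credentials.
const iceServers = [
  { urls: "stun:stun.l.google.com:19302" },
  {
    urls: "turn:turn.example.com:3478",
    username: "demo-user",       // placeholder
    credential: "demo-password", // placeholder
  },
];
```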
Then it acquires the local camera and mic and adds those camera and mic tracks to the PeerConnection. This makes the PeerConnection ready to send the media feeds as soon as the connection is established (i.e. when both users agree on a common network configuration acceptable to both).
Then it creates an offer, which generates an offer SDP (Session Description Protocol) containing a lot of information (approx. 80-100 lines) in plain text format, such as network settings, the available media streams (audio/video/screen share/anything else), the codecs available to encode and decode media packets, and many other things.
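Adding the tracks is one `addTrack()` call per track, sketched below as a small hypothetical helper. It has to run before `createOffer()` so that the tracks are described in the offer SDP. Passing the stream as the second argument lets the remote side receive the tracks grouped in `event.streams[0]` in its `ontrack` handler.

```javascript
// Minimal sketch: attach every local track to the PeerConnection.
// addTrack() returns an RTCRtpSender, which is useful later (e.g. replaceTrack()).
function addLocalTracks(pc, localStream) {
  const senders = [];
  for (const track of localStream.getTracks()) {
    senders.push(pc.addTrack(track, localStream));
  }
  return senders;
}

// In the browser, before creating the offer:
// const localStream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
// addLocalTracks(pc, localStream);
```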
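const offer = await pc
  .createOffer()
  .catch(function (error) {
    // alert() takes a single message, so include the error in it, then
    // rethrow so setLocalDescription() below never receives undefined
    alert("Error when creating an offer: " + error);
    throw error;
  });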
Once the SDP is generated, the PeerConnection’s local description is set using the offer. In simple terms, this asks the browser for final confirmation of the validity of all the options in the SDP. Once the local description is set, the SDP (aka the settings) is fixed for this round of negotiation, and it is then sent to the remote peer A so that its browser can do everything peer B’s browser just did.
await pc.setLocalDescription(offer);
//send the offer to peer A using the signalling channel
As soon as the local description is set, the browser starts generating ICE candidates (in simple terms, the possible network addresses of peer B), which are sent to peer A so it can check whether they can be used to receive media streams.
pc.onicecandidate = function (event) {
if (event.candidate) {
//send the ice candidate to the other peer using the
//signalling channel
}
};
Once peer A’s browser receives the SDP via the signaling server, peer A first creates a PeerConnection object, passing the ICE servers as a parameter for the same purpose. As soon as the PeerConnection is created, it uses the offer SDP provided by peer B to set its remote description. This lets the browser know the other peer’s details, so that it can later create an answer SDP in response to the offer.
const pc = new RTCPeerConnection({iceServers});
await pc.setRemoteDescription(new RTCSessionDescription(offer));
This repeats the earlier track-adding step for peer A: it acquires its own media streams and adds them to the PeerConnection, ready to send once the connection is established.
Then it creates an answer by calling createAnswer() on the PeerConnection object, which generates the answer SDP. Once the answer SDP is generated, the local description is set on peer A’s side to ask the browser for final confirmation. Once confirmed, the answer is sent to peer B via the signaling channel for peer B’s browser to accept.
const answer = await pc
  .createAnswer()
  .catch(function (error) {
    // Report the error, then rethrow so the code below is skipped
    alert("Error when creating an answer: " + error);
    throw error;
  });
await pc.setLocalDescription(answer);
//send the answer to peer B using the signalling channel
Once the answer SDP is received on peer B’s side, peer B calls setRemoteDescription() to ask the browser to accept the other user’s SDP. Once the browser accepts it, both sides have agreed on the media session, and the media connection is established as soon as the ICE candidate exchange finds a working network path.
await pc.setRemoteDescription(
  new RTCSessionDescription(answer)
);
The ICE candidate step is repeated by peer A’s browser for exactly the same purpose. By now, both browsers know each other’s network configurations. Both peers then agree on one network configuration among all the options offered by their browsers. This mutually selected network configuration (aka the selected ICE candidate pair) is then used for the actual media transport between the two users.
pc.onicecandidate = function (event) {
  if (event.candidate) {
    //send the ice candidate to the other peer using the
    //signalling channel
  }
  // a null event.candidate means candidate gathering has finished
};
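The code above covers only the sending side of the candidate exchange. On the receiving side, each candidate arriving over the signaling channel has to be handed to the browser with `addIceCandidate`. The sketch below assumes a hypothetical message format `{ type: "candidate", candidate }` and a hypothetical `handleRemoteCandidate` function; your signaling protocol will likely differ.

```javascript
// Hypothetical receiver for ICE candidates relayed by the signaling server.
// Assumes messages shaped like { type: "candidate", candidate: {...} },
// where `candidate` is the JSON form of the RTCIceCandidate sent by the peer.
async function handleRemoteCandidate(pc, message) {
  if (message.type !== "candidate") {
    return false; // not an ICE candidate message; other handlers deal with it
  }
  try {
    // Hand the peer's candidate to the local ICE agent so it can form
    // candidate pairs and run connectivity checks on them
    await pc.addIceCandidate(message.candidate);
    return true;
  } catch (error) {
    console.error("Failed to add remote ICE candidate", error);
    return false;
  }
}
```

Browsers reject `addIceCandidate` calls made before `setRemoteDescription`, so candidates that arrive early are usually buffered and added once the remote description is set.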
Once the connection is established, each PeerConnection starts sending its media streams to the remote user. When the remote peer's tracks become available, a track event is fired on the PeerConnection object (handled via ontrack) to let the application know that the other peer's media is ready to be consumed. The local application then takes the media stream from the event and displays it in a video element.
pc.ontrack = (event) => {
  if (event.streams && event.streams[0]) {
    // The remote stream is now available at event.streams[0].
    // "remoteVideo" is a hypothetical <video> element on the page.
    const remoteVideo = document.getElementById("remoteVideo");
    remoteVideo.srcObject = event.streams[0];
  }
};
The call is now established, and peer A and peer B can communicate with each other in real time using their respective cameras and mics.
Once all the above steps are done correctly, a WebRTC video call can be established successfully. Here is the link to the GitHub repo, where each of the above steps is implemented in a separate folder with working code for your reference.
Keep in mind that this is for learning and understanding the inner workings of a simple WebRTC p2p call. If you want to build a production-grade p2p call that you can deploy to a cloud and use for a commercial venture, you need to check this out.
If you want to build a production-grade video calling app yourself as an extension of this project, check this post to learn more about all the features a production-grade app needs.
Also keep in mind that you need a robust architecture to build a production-grade app. The code in the GitHub repository created for this example is meant for learning purposes only and is not fit for production use. If you are interested in discussing the right architecture for your app with a principal consultant at Centedge, you can schedule a free 30-minute consultation call using this link.
A TURN server is needed by any WebRTC application whose users may be behind strict firewalls. The open-source coturn server is a very popular TURN server that can be used with any WebRTC application to get past firewalls. It can be installed on a Linux server (on-premise or in the cloud) and needs to be properly configured to work with your WebRTC application. Since configuring a TURN server deserves a separate post, this post focuses only on how to test your TURN server to find out whether it has been configured properly.
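At its core, testing a TURN server comes down to asking the browser to gather ICE candidates with that server configured and looking at what comes back. The sketch below shows that idea; it is our assumption of how such a test generally works, not the source of any particular tool. `gatherCandidates` is a hypothetical name, and the `RTCPeerConnection` constructor is passed in as a parameter so the logic can also be exercised outside a browser.

```javascript
// Hypothetical helper: gather ICE candidates for a given configuration and
// return the raw candidate lines. `PeerConnection` is RTCPeerConnection in a
// browser; `config` holds the iceServers to test.
async function gatherCandidates(PeerConnection, config, timeoutMs = 5000) {
  const pc = new PeerConnection(config);
  // A data channel gives the offer an m= section, so ICE has something to gather for
  pc.createDataChannel("probe");
  const candidates = [];
  const done = new Promise((resolve) => {
    // Stop waiting after timeoutMs even if gathering never completes
    const timer = setTimeout(resolve, timeoutMs);
    pc.onicecandidate = (event) => {
      if (event.candidate) {
        candidates.push(event.candidate.candidate);
      } else {
        // A null candidate signals the end of gathering
        clearTimeout(timer);
        resolve();
      }
    };
  });
  // Setting the local description is what starts candidate gathering
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  await done;
  pc.close();
  return candidates;
}
```

In a browser it could be used as `await gatherCandidates(RTCPeerConnection, { iceServers: [{ urls: "turn:turn.example.com:3478", username: "user", credential: "pass" }] })`, where the server URL and credentials are placeholders for your own TURN server.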
The most popular tool available today for testing a TURN server is the WebRTC Trickle ICE testing tool. The link to the site is at the bottom of this post. Here is what it looks like.
As shown, we can add our STUN/TURN server credentials using the Add Server button. Once the server is added, we can select it and gather all possible ICE candidates using the Gather candidates button. Here is what the result looks like when candidates are gathered from a working TURN server.
As shown, 3 kinds of candidates are generated from a single candidate gathering request. They are:
host candidates
srflx candidates
relay candidates
host candidates: Host candidates can be used to connect a WebRTC call within the same network, i.e. when both peers joining the call are connected to the same network router/switch.
srflx candidates: srflx (server reflexive) candidates, discovered with the help of a STUN server, can be used to connect a WebRTC call when the users are not on the same network but on different networks, possibly even on 2 different continents, connected by the Internet.
relay candidates: Relay candidates can be used to connect a WebRTC call when the users are not on the same network and one or both of them are behind a firewall. A user behind a firewall is not directly reachable from the Internet. That is the point of a firewall: it isolates a network from the Internet so that a network administrator can control its interaction with the Internet in the desired manner, primarily to keep malicious actors out of the network. But this arrangement poses a problem for a WebRTC app, as it can't reach the user directly. To solve this, a TURN server is put in place, and both users relay their media through it rather than connecting directly with each other, which lets the media get past the firewall. If this sounds a bit confusing, you can read more about firewalls and trickle ICE to gain a better understanding.
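Each gathered candidate is a text line in the SDP candidate-attribute format (defined in RFC 8839), and its type appears right after the `typ` keyword. A minimal parser such as the hypothetical `parseCandidate` below, which handles only the mandatory fields plus `raddr`/`rport`, is enough to tell host, srflx and relay candidates apart when reading logs.

```javascript
// Hypothetical helper: split an ICE candidate line into its fields.
// Format: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> [raddr <addr>] [rport <port>] ...
function parseCandidate(line) {
  // Accept lines with or without the SDP "a=" prefix
  const parts = line.trim().replace(/^a=/, "").split(/\s+/);
  if (!parts[0].startsWith("candidate:") || parts[6] !== "typ") {
    throw new Error(`Not an ICE candidate line: ${line}`);
  }
  const result = {
    foundation: parts[0].slice("candidate:".length),
    component: Number(parts[1]),
    transport: parts[2].toLowerCase(),
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // host | srflx | prflx | relay
  };
  // Optional extension attributes follow the type as "name value" pairs
  for (let i = 8; i + 1 < parts.length; i += 2) {
    if (parts[i] === "raddr") result.relatedAddress = parts[i + 1];
    if (parts[i] === "rport") result.relatedPort = Number(parts[i + 1]);
  }
  return result;
}
```

For srflx and relay candidates, `raddr`/`rport` point back to the address the candidate was derived from, which helps when tracing which network interface a relay allocation came from.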
There are 3 kinds of relay candidates, and any one of them can be chosen to establish the connection:
relay-udp
relay-tcp
relay-tls
relay-udp: The most common kind of relay candidate and the easiest one to connect with. This type is chosen when the firewall is not very strict and allows UDP connections.
relay-tcp: The next best option, used when a relay-udp candidate is not possible because UDP ports are blocked by the firewall.
relay-tls: The next best option after that, used when the firewall blocks not only UDP ports but also unencrypted TCP connections. Only a secure connection authenticated with a valid SSL/TLS certificate can pass through such a firewall. In this case, the relay-tls candidate type is used, which requires SSL certificate-based authentication to be enabled on the TURN server.
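An application can also restrict a call to relay candidates itself: the standard RTCPeerConnection configuration accepts an `iceTransportPolicy` option, and setting it to `"relay"` makes the browser use only relay candidates, which is handy for testing a TURN server's relay candidates in isolation. A minimal sketch; `relayOnlyConfig` and `summarizeRelayTest` are hypothetical helper names, and the candidate lines are assumed to be collected from `onicecandidate` events.

```javascript
// Hypothetical helper: build an RTCPeerConnection configuration that
// restricts ICE to TURN relay candidates (standard iceTransportPolicy option).
function relayOnlyConfig(iceServers) {
  return { iceServers, iceTransportPolicy: "relay" };
}

// Hypothetical check on the candidate lines gathered under that policy.
function summarizeRelayTest(candidateLines) {
  const relay = candidateLines.filter((line) => /\btyp relay\b/.test(line)).length;
  const other = candidateLines.length - relay;
  // Under the relay policy, zero relay candidates usually means the TURN
  // server was unreachable or rejected the credentials
  return { relay, other, ok: relay > 0 && other === 0 };
}
```

In a browser, `new RTCPeerConnection(relayOnlyConfig(iceServers))` would then only ever connect through the TURN server, so a failed call points directly at the TURN configuration.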
To test which kind of relay candidate is used with a TURN server, we can use the Firefox browser with a special setting flag switched on. Here is what it looks like.
As shown, Firefox has a flag (media.peerconnection.ice.relay_only in about:config) that tells the browser to use only relay connections for any WebRTC application running in the browser. By default, it is set to false, which means relay-only connections are not enforced for WebRTC calls. Switching it on, by clicking the toggle button at the far right of the flag, enforces relay-only candidate generation. This is what it looks like after switching it on.
Once this flag is switched on, any WebRTC application running in Firefox will use relay candidates only. Here is what Firefox's candidate logs look like during a WebRTC call with the flag switched on.
As the above image shows, Firefox has chosen the relay-tls candidate type in this case because the only TURN server URL provided for this example, turns:turn.centedge.io:443, is enabled with SSL certificate-based authentication. More than one TURN server entry can be provided so that the browser can pick a relay candidate type that works with the user's firewall configuration. An ideal iceServers parameter for establishing a WebRTC connection should have a STUN server and multiple TURN server entries for different purposes. It should look something like this.
As can be seen, there are 4 entries in this case. Each entry is explained below.
Entry 1: For generating srflx candidates using a STUN server when neither user is behind a firewall. This is the most common method used to connect a WebRTC call, and a STUN server is a must for connecting WebRTC calls across different networks. Also, since no media flows through a STUN server, it incurs practically no data transfer costs, only the cost of the server itself.
Entry 2: For generating relay candidates when there is a firewall in place for one or both users of the WebRTC application. Here the TURN server is reachable on port 80, because most firewalls allow access on port 80, which is also used by HTTP.
Entry 3: For generating relay candidates when a firewall blocks all access except authenticated access. SSL certificate-based authentication is used to pass through this firewall, which is stricter than the previous one. Hence port 443, which is also used by HTTPS, is used.
Entry 4: For generating relay candidates of type TCP when the strictest possible firewall is in place, one that not only requires authenticated access but also blocks UDP ports.
This firewall is even stricter than the previous one. This type of candidate should be able to pass through all kinds of firewalls, as almost all firewalls allow access to port 443 for normal website browsing; without port 443 open, websites won't work!
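The four entries described above could be written as the following iceServers array. This is a sketch under stated assumptions: the hostname, username and credential are placeholders, port 3478 for STUN is the common default rather than a value from this post, and the exact URL for entry 4 (`turns:` with `?transport=tcp`) is our reading of its description; adjust all of them to your own TURN server.

```javascript
// Hypothetical sketch of the 4-entry iceServers list described above.
// host, username and credential are placeholders for your own TURN server.
function buildIceServers(host, username, credential) {
  return [
    // Entry 1: STUN, for srflx candidates (3478 is the usual default port)
    { urls: `stun:${host}:3478` },
    // Entry 2: TURN on port 80, a port most firewalls allow (used by HTTP)
    { urls: `turn:${host}:80`, username, credential },
    // Entry 3: TURN over TLS (turns: scheme) on port 443, used by HTTPS
    { urls: `turns:${host}:443`, username, credential },
    // Entry 4: TLS over TCP on port 443, for firewalls that also block UDP
    { urls: `turns:${host}:443?transport=tcp`, username, credential },
  ];
}
```

The result is passed straight into the connection, e.g. `new RTCPeerConnection({ iceServers: buildIceServers("turn.example.com", "user", "pass") })`; the browser gathers candidates for every entry and ICE picks whichever path actually works.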
While working with one of our customers, we realized that testing the TURN server settings for each candidate type one by one is critical to reducing call failures caused by strict firewall policies. Apart from trickle ICE, there are no reliable tools available today for testing TURN servers, primarily from the perspective of which kinds of relay candidates (udp/tcp/tls) a TURN server supports. Therefore, you can use our open-source p2p application to test it on your local network, or deploy the application to test it over the Internet. We will try to provide a hosted version of the same p2p application in some time.
How does it work?
Step 1: Download this repo to your local machine and follow the instructions in the README to set it up. Don't forget to provide the credentials of the TURN server you wish to test, as mentioned in the README.
Step 2: Once the setup is successful, open Firefox and switch on the relay-only flag as described above.
Step 3: Copy the generated link and paste it either into a different browser on the same device or on a different device on the same network. If you choose to host the application with a cloud provider like DigitalOcean, you can also share the link with a friend and hang out while testing.
Step 4: Once the other user joins the link, press the call button to start the video chat.
Step 5: Open about:webrtc in a different tab in Firefox and clear the history using the Clear History button so that statistics are shown for the current call only. Then click the show details link to view the detailed call statistics, including the ICE candidate details. Here you should be able to see the relay candidate type (udp/tcp/tls) along with all other relevant details.
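If you prefer to check this from code rather than from about:webrtc, the standard `getStats()` API exposes similar information. The sketch below, with the hypothetical name `describeSelectedPair`, takes the report returned by `await pc.getStats()` and looks up the local candidate of the selected candidate pair. It relies on the `selectedCandidatePairId`, `candidateType` and `relayProtocol` fields from the W3C WebRTC statistics specification; browser support for these fields varies, so it falls back to a nominated, succeeded pair and reports missing values as "unknown".

```javascript
// Hypothetical helper: summarize the local candidate actually used for media.
// `report` is an RTCStatsReport (a Map-like object of id -> stats), e.g. the
// result of `await pc.getStats()` in a browser.
function describeSelectedPair(report) {
  const stats = [...report.values()];
  // Preferred: the transport stats point at the selected pair directly
  const transport = stats.find((s) => s.type === "transport" && s.selectedCandidatePairId);
  let pair = transport ? report.get(transport.selectedCandidatePairId) : undefined;
  // Fallback: a nominated pair whose connectivity checks succeeded
  if (!pair) {
    pair = stats.find((s) => s.type === "candidate-pair" && s.nominated && s.state === "succeeded");
  }
  if (!pair) return null; // no connection established yet
  const local = report.get(pair.localCandidateId) || {};
  const candidateType = local.candidateType || "unknown"; // host | srflx | prflx | relay
  return {
    candidateType,
    // Only relay candidates have a relay protocol (udp | tcp | tls), i.e. the
    // transport used between the browser and the TURN server
    relayProtocol: candidateType === "relay" ? local.relayProtocol || "unknown" : "n/a",
    address: local.address || "unknown",
  };
}
```

During a call, `console.log(describeSelectedPair(await pc.getStats()))` should print something like `relayProtocol: "tls"` when the relay-tls path was chosen, matching what about:webrtc shows.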
We hope this helps you test your TURN server for misconfigurations. Feel free to drop us an email at hello@centedge.io with any questions, doubts or concerns related to this post or to TURN servers/WebRTC in general.