Autoscaling WebRTC with mediasoup

Autoscaling WebRTC apps is not at all easy. A lot of discussion on building large-scale WebRTC apps gets stuck on how to scale, and there are no straightforward answers to this question yet. For this reason, we at Centedge have developed CWLB, a general-purpose WebRTC load balancer using mediasoup as the media server at its core.

When a customer connects with us to help them build a WebRTC app, the conversation goes something like this.

Customer: We want to integrate video conferencing capabilities in our existing web app.

Us: Sure. We can help you with that.

Customer: Our requirement is to have 15-person (max) conferencing rooms with recording capabilities.

Us: Sure. We can help you with that as well.

Customer: We want the solution to be super scalable so that even a million rooms can be started at the same time. We have our own data centre and you can run Kubernetes clusters there. We hope this will be fine for scaling requirements.

Us: No. Kubernetes clusters may not be sufficient to scale WebRTC apps, considering their stateful nature. Also, memory and CPU usage may not be the right indicators of server load in this case.

Customer: What is this stateful nature? Why are memory and CPU usage not the right indicators?

Us:….

The conversation goes on, and we help our customers fully understand the nature of WebRTC calls and the media server parameters that correctly indicate the current load. Towards the end of the conversation, the question of how the scaling problem should actually be solved used to remain open for further discussion, as we did not have a ready-made answer for it.

After going through a similar conversation several times, we decided to do something about it. On 1st June 2021 we started working on a general-purpose load balancer to auto-scale WebRTC apps, and after a year of considerable effort we have successfully developed it. We call it CWLB, which stands for Centedge WebRTC Load Balancer. CWLB supports both horizontal and vertical autoscaling. Mediasoup is the media server used behind the load balancer, and CWLB currently supports AWS as the cloud provider to create/delete on-demand mediasoup media servers.

Before moving on to discuss CWLB in more detail, we will elaborate on some keywords mentioned in the paragraph above for a better understanding of the context.

Why Autoscaling?

The first important question is: why does one need autoscaling? Because one needs more simultaneous video rooms than a single media server can handle. Let’s look at an example.

A c5.2xlarge instance on AWS (8 vCPU & 16 GB RAM) can handle either one large 50-person conference room or 10 small 5-person conference rooms. Once the server is at full load, it can’t cater to any new room creation requests until the rooms running on it are closed. One option is to run multiple servers all the time to handle more load, irrespective of whether new room requests are coming or not. But this is a huge waste of resources, as one has to pay the server bills while the servers sit idle most of the time.

There may also be cases where servers are required only at a specific time but not all the time. Here, one needs to manually create new servers just before they are needed and create a mechanism to route new room creation requests to the newly created servers. Once the need is over, the servers again have to be shut down and deleted manually. This is still okay if the demand for video room creation is predictable, as one will get the time to create new servers, but it is nearly impossible if the room creation time is highly unpredictable. An example of a predictable load is a church prayer service that happens every day at the same time, or a scheduled board meeting that happens every week or month on a specific date and time. These kinds of services give one ample time to create new servers to cater to the prescheduled demand. An example of an unpredictable load is a teaching-learning app where any teacher can log in at any time to start a room. In this case, the room creation requests are so random that one won’t get any time to create new servers. Therefore it is impossible to scale manually in the case of an unpredictable load.

To solve the above-mentioned problems, a load balancer is used in front of the media servers. Its job is to distribute the incoming load among the available servers based on a predefined algorithm. If no more media servers are available, it creates new ones; if some of the media servers are idle, it deletes them so that valuable resources can be saved. A load balancer is a must to cater to unpredictable load scenarios. It is also good to have for predictable load scenarios, because it saves a lot of manual effort while minimizing the chance of errors arising from that manual effort.
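
To make the idea concrete, here is a minimal sketch of such a decision loop in TypeScript. It is not CWLB’s actual algorithm; the `MediaServer` shape, the load thresholds and the `provisionServer`/`destroyServer` helpers are all assumptions made for illustration.

```typescript
// Hypothetical view of a media server as tracked by the load balancer.
interface MediaServer {
  id: string;
  load: number;      // 0..1, e.g. fraction of room capacity in use
  roomCount: number;
}

const SCALE_UP_THRESHOLD = 0.8;   // assumed: add capacity above 80% average load
const SCALE_DOWN_THRESHOLD = 0.2; // assumed: a server below 20% load is a removal candidate

async function provisionServer(): Promise<MediaServer> {
  // Stub: a real implementation would call a cloud provider's API here.
  return { id: `srv-${Date.now()}`, load: 0, roomCount: 0 };
}

async function destroyServer(id: string): Promise<void> {
  // Stub: a real implementation would terminate the cloud instance here.
  console.log(`terminating idle media server ${id}`);
}

async function rebalance(servers: MediaServer[]): Promise<MediaServer[]> {
  const avgLoad =
    servers.reduce((sum, s) => sum + s.load, 0) / Math.max(servers.length, 1);

  // Upscale: everything is busy, so add a server before the next room request arrives.
  if (servers.length === 0 || avgLoad > SCALE_UP_THRESHOLD) {
    servers.push(await provisionServer());
  }

  // Downscale: remove servers with no rooms and negligible load, keeping at least one.
  for (const s of [...servers]) {
    if (servers.length > 1 && s.roomCount === 0 && s.load < SCALE_DOWN_THRESHOLD) {
      await destroyServer(s.id);
      servers = servers.filter((x) => x.id !== s.id);
    }
  }
  return servers;
}
```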

Is Autoscaling mandatory?

No. It is not mandatory for all kinds of WebRTC applications. When an application has a finite amount of load, and that load is also predictable, autoscaling may not be needed.

Example:

Consider a small school with 100 students across 5 grades, which makes roughly 20 students per grade. In this case, an 8 vCPU x 16 GB RAM server running 24x7 should be both economical and sufficient to handle the peak load of all 5 grades running their classrooms simultaneously. For this kind of use case, adding a load balancer will add complexity and cost rather than saving it.

Why mediasoup?

Because mediasoup is one of the most capable media servers available today, with strong performance metrics. It has many cutting-edge features, such as:

  • Simulcast & SVC
  • Congestion control
  • Multi-stream (the ability to send multiple streams over a single peer connection)
  • Sender- and receiver-side bandwidth estimation
  • A tiny Node.js module for easy integration with existing large Node.js applications
  • Super low-level APIs that provide fine-grained control over media stream flows
  • Features like ICE restart and stream prioritization that provide application flexibility

We have used the majority of the capabilities provided by mediasoup in our load balancer, to give enough flexibility to the customers who will use it to build their super scalable applications on top of it.
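
As a quick illustration of how small the integration surface is, here is a minimal sketch of embedding mediasoup into a Node.js/TypeScript server. The codec list, port range and announced IP are assumptions for the example, not CWLB’s actual configuration.

```typescript
import * as mediasoup from "mediasoup";
import { types as ms } from "mediasoup";

// Assumed codec set: Opus audio plus VP8 video.
const mediaCodecs: ms.RtpCodecCapability[] = [
  { kind: "audio", mimeType: "audio/opus", clockRate: 48000, channels: 2 },
  { kind: "video", mimeType: "video/VP8", clockRate: 90000 },
];

async function startMediaServer() {
  // One worker is one C++ subprocess, roughly pinned to one CPU core.
  const worker = await mediasoup.createWorker({
    rtcMinPort: 40000, // assumed UDP port range
    rtcMaxPort: 49999,
  });

  // A router roughly maps to one room; it routes RTP between its transports.
  const router = await worker.createRouter({ mediaCodecs });

  // Each participant gets one or more WebRTC transports on this router.
  const transport = await router.createWebRtcTransport({
    listenIps: [{ ip: "0.0.0.0", announcedIp: "203.0.113.10" }], // placeholder public IP
    enableUdp: true,
    enableTcp: true,
  });

  console.log("transport created with ICE candidates:", transport.iceCandidates);
  return { worker, router, transport };
}

startMediaServer().catch(console.error);
```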

Why AWS?

Because AWS is the leading cloud provider today, used by many enterprises and startups, as well as by individuals for hobby projects. It also has best-in-class uptime and trust among its users, and very elaborate, easy-to-follow documentation for developer adoption. Also, the critical APIs which our load balancer uses to scale media servers are stable and change infrequently. For all of the above reasons, we chose AWS as our first cloud provider for CWLB. We plan to eventually support all leading cloud providers, including Microsoft Azure, Google Cloud, Oracle Cloud, DigitalOcean, OVH cloud, etc., once our AWS offering is complete and stable.

There are 4 possible strategies one can use to auto-scale a WebRTC application:

  • Horizontal scaling
  • Vertical scaling
  • Hybrid scaling
  • Hybrid+ scaling

Horizontal Scaling

This is the suitable mode of scaling if your use case needs smaller meeting rooms of 2-5 users each, but a lot of such rooms are needed simultaneously.

A good example is a video contact center where 100+ customer support agents attend daily calls from customers. It is primarily a one-to-one call between the agent and the customer, until the agent’s supervisor and/or manager decide to join the call. In this case, there will be a maximum of 4 users in a conferencing room at any point in time, but there will be 100+ or even 500+ such rooms running at any point in time.

In this case a horizontal load balancer can be used to distribute the load from the first media server to a second media server as soon as the load on the first server reaches its peak. The load balancer keeps track of the real-time usage and releases resources whenever the load on the first server is reduced. This way the load balancer can upscale/downscale media server resources based on the real-time load.
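
Here is a minimal sketch of how a horizontal load balancer might choose a server for a new room. The `ServerStats` shape and the capacity numbers are illustrative assumptions, not CWLB’s actual algorithm.

```typescript
// Hypothetical per-server statistics reported to the load balancer.
interface ServerStats {
  id: string;
  host: string;
  activeRooms: number;
  maxRooms: number; // assumed capacity, e.g. 10 small rooms per c5.2xlarge
}

// Pick the least-loaded server that still has spare capacity,
// so rooms spread evenly and no single server gets overloaded.
function pickServerForNewRoom(servers: ServerStats[]): ServerStats | undefined {
  return servers
    .filter((s) => s.activeRooms < s.maxRooms)
    .sort((a, b) => a.activeRooms / a.maxRooms - b.activeRooms / b.maxRooms)[0];
}

// Usage: if nothing is returned, every server is full and a new one must be provisioned.
const target = pickServerForNewRoom([
  { id: "ms-1", host: "10.0.0.11", activeRooms: 10, maxRooms: 10 },
  { id: "ms-2", host: "10.0.0.12", activeRooms: 3, maxRooms: 10 },
]);
console.log(target ? `create room on ${target.host}` : "provision a new media server");
```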

Vertical Scaling

This is the suitable mode of scaling if your use case needs larger meeting rooms of 20-60 users each, but a smaller number of such rooms are needed simultaneously.

A good example is a school or educational institution where only 10 teachers conduct daily sessions for their respective classes. In this case, though there will be relatively more students in each session, a maximum of 10 such rooms for 10 teachers need to run at any point in time.

In this case a vertical load balancer can be used to distribute the load from the first core of the media server to the other available cores as soon as the load on the first core reaches its peak. Though only one media server may be sufficient to cater to the whole school, effectively distributing the load between all the available cores of that media server is key to achieving the desired output from it. Here the load balancer’s job is to keep track of the real-time usage and release resources whenever the load on each individual core of the media server is reduced.
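
With mediasoup, vertical scaling is usually done by launching one worker per CPU core and assigning each new room (router) to the least-busy worker. The sketch below illustrates that pattern under the assumption that the room count per worker is a good enough load proxy; a real balancer would look at actual CPU usage or consumer counts.

```typescript
import * as os from "os";
import * as mediasoup from "mediasoup";
import { types as ms } from "mediasoup";

const workers: ms.Worker[] = [];
const roomsPerWorker = new Map<ms.Worker, number>();

// Launch one mediasoup worker (one C++ subprocess) per available CPU core.
async function bootWorkers() {
  for (let i = 0; i < os.cpus().length; i++) {
    const worker = await mediasoup.createWorker();
    workers.push(worker);
    roomsPerWorker.set(worker, 0);
  }
}

// Assign a new room to the worker currently hosting the fewest rooms.
// (Room count is an assumed, simplified proxy for per-core load.)
async function createRoom(mediaCodecs: ms.RtpCodecCapability[]) {
  const worker = [...workers].sort(
    (a, b) => roomsPerWorker.get(a)! - roomsPerWorker.get(b)!
  )[0];
  roomsPerWorker.set(worker, roomsPerWorker.get(worker)! + 1);
  return worker.createRouter({ mediaCodecs });
}
```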

The two load balancing strategies mentioned here are the two basic forms of media server load balancing in WebRTC. The other two approaches cover advanced use cases which need more advanced load balancing with fine-grained control. They are described in the second part of this blog series; the link to the 2nd part of the post is here.

CWLB

Introducing CWLB (Centedge WebRTC Load Balancer), a general-purpose WebRTC load balancer designed with mediasoup as the media server. It has been designed from scratch to cater to the demands of enterprises that don’t want to use a video API vendor for certain reasons, but want a dependable managed video infrastructure with a dedicated support team, along with the possibility of customizing even the core media flows.

Features
  • Mediasoup as the media server
  • AWS/DigitalOcean as the cloud provider
  • Hybrid+ scaling
  • Highly flexible yet resource-efficient
  • An advanced load distribution algorithm with 85% efficiency (approx.)

Note: With the current CWLB v2 release, the efficiency of CWLB is 85% (approx.). Our goal is to reach >90% efficiency by the v3 release of CWLB.

We now also have a production-grade, scalable, in-house video conferencing solution named Meetnow built on top of CWLB. It has been designed to truly unify your organization’s external and internal communication in today’s remote-first world. Some of its unique features are: 2-100 user rooms with different modes (one-to-one, conferencing and event), complete meeting and attendance analytics, and, last but not least, paying only for real usage without any monthly/yearly commitments until you are sure about switching to our Enterprise plan.

If you have a mediasoup-based open-source project like mediasoup-demo or edumeet which currently works great but does not autoscale, then this is for you. If you have a BBB (BigBlueButton) or Jitsi implementation currently in production which does not autoscale, then this is for you. If you have any other open-source or custom-built video implementation in production which doesn’t autoscale, then this is for you. Even if your current production video setup is working fine but you may need something like this in the near future, or you are just curious to know more about CWLB, feel free to drop us a note at hello@centedge.io / sp@centedge.io to learn how we can help you. If you wish to schedule a free 30-minute discussion of your use case with one of our senior/principal consultants, feel free to do so using this link.

Autoscaling WebRTC with mediasoup, CWLB 2.0 now ready

As this is the first post of 2023, we wish everybody a wonderful new year.

If you are here, you are most probably facing issues related to scaling your WebRTC application, or you are just exploring with some future plans to build a production-grade WebRTC app. In both cases, you are in the right place. This post is a continuation of the previous post we wrote on this topic a couple of months ago. The previous post described in detail when auto-scaling is necessary and when it is not. If you are not sure whether your solution needs WebRTC autoscaling, you should read the previous post here before reading further.

In the last post we discussed horizontal and vertical scaling as strategic options to scale mediasoup media servers based on the use case. In this post, we are going to discuss another way of auto-scaling and its use case. We are also going to discuss interesting new enhancements to CWLB.

The third WebRTC scaling strategy

The third approach is vertical and horizontal scaling combined into one. It can be called a hybrid scaling approach. Here the vertical scaling approach is used to scale one room across all available cores of a mediasoup instance when needed. Once this mediasoup instance is fully occupied but the same room still needs more resources, horizontal scaling is used to scale to a different mediasoup instance located on a separate host. All new resource allocation requests for the same room then go to the new server according to the vertical scaling strategy, unless the first server again has free resources to spare. This hybrid approach is typically useful for very large rooms, like large event rooms, where the load balancer needs to cater to 100s or 1000s of concurrent users in one room in a completely just-in-time resource request mode.

Let’s understand the 2 important keywords mentioned in the above paragraph.

Resource request: a request made to the media server to allocate some resources to a user so that the user can send/receive audio/video/screen-share media streams.

Just-in-time (JIT) request: this load balancer strategy is used when the load balancer has no previous information about the size of the rooms, so it cannot pre-allocate and reserve resources. Here the load balancer has to work really hard to keep track of the real-time resource usage of each media server and allocate/free resources in real time as users join and leave a room. This type of implementation is relatively complex compared to a pre-allocation and reservation based load balancing strategy.
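
The following sketch shows one possible shape of this hybrid, just-in-time allocation: each join request first tries the cores of an instance already hosting the room, then spills over to another instance. The `CoreSlot` bookkeeping and the per-core capacity are illustrative assumptions, not CWLB’s internal data model.

```typescript
// Hypothetical bookkeeping: each media server instance exposes its cores,
// and each core can host a limited number of consumers (assumed 100 here).
interface CoreSlot {
  instanceId: string;
  coreId: number;
  consumers: number;
  maxConsumers: number;
}

// Allocate resources for one user joining `roomId`, just in time.
// Prefer cores on instances that already host this room (vertical scaling),
// then fall back to any other instance (horizontal spill-over).
function allocate(
  roomId: string,
  cores: CoreSlot[],
  roomInstances: Map<string, Set<string>>
): CoreSlot | undefined {
  const hosting = roomInstances.get(roomId) ?? new Set<string>();
  const free = cores.filter((c) => c.consumers < c.maxConsumers);

  const pick =
    free.find((c) => hosting.has(c.instanceId)) ?? // same host first
    free[0];                                       // otherwise spill to a new host

  if (pick) {
    pick.consumers += 1;
    hosting.add(pick.instanceId);
    roomInstances.set(roomId, hosting);
  }
  return pick; // undefined means a brand-new instance must be provisioned
}

// Releasing is the mirror image: decrement the counter when a user leaves,
// so idle cores (and eventually idle instances) can be reclaimed.
function release(core: CoreSlot) {
  core.consumers = Math.max(0, core.consumers - 1);
}
```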

The Hybrid+ WebRTC scaling strategy

The hybrid+ scaling strategy includes everything in the hybrid scaling strategy. In addition, it has some other important aspects which make it a really good choice for medium and large scale deployments.

  • An additional relay server between the client and the media server to make the media server completely stateless, i.e. the media server does not contain any kind of business logic.
  • The capability to create/destroy on-demand media servers using cloud provider APIs in a completely automated manner with the least manual intervention.
  • The capability to use advanced techniques like media server cascading to keep latency to a minimum while catering to a global user base. Media servers in different geographic locations need to run simultaneously to enable media server cascading.
  • The capability of an HA (High Availability) setup where standby media servers can take up the load when primary media servers fail while in use. Additional standby media servers need to run to ensure HA.

CWLB 2.0

CWLB 1.0, which was released in June 2022, had vertical and horizontal scaling and used AWS EC2 instances for auto-scaling of media servers. This was good enough for small and medium use cases. But for large and very large use cases, like large-scale events, it had 2 disadvantages. The first was that the load balancer consumed more media servers than it ideally should have, and the second was the data transfer cost each room incurred while using AWS EC2 instances.

In CWLB 2.0, we have now addressed these 2 points along with many other improvements.

First, the core load-balancer algorithm is now fully JIT request compatible. It means it now uses media server resources very efficiently, keeping track of each media server’s usage in real time and allocating/de-allocating resources based on real-time user resource consumption demands. It now has all strategies enabled, i.e. vertical scaling, horizontal scaling, and a mix of both, aka hybrid scaling.

Second, we have integrated another cloud provider, DigitalOcean, into the load balancer, which has lower data transfer costs than AWS EC2. Let’s take an edtech use case as an example to compare the data transfer costs between AWS EC2 and DigitalOcean, so that you can understand why this is important.

Example

A maths tutoring company in India runs online maths tutoring classes for high school students. Each maths teacher teaches high school maths to 1,000 students in one online session. They conduct 6 such sessions every day, for 6 days a week, with each session lasting 90 minutes. Let’s try to calculate an approximate data transfer cost for a month. We will be using some assumptions to keep it realistic.

Let’s calculate the amount of data being transferred from the media servers in the cloud to the students who have joined the class.

The teacher is speaking while sharing his/her camera or screen for the whole class time, i.e. 90 minutes.

Let’s assume that the audio consumes 40 Kb/second and the video/screen share consumes 500 Kb/second of bandwidth. So each student consumes 540 Kb/second of data.

Here is how the maths looks.

540 * 60 * 90 = 2,916,000 Kb, or roughly 2.78 Gb, which is what one student consumes for the whole 90-minute session.

If there are 1,000 students in that session, the total data consumption for that session would be 2,780.9 Gb or 2.71 Tb.

If 6 such sessions happen each day, then the data transfer for each day would be 16.29 Tb.

Considering that these sessions happen 6 days a week, the data transfer for the week would be 97.76 Tb.

Considering 4 weeks in a month, the data transfer for the whole month would be 391.06 Tb. That’s a lot of data being transferred!

Now let’s look at the cost. AWS EC2 charges $0.08/Gb for outbound data transfers from EC2 to the public internet. It essentially means AWS doesn’t charge for the teacher who is sending his/her audio and video streams to the media server, but it does charge for the students who receive the audio and video streams relayed by the media server hosted on AWS EC2.

The maths looks like this.

391.06 * 1024 * 0.08 = $32,036

That is the amount of data consumed per month in Tb, converted to Gb by multiplying by 1,024, times the AWS data transfer cost per Gb. This is the cost only for the data transfer; it doesn’t include the cost of running the AWS EC2 instances for the media servers. That cost will be added on top, on an actual usage basis.

Now let’s look at the maths for running the same amount of maths tutoring sessions with media servers running on DigitalOcean.

There will be no change in the total amount of data transferred, which is 391.06 Tb.

The maths will look like this.

391.06 * 1024 * 0.01= $4004

This cost will come down further, as there is free data transfer bundled with each DigitalOcean droplet. For example, a 4 vCPU, 8 GB CPU-optimised instance comes with 5 TB of free data transfer per month. With DigitalOcean, we can consider the final cost to be in the range of $3,200 - $3,500.
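
To make the arithmetic above reproducible, here is a small script that follows the same assumptions (540 Kb/s per student, 90-minute sessions, 1,000 students, 6 sessions a day, 6 days a week, 4 weeks a month, $0.08/Gb on AWS vs $0.01/Gb on DigitalOcean). It is only a worked example of the numbers in this post, not a pricing tool.

```typescript
// Per-student consumption, following the post's assumptions.
const perStudentKbPerSec = 540;           // 40 audio + 500 video/screen share
const sessionSeconds = 90 * 60;           // one 90-minute session
const perStudentGb = (perStudentKbPerSec * sessionSeconds) / (1024 * 1024); // ≈ 2.78 Gb

// Scale up to a month of sessions.
const students = 1000;
const sessionsPerDay = 6;
const daysPerWeek = 6;
const weeksPerMonth = 4;
const monthlyTb =
  (perStudentGb * students * sessionsPerDay * daysPerWeek * weeksPerMonth) / 1024; // ≈ 391 Tb

// Outbound data transfer pricing used in this comparison (assumed flat rates).
const awsCost = monthlyTb * 1024 * 0.08;          // ≈ $32,000
const digitalOceanCost = monthlyTb * 1024 * 0.01; // ≈ $4,000 before free-transfer credits

console.log({ monthlyTb: monthlyTb.toFixed(2), awsCost, digitalOceanCost });
```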

Due to this difference in data transfer costs, we integrated DigitalOcean into CWLB 2.0 to provide an alternative to AWS EC2 for running media servers at a lower cost. This is purely optional and configurable from the load balancer settings of the admin dashboard.

Any organization admin can change their cloud vendor in the dashboard from AWS to DO or vice versa with a button click, and the media servers will run in the desired cloud as selected by the admin. The default cloud setting for running the media servers is now DO (DigitalOcean). It can be changed to AWS EC2 at any time in the load balancer settings.

Some other important updates in CWLB 2.0 are as below.

Load balancing recording servers

Like media servers, the servers responsible for handling meeting recordings can get exhausted quickly if there is a lot of demand for recording. To solve this, we have now integrated recording server autoscaling into the load balancer. Now the load balancer can auto-scale not only media servers but also recording servers, in a fully automated manner.

Load balancing breakout rooms

Breakout rooms were already available in CWLB 1.0, but they were not very resource efficient. Customers had to use the same amount of credits for breakout rooms as for the main room. With CWLB 2.0, breakout rooms are fully integrated into the load balancer in JIT request handling mode, so customers need not pay anything extra for using them. It’s completely dynamic, based on the actual usage of the breakout rooms, irrespective of the main room size.

Due to current work pressure, we are not able to write an exhaustive list of all the updates that happened in CWLB 2.0, though we would love to write one when time permits. Until then, if you have any query or suggestion related to CWLB 2.0, please feel free to drop us a mail at hello@centedge.io.

Don’t assume your WebRTC DevOps! It can kill

A real-life incident that happened with one of our customers.

A customer of ours, with offices in the US and EU, has a nice & innovative video conferencing application with some really cool features for collaborative meetings. They came to us to help them fix some critical bugs and load balance their video backend. An interesting piece of information we came to know was that they were running only one media server, but a really huge one with 72 cores! The reason for running such a large server was that they wanted a lag-free & smooth video experience for all. In the beginning, when they had a small server, they were facing issues with video quality. Therefore, they took the biggest possible server for consistent video quality for all, without ever verifying that the video quality issue was actually due to the server. After digging deep, we made some interesting discoveries about their architecture and suggested some changes to their video infrastructure, which included downgrading the media server to an 8-core media server and adding a horizontal load balancer to distribute the load effectively. After the suggested changes, their video infra bill was down by ~80%.

Here is the comparison.

Before:

A 72-core instance in AWS in the EU Frankfurt region costs $3.492/hour which becomes $2514.24 per month.

After:

An 8-core instance in AWS in the EU Frankfurt region costs $0.348/hour, which becomes $250.56 per month.

A horizontal load balancer instance also costs approximately the same, i.e. $250 /month.

So the total becomes $500/ month. A savings of ~80% per month on the cloud server bill!

When the CEO of the company got to know the size of the media server bill, he was skeptical about the business viability of the service because of the cloud bill being paid every month. After the change, the prospects of the service look far more promising to him in terms of business viability.

Load balancing WebRTC Media Servers, The Need

The rush to create video conferencing apps is here to stay, especially using WebRTC. As WebRTC 1.0 has already been standardized by the Internet Engineering Task Force (IETF) as of the date this post is being written, it is going to become mainstream in the coming times with the advent of 5G. Having said that, building a video conferencing app is still much more complicated than building a pure web app. Why? Because too many things need to be taken care of to create a production-ready video conferencing app. Those things can broadly be divided into 2 major parts. One is to code the app and test it on the local network (LAN). Once it is successfully tested locally, it is time to take it to the cloud to make it available to a host of other users through the Internet. This is where DevOps plays a critical role.

Now let’s understand why it is so important.

Let’s assume you have built the service to cater to 50 users in conferencing mode in each room. Now if you have taken a good VPS like a c5.xlarge on a cloud provider like AWS, let’s assume it can support up to 10 conference rooms. What will happen if there is a need for an 11th room? In this case, you need to run another server that can handle another 10 rooms. But how will you know when the 11th room request will come? If you don’t want to check manually every time a new room creation request comes, then there are 2 options. Either you tell the user who is requesting the 11th room that the server capacity is full and they must wait until a room becomes free, OR you create logic so that a new server is created magically whenever the new room creation request comes! This is called auto-scaling, and it is the magical effect of doing proper DevOps on your cloud provider. The point to note here is that just as you create new servers as the demand grows, you also have to delete servers when the demand reduces. Else the bill from your cloud vendor will go through the roof!

Here is a brief summary of how a typical load-balancing mechanism works. I am not going to discuss the core logic of when to scale, as that can be completely dependent on the business requirement. If there is a need to up-scale or down-scale (short for creating or deleting servers on demand, programmatically) according to dynamic demand, then there has to be a control mechanism inside the application to let the cloud know that there is more demand for rooms and that more servers need to be created now to cater to the demand surge. Then the cloud has to be informed about the details of the VPS to be created, like instance type, EBS volume needed, etc., along with the other parameters the cloud needs to create the server. Once the server is created, the cloud has to inform the application server that the VPS has been created and is ready for use. The application server will then use the newly created server for the newly created room and thus cater to the new room creation request successfully. A similar but opposite approach has to be taken when rooms are released after usage. In this case, we need to let the cloud know that we don’t need some specific servers and that they should be deleted, as they won’t be used until a new room creation request comes. When a new room creation request comes, one can again ask the cloud to create new servers and cater to the request successfully. This is how one will typically manage their DevOps to dynamically create and delete VPSes according to the real-time need.
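
As an illustration of the cloud-facing half of this flow, here is a minimal sketch using the AWS SDK for JavaScript v3 to launch and terminate an EC2 instance for a media server. The AMI ID, instance type and region are placeholders, and a real setup would also wait for the new instance to pass health checks before routing rooms to it.

```typescript
import {
  EC2Client,
  RunInstancesCommand,
  TerminateInstancesCommand,
} from "@aws-sdk/client-ec2";

const ec2 = new EC2Client({ region: "eu-central-1" }); // placeholder region

// Up-scale: launch one media server instance from a pre-baked image.
async function launchMediaServer(): Promise<string> {
  const result = await ec2.send(
    new RunInstancesCommand({
      ImageId: "ami-0123456789abcdef0", // placeholder AMI with the media server pre-installed
      InstanceType: "c5.xlarge",        // assumed instance size from the example above
      MinCount: 1,
      MaxCount: 1,
    })
  );
  const instanceId = result.Instances?.[0]?.InstanceId;
  if (!instanceId) throw new Error("EC2 did not return an instance id");
  return instanceId; // the app should now wait until the instance is reachable
}

// Down-scale: terminate an idle media server so it stops costing money.
async function terminateMediaServer(instanceId: string): Promise<void> {
  await ec2.send(new TerminateInstancesCommand({ InstanceIds: [instanceId] }));
}
```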

WebRTC auto-scaling/load-balancing, the strategies

Now that we understand in brief what DevOps is, let us also understand the general strategies to follow, especially for the video conferencing use case. They can be broadly divided into 2 scenarios based on the level of automation that can be brought in to satisfy one’s business requirement. Though there can be many variations, let me describe 2 strategies, for the sake of simplicity, that can satisfy the majority of business requirements.

Strategy 1: Cloud-agnostic, semi-automatic load balancing

In this strategy, the point is to automate the load distribution mechanism to effectively up-scale and down-scale the media servers, while keeping the media servers cloud-agnostic. Media server creation and deletion are not in the scope of the load balancer; servers are created independently and registered with the load balancer in some manner, so that there are always enough servers available to cater to a surge in demand.

Pros:

  • Multi-cloud strategy
  • Better command and control
  • Less complex to implement

Cons:

  • Lesser automation

Strategy 2: Uni-cloud, fully automatic load balancing

In this strategy, the point is to automate the load distribution mechanism to effectively upscale and downscale with a higher degree of automation, at the cost of tight coupling to a cloud provider.

Here, a cloud provider’s APIs can be integrated to create and destroy servers in a completely on-demand manner, without much manual intervention. In this approach, the load balancer can create servers on a specific cloud using APIs in case of an upscaling need, and delete a server whenever the load decreases.

Pros:

  • Greater automation
  • Highly resource-efficient

Cons:

  • More complex to implement
  • Dependent on a single cloud vendor

There is no general rule that one should follow a specific load-balancing approach. It completely depends on the business requirement for which one needs load balancing. One should properly understand one’s business requirements and then decide the kind of load-balancing strategy that will be suitable. If you need help in deciding a good load-balancing strategy for your video infrastructure, feel free to have an instant meeting or a scheduled one with one of our core technical guys using this link.

Note: The load balancer mentioned in the above real-life incident is a WebRTC-specific stateful load balancer developed from scratch by us only for the purpose of auto-scaling WebRTC media servers. It is known as CWLB and more details about it can be found here.

The dilemma of build vs. buy from CPAAS in the world of Video Conferencing solutions

In today’s digital age, communication has become the lifeblood of businesses and individuals alike. As organizations strive to connect with their remote teams, engage with customers, and collaborate with partners worldwide, video communication has emerged as a powerful tool. When it comes to building a video communication app, there are two main options: building from scratch or integrating Communication Platform as a Service (CPaaS) video API providers. While the latter may seem like an attractive choice due to its convenience, there are several compelling reasons why building your own video communication app is a better long-term investment.

First and foremost, building a video communication app gives you full control over the user experience. By developing your own app, you have the freedom to customize every aspect of the platform to align with your brand identity and specific requirements. From the user interface to the features and functionalities, you can tailor the app to create a seamless and intuitive experience for your users. This level of control is crucial for building strong brand recognition and fostering user loyalty.

Secondly, building your own video communication app allows you to prioritize data privacy and security. With increasing concerns about data breaches and privacy issues, having control over the infrastructure and data handling processes becomes paramount. By building your own app, you can implement robust security measures, encryption protocols, and data storage practices to safeguard sensitive information. This not only protects your users but also builds trust and credibility in your brand, setting you apart from competitors who rely on third-party providers.

Moreover, building a video communication app provides scalability and flexibility. As your business grows and evolves, you have the freedom to add or modify features, scale up the infrastructure, and adapt the app to changing market demands. This agility is crucial for staying ahead in a rapidly evolving digital landscape. On the other hand, integrating a CPaaS video API provider might limit your ability to customize or scale the app according to your unique requirements, potentially hindering your growth potential.

Building your own video communication app also offers cost-effectiveness in the long run. While integrating CPaaS video API providers may seem like a quick and cost-efficient solution initially, the subscription fees and usage charges can add up significantly over time. By building your own app, you have the opportunity to make a one-time investment in development and infrastructure setup, reducing ongoing expenses in the form of API usage fees. This allows you to have better control over your budget and allocate resources more efficiently.

Last but not least, building a video communication app provides a competitive edge. In a market saturated with generic communication tools, having a unique and tailored app sets you apart from competitors. It allows you to differentiate your brand and offer a distinctive user experience that aligns with your specific value proposition. By investing in building your own app, you position yourself as an innovative and forward-thinking organization, attracting users and potential partners who value a premium communication experience.

Let’s take the example of a virtual events company to understand the numbers behind the build vs. buy decision.

A virtual events company hosts 100 events a month, with an average of 500 participants attending each event. Let’s assume that each event is 6-8 hours long, out of which 4 hours of audio/video is used by all the participants, and that the virtual events company is using a video CPaaS provider to provide audio/video sharing capabilities to its participants.

The maths for 1 event would look like this:

4 * 60 = 240 minutes * 200 participants * $0.004/video minute = $192 (considering 200 participants sharing their video and audio)

4 * 60 = 240 minutes * 300 participants * $0.0009/minute = $65 (considering 300 participants sharing their audio only)

In total, $257 (approx.) is what the CPaaS provider charges for 1 event.

If 100 such events happen, then the total CPaaS bill would be $25,700 for the month! If the virtual events company has been using the CPaaS service for the last 2 years while hosting a similar number of events each month, the total CPaaS bill would have been $616,800.

Now let’s do some maths to find out what the amount would be if the virtual events provider had decided to build it from day 1.

Building a really scalable video back-end which can replace the CPaaS offering should take 8-12 months, with a cost of $150,000 (approx.). The front-end and all other costs would stay the same.

The next important cost is the server and the data transfer cost.

Let’s try to calculate the data transfer costs for 1 event with 500 people and 4 hours of audio/video usage.

1 participant consuming video and audio at 540Kb/s for 4 hours

540 * 60 * 60 * 4 = 7,776,000 Kb, or roughly 7.41 Gb of data consumption for 4 hours

If 200 participants are using video, then the total data consumption is 200 * 7.41 ≈ 1,483 Gb

1 participant consuming audio at 40Kb/s for 4 hours

40 * 60 * 60 * 4 = 576,000 Kb, or roughly 0.54 Gb of data consumption for 4 hours

If 300 participants are using audio only, then the total data consumption is 300 * 0.54 ≈ 164 Gb

Therefore the total data transfer becomes roughly 1,648 Gb.

In the case of AWS, the data transfer cost becomes 1,648 * $0.08/Gb = $132

In the case of DigitalOcean, the data transfer cost becomes 1,648 * $0.01/Gb = $16.48

Considering that the virtual events provider runs all its services on DigitalOcean as it is cheaper, the total data transfer cost per month for all 100 events would be $1,648. The server cost can be considered as included in this figure, since we haven’t counted the free data transfer DigitalOcean bundles with its servers.

CPaaS cost of $25,700 vs self-managed infra cost of $1,648: the difference is $24,052 per month.

According to this calculation, the virtual events company can recover the development cost of the self-managed video back-end in roughly 6 months, and then keep saving at least $15,000 a month, considering that the people needed to manage and enhance the video back-end would cost about $9,052 monthly.
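
Here is the same break-even arithmetic written out as a tiny script, using the figures from this example ($25,700/month CPaaS bill, $1,648/month self-hosted data transfer, $9,052/month team cost and a $150,000 build cost). Treat these as illustrative inputs rather than a quote.

```typescript
// Monthly figures from the example above.
const cpaasBill = 25_700;        // what the CPaaS provider charges per month
const selfHostedInfra = 1_648;   // data transfer on DigitalOcean per month
const teamCost = 9_052;          // assumed cost of people maintaining the back-end
const buildCost = 150_000;       // one-time development cost

// Gross and net savings per month after switching to a self-managed back-end.
const grossSaving = cpaasBill - selfHostedInfra; // $24,052
const netSaving = grossSaving - teamCost;        // $15,000

// Months needed to recover the initial build investment.
const breakEvenMonths = buildCost / grossSaving; // ≈ 6.2 months

console.log({ grossSaving, netSaving, breakEvenMonths: breakEvenMonths.toFixed(1) });
```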

A similar analysis can be done for any segment based on the above example. While integrating CPaaS video API providers might offer convenience in the short term, building your own video communication app presents numerous advantages in terms of user experience, data security, scalability, cost-effectiveness, and competitive differentiation. It empowers you to create a platform that truly reflects your brand and caters to the unique needs of your users. In a world where effective communication is paramount, investing in building your own video communication app is a strategic decision that sets you on a path of long-term success and growth. We are here to help you build a cutting-edge video conferencing back-end/application for your unique use case. Feel free to drop us an email at hello@centedge.io or use this link to have an instant video meeting with us.

The mediasoup opensource projects, Choosing the right one for your next requirement

As a video conferencing application development company, we often get requests to help our clients choose the right video conferencing open-source stack as a base for developing custom video conferencing applications according to the client’s business use case. This post discusses the 3 most popular open-source video conferencing projects based on the mediasoup media server, and how/when to use them. We have tried to include as many details as possible about each project, along with its suitability for certain types of business use cases.

The mediasoup open-source projects

Mediasoup-demo

As the name suggests, this project was developed by the original author of mediasoup to demonstrate the capabilities of mediasoup to the world. It can be considered a complete implementation with code examples for things like producers, consumers, forceTcp, forceVP9/H264, and Simulcast/SVC to showcase what mediasoup is capable of. It uses a protoo server/client as the signaling mechanism over WebSockets, designed by the author of mediasoup himself.

Link: https://github.com/versatica/mediasoup-demo

Author: IBC (original author of mediasoup)

Tech stack: Mediasoup, Nodejs, Reactjs, and WebSocket

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 905 stars
  • 555 forks
  • 54 watchers
  • MIT License
  • No Horizontal scaling
  • No Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

It can be considered a foundation for building a robust production-grade video application, as it has all the components and code needed for building such an application. The developer or development team can make use of the existing code whenever needed to achieve the business use case. Though it can be used for any kind of business use case, it is advisable to spend some time understanding the code and the design process to make the best use of it.

Our take on this project

Choose this only if you have at least 1-2 months of extra time to play around with this application and build the necessary expertise before venturing out to build a production-grade application for yourself or your company. Also, it is advisable to understand the protoo signaling framework used by this application if you wish to keep it as your signaling mechanism. Alternatively, you can use socket.io if your use case doesn’t need more than ~200 users in 1 room.

Edumeet

This project has been developed as a fork of the original mediasoup-demo project. It is more mature and production-ready than the demo project. It has primarily been designed for educational purposes, like running virtual classrooms and online teaching-learning, though it can be retrofitted to other use cases with minimal effort. It is highly configurable via the config files provided by the authors, without the need for many code changes to use its functionality. The backend of this application is ready to use without any code changes. The front end obviously needs modification to suit your business use case.

Link: https://github.com/edumeet/edumeet

Author: Multiple authors

Tech stack: Mediasoup, Nodejs, Reactjs, and Socket-io

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 1.1k stars
  • 397 forks
  • 52 watchers
  • MIT License
  • No Horizontal scaling
  • Yes Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

It can be considered a project with a production-ready backend, along with a sample frontend which needs to be modified according to the business use case. The good part is that no coding is needed for the backend, as it can be fully configured using the config files already provided in the project.

Our take on this project

Choose this if you have less time and your developers are not very skilled in mediasoup/WebRTC. The front end of this application can be modified by an average web developer with a little curiosity and a keen eye. It is a good open-source package for anybody who wants to build a production-ready video application without much expertise in building WebRTC backends.

Nettu-meet

This project can be considered a ready-to-use, self-hosted application for a virtual classroom / online teaching-learning use case. The title of the project itself is “Open source video conferencing application for tutors”. It has a nice ready-to-use frontend as well as a ready-to-use backend. The UI looks polished and has all the necessary ingredients for an online education use case.

Link: https://github.com/fmeringdal/nettu-meet

Author: Fredrik Meringdal

Tech stack: Mediasoup, Nodejs, Reactjs, and Socket-io

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 1.9k stars
  • 196 forks
  • 43 watchers
  • AGPL3 License
  • No Horizontal scaling
  • No Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

If your use case is online education, feel free to use this project as it is. It has all the necessary things already built into the front-end, like a whiteboard, file sharing, chat, etc. It may not be very suitable for other use cases, as the front end would need to be redesigned and redeveloped for anything other than online education. Telehealth can still re-use this front-end, but with some modifications.

Our take on this project

Choose this if your use case is a self-hosted online education solution and you don’t need anything extra beyond what is already provided. You can be ready with your own online education solution within a week using this open-source package.

Samvyo (Commercial)

This project has been developed from scratch using mediasoup as the media server. It has been created with a dynamic hybrid load balancing approach to provide versatility while keeping server usage and data transfer costs as low as possible. The load balancer is versatile enough to create media servers on its own when demand spikes and shut them down when demand lessens. It comes with a nice pre-built UI with all the latest features, including stage mode for virtual events, breakout rooms for focused discussions, virtual backgrounds, public/private chat, moderator controls, etc. A load-balanced server-side recording option is also available to record meetings effortlessly.

Link: https://www.samvyo.com

Author: Centedge Technologies

Tech stack: Mediasoup, Redis, Nodejs, Reactjs, and WebSocket

Below are some interesting facts about this project.

  • Commercial
  • Yes Hybrid scaling
  • Yes Horizontal scaling
  • Yes Vertical scaling
  • One room can be in multiple media servers
  • On-demand Paid Support

The usefulness of the project

This is useful for all kinds of use cases where load balancing is necessary to cater to a large concurrent user base. The service can be used by 10k/100k concurrent users without much issue, as the load balancer does all the heavy lifting of server creation/destruction and efficient resource allocation. The UI is ready for the majority of video conferencing / interactive live streaming use cases.

Our take on this project

This is our own in-house product, developed from scratch with 8+ years of working experience with WebRTC and its ecosystem. If you think your use case needs scaling and load balancing to cater to a large user base, then this may be a good fit for you. Or if you need consistent support for further enhancements from the team which originally developed it, then this is for you.

Feel free to set up a free 30-minute discussion with us using this link to discuss your business use case and find a suitable open-source package, either from this list or outside of it. We can help you do the requirement analysis, find a suitable open-source repository that is close to your requirement, and create a list of action points that can help you build a production-ready video application, all within a budget of < $1000. Drop us a mail with your requirements at hello@centedge.io to begin.

WebRTC Media servers, Why, When, and How to choose one for your next application

A media server in a WebRTC infrastructure plays a critical role in scaling a WebRTC call beyond 4 participants. Whenever you join a call that has 8-10 participants or more, know that a media server is doing the hard work behind the scenes to provide you with a smooth audio/video experience. If you need to build a WebRTC infrastructure and select a WebRTC media server for your use case, then this post will give you enough information to make an informed decision.

Why and when is a WebRTC Media Server required?

A WebRTC media server is a critical piece of software that helps a WebRTC application distribute audio/video streams to all the participants of an audio/video meeting. Without one, creating a large audio/video call beyond 4 users would be highly difficult due to the nature of WebRTC calls. WebRTC calls are designed for real-time use cases (<1 second of delay between the sender and the receiver of an audio/video stream). Without a media server, a user sending his/her audio/video streams has to send those streams to every other participant in the conference so that a real conversation can happen in real time. Imagine a call with 10 people, where everybody is sending his/her audio/video stream to the other 9 people so that they can view it in real time. Let’s do some maths to find out some interesting details.

When a user joins an audio-video call that is running on WebRTC, he/she can share either audio/video/screen or all of them together.

If joined only with audio: ~40 Kbps of upload bandwidth is consumed

If joined with only video: ~500 Kbps of upload bandwidth is consumed

If joined with only screen share: ~800 Kbps of upload bandwidth is consumed

If all 3 are shared together: ~1340 Kbps or 1.3 Mbps of upload bandwidth is consumed

If there are 10 people in the meeting, then 1.3 * 9 = 11.7 Mbps of upload bandwidth will be consumed! Remember that, without a media server, you need to send your audio/video/screen-share streams to everybody else except yourself. Anybody who doesn’t have a consistent 11.7 Mbps of upload bandwidth can’t join this meeting!

This also brings another challenge for the device being used to join the conference. The CPU of the device has to work very hard to compress and encode the audio/video/screen-share streams to send over the network as data packets. If the CPU has to spend 5% of its capacity to compress, encode and send the user’s streams to one other participant in the meeting, then it has to spend 9 * 5 = 45% of its capacity to do the same for the other 9 participants.
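
A tiny calculation makes the difference obvious: in a mesh call every participant uploads one copy per peer, while with a media server they upload one copy in total. The numbers below simply reuse the assumptions above (1.3 Mbps upstream and ~5% CPU per encoded copy) and are meant only as a worked example.

```typescript
// Assumed per-stream costs, matching the figures used in this post.
const uploadMbpsPerPeer = 1.3; // audio + video + screen share
const cpuPercentPerPeer = 5;   // rough encoding/sending cost per recipient

function meshCost(participants: number) {
  const peers = participants - 1; // everyone except yourself
  return {
    uploadMbps: uploadMbpsPerPeer * peers, // 10 people -> 11.7 Mbps
    cpuPercent: cpuPercentPerPeer * peers, // 10 people -> 45%
  };
}

function mediaServerCost() {
  // With a media server, the client uploads and encodes exactly one copy;
  // the server handles replication to all other participants.
  return { uploadMbps: uploadMbpsPerPeer, cpuPercent: cpuPercentPerPeer };
}

console.log("mesh, 10 users        :", meshCost(10));
console.log("media server, any size:", mediaServerCost());
```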

Is the CPU not wasting its efforts by trying to do the exact same thing 9 times in this case?

Can we not compress, encode, and send the user’s audio/video/screen-share streams just once to the cloud, and have the cloud do some magic to replicate those streams and send them to everybody else present in the same meeting room?

Yes, we can do this magic, and the name of this magic is a media server!

Different kinds of WebRTC Media Servers, MCU vs. SFU

Primarily there are 2 kinds of media servers: the SFU (Selective Forwarding Unit) and the MCU (Multipoint Control Unit).

From the last example, we now know that we need a media server that can replicate and distribute the streams of a user to as many people as needed without wasting the user’s network and CPU capacity. Let’s take this example forward.

Consider a situation where the meeting needs to support various UI layouts, with a good amount of configuration regarding who can view and listen to whom. It turns out that this is going to be a virtual event with various UI layouts like stage, backstage, front-row seats, etc. Here the job of the media server is to replicate and distribute the streams to everybody except the sending user. Therefore, in this 10-user virtual event, every user sends his/her streams to the media server only once and receives the streams from everybody else as individual streams. This way, the event organizer can create multiple UI layouts for different users according to where they currently are, i.e. backstage, on stage or in the front row. In this situation, the SFU helps us by forwarding all the streams as individual audio/video streams without dictating how they should be displayed to an individual user. With an SFU, though a user sends only his/her own audio/video/screen-share streams, he/she receives individual streams from everybody else, which consumes download bandwidth proportional to the number of participants: the more participants, the more download bandwidth is consumed!

Now let’s take a different situation: a team meeting of 10 users of an organization who don’t need much dynamism in the UI but are happy with the usual grid layout of videos. In this situation, we can merge the audio and video streams of all participants except the receiving user on the server and create one combined audio/video stream, which is then sent to each participant. Here, every user sends his/her own audio/video stream and receives all the others’ combined audio/video stream (only one stream!) in a fixed layout created by the server. The UI just shows one video, the combined video element sent by the server. Here an MCU is helping us do the job neatly. In this situation, the download bandwidth consumption stays constant irrespective of the number of users joining the meeting, as every user receives only one audio/video stream from the server. The 2 major downsides of this approach are that the number of servers needed to create a combined video of all users is much higher than with the replicate-and-forward approach of an SFU, and the rigid UI layout, which is decided by the server without the UI having any control over it.

Two of the largest global video conferencing services use one of the approaches described above.

Gmeet: SFU

MS Teams: MCU

SFUs are slowly gaining more popularity due to the flexibility they provide in creating UI layouts, which is highly important for an engaging user experience, and because they need far fewer servers to cater to a large number of users compared to an MCU. We are going to discuss the most popular SFUs available today and how to choose one for your next WebRTC media server requirement.

How to Choose a WebRTC Media Server for your next requirement?

In this section, we are going to discuss the top open-source media servers currently available and how they perform against each other. Here, I am going to discuss the media servers which use WebRTC/openRTC as their core implementation. I won’t be covering the media servers built on Pion, the Go implementation of WebRTC, as that needs a different post.

We will discuss some of the key things about the media servers below.

  1. Jitsi Video Bridge(JVB), Jitsi (SFU)
  2. Kurento (SFU + MCU)
  3. Janus (SFU)
  4. Medooze (SFU + MCU)
  5. Mediasoup(SFU)

We will primarily discuss the performance of each media server along with its suitability for building a WebRTC infrastructure.

Jitsi Video Bridge(JVB), Jitsi

Jitsi is a very popular open-source video conferencing solution available today. It is so popular because it provides a complete package for building a video conferencing solution, including a web & mobile UI, the media server component (JVB), and required add-ons like recording and horizontal scalability out of the box. It also has very good documentation, which makes it easy to configure on a cloud like AWS.

Kurento

Kurento used to be the de facto standard for building WebRTC apps, thanks to the promises it made to WebRTC developers with its versatility (SFU + MCU) and OpenCV integration for real-time video processing, way back in 2014. But after the acquisition of Kurento and its team by Twilio in 2017, development stopped and it is now in maintenance mode. One can gauge its current state from the fact that the team which now maintains Kurento has a freemium offering named OpenVidu which uses mediasoup as its core media server!

Janus

Janus is one of the most performant SFUs available, with very good documentation. It has a very good architecture where the Janus core does the job of routing and allows various plugins to do various jobs, including recording, bridging to SIP/PSTN, etc. It is updated regularly by its backers to keep it up to date with the latest WebRTC changes. It can be a good choice for building a large-scale enterprise RTC application, provided you can invest a good amount of time and resources in building the solution. The reason is that it has its own way of architecting the application and can’t be integrated as a module into a larger application the way mediasoup can.

Medooze

Medooze is better known for its MCU capabilities than its SFU capabilities, though its SFU is also capable. Though it is a performant media server, it lacks on the documentation side, which is key for open-source adoption. It was acquired by Cosmo Software in 2020, which was subsequently acquired by Dolby. This can be your choice if you are a pro in WebRTC and know most of the stuff by yourself. From the GitHub commits it seems that it is still in active development, but it still needs a good deal of effort on the documentation side.

Mediasoup

Mediasoup is a highly performant SFU media server with detailed documentation, backed by a team of dedicated authors with a vibrant open-source community and backers. The best part is that it can be integrated into a large Node.js/Rust application as a module to do its job as part of that larger application. It has a super low-level API structure which enables developers to use it however they need inside their application. Though it needs a good amount of understanding to build a production-ready application beyond the demo provided by the original authors, it is not that difficult to work with if one is passionate and dedicated to learning the details.

Below is a set of exhaustive performance benchmarking tests done by the Cosmo Software team back in 2020, at the height of COVID, when WebRTC usage was going through the roof to keep the world running remotely. Below are the important points from the test report that need to be considered. The whole test report can be found at the bottom of this post for people interested in knowing more.

Testing a WebRTC application needs to be done with virtual users, which are actually cloud VMs joining a meeting room as test users performing a certain task or tasks. In this case, the test users, aka cloud VMs, joined using the below-mentioned configuration, and each of the above servers was hosted as a single-instance server on a VM as described below.

[Image: VM configuration used for the SFU load testing]

Next are the load parameters used to test each of these media servers. The numbers are not the same for all the media servers, as the peak load capacity (after which a media server fails!) is not the same for each of them. The peak load numbers for each media server were derived after a good number of dry runs.

[Image: Load settings used for the SFU load testing]

The results of the load test:

[Image: Results of the SFU load testing]
  • Page loaded: true if the page can load on the client side which is running on a cloud VM.
  • Sender video check: true if the video of the sender is displayed and is not a still or blank image.
  • All video check: true if all the videos received by the six clients from the SFU passed the video check, which means every virtual client’s video can be viewed by all other virtual clients.

There are other important aspects of these media servers, like RTT (Round Trip Time), bitrate and overall video quality.

[Image: SFU RTT comparison]

The RTT is an important parameter which tells how fast media stream data (aka RTP packets) is delivered under real network conditions. The lower the RTT, the better.

[Image: SFU bitrate comparison]

The bitrate is directly responsible for video quality. It simply means how many media stream data packets are being transmitted in real time. The higher the bitrate, the better the image quality, but also the higher the load on the network to transmit it and on the client-side CPU to decode it. Therefore, it is always a balancing act to try to send as many data packets (i.e. as high a bitrate) as possible without congesting the network or overburdening the CPU. Here a good media server plays a key role, with techniques like Simulcast/SVC to personalise the bitrate for each individual receiver based on their network and CPU capacity.

[Image: SFU video quality comparison]

As the name suggests, this is the video quality delivered by each media server under various load patterns. The higher the quality, the better.

I hope I was able to provide a brief description of each media server with enough data points so that you can make a good decision in choosing the media server for your next video project. Feel free to drop me an email at sp@centedge.io if you need any help with your selection process or with the video infrastructure development process. We have a ready-to-use cloud video infrastructure built with the mediasoup media server which can take care of your scalable video infra needs and let you focus on your application and business logic. You can have an instant or scheduled video call with me using this link to discuss anything related to WebRTC, media servers, video conferencing, live streaming, etc.

PS: Here is the link to the full test report, if anybody is interested in reading the whole of it; it has a detailed description of this load test along with many interesting findings.