Autoscaling WebRTC with mediasoup, CWLB 2.0 now ready

As this is our first post of 2023, we wish everybody a wonderful new year.

If you are here, you are most probably facing scaling issues with your WebRTC application, or you are exploring future plans to build a production-grade WebRTC app. In both cases, you are at the right place. This post is a continuation of the previous post we wrote on this topic a couple of months ago, which described in detail when auto-scaling is necessary and when it is not. If you are not sure whether your solution needs WebRTC autoscaling, read the previous post here before reading further.

In the last post we discussed horizontal or vertical scaling as strategic options to scale mediasoup media servers based on the use case. In this post, we are going to discuss another way of auto-scaling and its use case. We are also going to discuss interesting new enhancements to CWLB.

The third WebRTC scaling strategy

The third approach combines vertical and horizontal scaling into one; it can be called a hybrid scaling approach. Vertical scaling is used first to scale a room across all available cores of a mediasoup instance when needed. Once that mediasoup instance is fully occupied but the same room still needs more resources, horizontal scaling kicks in and the room spills over to another mediasoup instance on a separate host. All new resource allocation requests for the same room then go to the new server, again following the vertical scaling strategy, unless the first server has free resources to spare. This hybrid approach is typically useful for very large rooms, such as event rooms, where the load-balancer needs to cater to hundreds or thousands of concurrent users in one room in a completely just-in-time resource request mode.
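To make the spill-over logic concrete, here is a minimal sketch of how a hybrid allocator could pick a server for a new resource request. All names here are hypothetical; this illustrates the strategy, not CWLB's actual implementation.

```typescript
// Hybrid (vertical-then-horizontal) server selection sketch.
interface MediaServer {
  id: string;
  totalWorkers: number; // e.g. one mediasoup worker per CPU core
  usedWorkers: number;
}

interface Room {
  id: string;
  servers: MediaServer[]; // servers already serving this room, in join order
}

// Pick a server for a new resource request in `room`.
function pickServer(room: Room, provisionServer: () => MediaServer): MediaServer {
  // Vertical first: keep filling the servers the room already uses.
  for (const server of room.servers) {
    if (server.usedWorkers < server.totalWorkers) return server;
  }
  // Horizontal spill-over: all current servers are full, add a new host.
  const fresh = provisionServer();
  room.servers.push(fresh);
  return fresh;
}
```

Vertical scaling is simply "keep filling the current host"; horizontal scaling is the `provisionServer()` fallback.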

Let's understand two important key terms used above.

Resource requests: A request is made to the media server to allocate some resources to the user so that the user can send / receive audio / video / screen-share media streams.

Just-in-time request: This load-balancer strategy is used when the load-balancer has no prior information about the size of the rooms, so it cannot pre-allocate and reserve resources. Here the load-balancer has to work really hard to keep track of the real-time resource usage of each media server and to allocate/free resources in real time as users join and leave a room. This type of implementation is relatively complex compared to a pre-allocation and reservation based load-balancing strategy.
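A just-in-time balancer therefore maintains a live usage counter per media server and updates it on every join/leave. A bare-bones sketch (hypothetical names, not CWLB's code):

```typescript
// Tracks per-server usage in real time; allocate on join, release on leave.
class JitTracker {
  private usage = new Map<string, number>();    // serverId -> workers in use
  private capacity = new Map<string, number>(); // serverId -> total workers

  register(serverId: string, totalWorkers: number): void {
    this.capacity.set(serverId, totalWorkers);
    this.usage.set(serverId, 0);
  }

  // Called when a user joins and needs media resources; false means "server full".
  allocate(serverId: string): boolean {
    const used = this.usage.get(serverId) ?? 0;
    if (used >= (this.capacity.get(serverId) ?? 0)) return false;
    this.usage.set(serverId, used + 1);
    return true;
  }

  // Called when a user leaves and their resources are freed.
  release(serverId: string): void {
    const used = this.usage.get(serverId) ?? 0;
    this.usage.set(serverId, Math.max(0, used - 1));
  }
}
```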

The Hybrid+ WebRTC scaling strategy

The hybrid+ scaling strategy includes everything in the hybrid scaling strategy. In addition, it has some other important aspects that make it a really good choice for medium/large-scale deployments.

  • An additional relay server between the client and the media server to make the media server completely stateless, i.e. the media server does not contain any kind of business logic.
  • Capable of creating/destroying media servers on demand using cloud provider APIs in a completely automated manner, with minimal manual intervention.
  • Capable of utilizing advanced techniques like media server cascading to keep latency to a minimum while catering to a global user base. Media servers in different geographic locations need to run simultaneously to enable cascading (a minimal piping sketch follows this list).
  • Capable of an HA (high availability) setup where standby media servers can take over the load when primary media servers fail while in use. Additional standby media servers need to run to ensure HA.
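For context on what cascading looks like in mediasoup terms: mediasoup can pipe a producer from one router to another with `router.pipeToRouter()`, which is the same-host building block; cross-host cascading follows the same idea but uses manually created PipeTransports between servers. A minimal sketch, assuming the workers, routers, and producer already exist:

```typescript
import * as mediasoup from "mediasoup";

// Assumed: routerA and routerB live in different workers (e.g. different cores),
// and a producer with id `producerId` already exists on routerA.
async function cascade(
  routerA: mediasoup.types.Router,
  routerB: mediasoup.types.Router,
  producerId: string
): Promise<void> {
  // Pipes the producer's media from routerA into routerB, so consumers can be
  // created on routerB as if the producer lived there.
  await routerA.pipeToRouter({ producerId, router: routerB });
}
```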

CWLB 2.0

CWLB 1.0, released in June 2022, supported vertical and horizontal scaling and used AWS EC2 instances for auto-scaling media servers. This was good enough for small and medium use cases, but for large and very large use cases such as large-scale events it had two disadvantages. First, the load-balancer used to consume more media servers than it ideally should. Second, the data transfer costs each room incurred while using AWS EC2 instances were high.

In CWLB 2.0, we have addressed these two points along with many other improvements.

First, the core load-balancer algorithm is now fully JIT-request compatible. It uses media server resources very efficiently by keeping track of each media server's usage in real time and allocating/de-allocating resources based on real-time user demand. It now supports all strategies, i.e. vertical scaling, horizontal scaling, and a mix of both, aka hybrid scaling.

Second, we have integrated another cloud provider, DigitalOcean, into the load-balancer; it has considerably lower data transfer costs than AWS EC2. Let's take an edtech use case as an example to compare the data transfer costs between AWS EC2 and DigitalOcean, so that you can understand why this is important.

Example

A maths tutoring company in India runs online maths tutoring classes for high school students. Each maths teacher teaches high school maths to 1000 students in one online session. They conduct 6 such sessions every day, 6 days a week, with each session lasting 90 minutes. Let's calculate an approximate data transfer cost for a month, using a few assumptions to keep it realistic.

Let's calculate the amount of data transferred from the media servers in the cloud to the students who have joined the class.

The teacher speaks while sharing his/her camera or screen for the whole class time, i.e. 90 minutes.

Let's assume that the audio consumes 40 KB/second and the video/screen share consumes 500 KB/second of internet bandwidth, so each student consumes 540 KB/second of data.

Here is how the maths looks.

540 KB/s × 90 × 60 s = 2,916,000 KB ≈ 2.78 GB is what one student consumes over the whole 90-minute session.

If there are 1000 students in that session, the total data consumption for the session would be roughly 2,780 GB, or 2.71 TB.

If 6 such sessions happen each day, the data transfer for each day would be 16.29 TB.

Considering that these sessions happen 6 days a week, the data transfer for the week would be 97.76 TB.

Considering 4 weeks in a month, the data transfer for the whole month would be 391.06 TB. That’s a lot of data being transferred!

Now let's look at the cost. AWS EC2 charges $0.08/GB for outbound data transfer from EC2 to the public internet. It essentially means AWS doesn't charge for the teacher who sends his/her audio and video streams to the media server, but it does charge for the students who receive the audio and video streams relayed by the media server hosted on AWS EC2.

The maths looks like this.

391.06 TB × 1024 GB/TB × $0.08/GB ≈ $32,036

This takes the monthly data volume in TB, converts it to GB by multiplying by 1024, and multiplies by the AWS data transfer cost per GB. This is the cost of data transfer only; it doesn't include the cost of running the AWS EC2 instances for the media servers, which gets added on top based on actual usage.

Now let's look at the maths for running the same tutoring sessions with the media servers on DigitalOcean.

There is no change in the total amount of data transferred, which remains 391.06 TB.

The maths will look like this.

391.06 TB × 1024 GB/TB × $0.01/GB ≈ $4,004

This cost will come down further because free data transfer is bundled with each DigitalOcean droplet. For example, a 4 vCPU, 8 GB CPU-optimised droplet comes with 5 TB of free data transfer per month. With DigitalOcean, we can consider the final cost to be in the range of $3,200 to $3,500.
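The whole estimate can be reproduced with a short script. The bitrate and the per-GB prices are the assumptions stated in this example, not official rate cards:

```typescript
// Reproduces the edtech data-transfer estimate above.
// Assumptions: 540 KB/s per student, 90-minute sessions, 1000 students,
// 6 sessions/day, 6 days/week, 4 weeks/month.
const KB_PER_SECOND = 540;
const SESSION_SECONDS = 90 * 60;
const STUDENTS = 1000;
const SESSIONS_PER_MONTH = 6 * 6 * 4;

const gbPerStudentPerSession = (KB_PER_SECOND * SESSION_SECONDS) / 1024 ** 2; // ≈ 2.78 GB
const gbPerMonth = gbPerStudentPerSession * STUDENTS * SESSIONS_PER_MONTH;    // ≈ 400,450 GB

// Per-GB egress prices used in this example ($0.08 AWS, $0.01 DigitalOcean).
const awsCost = gbPerMonth * 0.08;          // ≈ $32,036
const digitalOceanCost = gbPerMonth * 0.01; // ≈ $4,004

console.log({
  gbPerMonth: Math.round(gbPerMonth),
  awsCost: Math.round(awsCost),
  digitalOceanCost: Math.round(digitalOceanCost),
});
```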

Due to this difference in data transfer costs, we integrated DigitalOcean into CWLB 2.0 to provide a lower-cost alternative to AWS EC2 for running media servers. This is purely optional and configurable from the load-balancer settings in the admin dashboard.

Any organization admin can switch their cloud vendor in the dashboard from AWS to DO or vice versa with a button click, and the media servers will run on the cloud selected by the admin. The default cloud for running the media servers is now DO (DigitalOcean); it can be changed to AWS EC2 at any time in the load-balancer settings.

Some other important updates in CWLB 2.0 are as below.

Loadbalancing recording servers

Like media servers, the servers responsible for handling meeting recordings can get exhausted quickly if there is a lot of demand for recordings. To solve this, we have integrated recording server autoscaling into the load-balancer. The load-balancer can now auto-scale not only media servers but also recording servers in a fully automated manner.

Loadbalancing breakout rooms

Breakout rooms were already available in CWLB 1.0, but they were not very resource-efficient: customers had to spend the same amount of credits on a breakout room as on the main room. With CWLB 2.0, breakout rooms are fully integrated into the load balancer's JIT request handling, so customers need not pay anything extra for using them; billing is completely dynamic, based on the actual usage of the breakout rooms irrespective of the main room size.

Due to current work pressure, we are not able to write an exhaustive list of all the updates in CWLB 2.0, though we would love to when time permits. Until then, if you have any query or suggestion related to CWLB 2.0, please feel free to drop us a mail at hello@centedge.io.

Don’t assume your WebRTC DevOps! It can kill

A real-life incident that happened with one of our customers.

A customer of ours with offices in the US and EU has a nice and innovative video conferencing application with some really cool features for collaborative meetings. They came to us to help them fix some critical bugs and load-balance their video backend. An interesting piece of information we learned was that they were running only one media server, but a really huge one with 72 cores! The reason for running such a large server was that they wanted a lag-free and smooth video experience for all. In the beginning, when they had a small server, they were facing video quality issues, so they took the biggest possible server for consistent video quality, without even verifying that the video quality issue was actually due to the server. After digging deep, we made some interesting discoveries about their architecture and suggested changes to their video infrastructure, which included downgrading to an 8-core media server and adding a horizontal load balancer to distribute the load effectively. After the suggested changes, their video infra bill was down by ~80%.

Here is the comparison.

Before:

A 72-core instance in AWS in the EU Frankfurt region costs $3.492/hour which becomes $2514.24 per month.

After:

An 8-core instance in AWS in the EU Frankfurt region costs $0.348/hour, which becomes $250.56 per month.

A horizontal load balancer instance also costs approximately the same, i.e. $250 /month.

So the total becomes ~$500/month: a saving of ~80% per month on the cloud server bill!
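A quick check of those figures, assuming a 720-hour month and the hourly prices quoted above:

```typescript
// Before/after monthly cost check for the 72-core vs 8-core + load balancer setup.
const HOURS_PER_MONTH = 24 * 30; // 720

const before = 3.492 * HOURS_PER_MONTH;            // 72-core instance ≈ $2514.24
const after = 0.348 * HOURS_PER_MONTH + 250;       // 8-core instance + ~$250 LB ≈ $500.56
const savingsPercent = (1 - after / before) * 100; // ≈ 80%

console.log({ before, after, savingsPercent: savingsPercent.toFixed(1) });
```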

When the CEO of the company got to know the size of the media server bill, he was skeptical about the business viability of the service because of the cloud bill being paid every month. After the change, the prospects of the service look much more promising to him.

Load balancing WebRTC Media Servers, The Need

The rush to create video conferencing apps is here to stay, especially using WebRTC. As WebRTC 1.0 has already been standardized by the W3C and the IETF as of the date this post is being written, it is going to become mainstream in the coming times with the advent of 5G. Having said that, building a video conferencing app is still much more complicated than building a pure web app. Why? Because too many things need to be taken care of to create a production-ready video conferencing app. Those things can broadly be divided into two major parts. One is to code the app and test it on the local network (LAN). Once it is successfully tested locally, it is time to take it to the cloud to make it available to other users through the Internet. This is where DevOps plays a critical role.

Now let’s understand why it is so important.

Let’s assume you have built the service to cater to 50 users in conferencing mode in each room. Now, if you have taken a good VPS like a c5.xlarge on a cloud provider like AWS, let’s assume it can support up to 10 conference rooms. What will happen if there is a need for an 11th room? In this case, you need to run another server that can handle another 10 rooms. But how will you know when the 11th room request will come? If you don’t want to check manually every time a new room creation request comes, then there are two options: either you tell the user requesting the 11th room that server capacity is full and they must wait until a room becomes free, OR you create logic so that a new server is created magically whenever a new room creation request comes in! This is called auto-scaling, and it is the magical effect of doing proper DevOps on your cloud provider. The point to note here is that just as you create new servers as demand grows, you also have to delete servers when demand reduces; otherwise the bill from your cloud vendor will go through the roof!

Here is a brief summary of how a typical load-balancing mechanism works. I am not going to discuss the core logic of when to scale, as that can be completely dependent on the business requirement. If there is a need to up-scale or down-scale (short for creating or deleting servers on demand, programmatically) according to dynamic demand, then there has to be a control mechanism inside the application to let the cloud know that there is more demand for rooms and that more servers need to be created to cater to the surge. The cloud then has to be told the details of the VPS to be created, such as instance type, EBS volume needed, etc., along with the other parameters it needs to create the server. Once the server is created, the cloud has to inform the application server that the VPS is ready for use. The application server then uses the newly created server for the newly created room and thus caters to the new room creation request successfully.

A similar but opposite approach has to be taken when rooms are released after usage. In this case, we let the cloud know that specific servers are no longer needed and should be deleted, as they won't be used until a new room creation request comes. When a new room creation request does come, one can again ask the cloud to create new servers and cater to it. This is how one typically manages DevOps to dynamically create and delete VPSs according to real-time need.
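Put as pseudocode, the lifecycle looks roughly like the sketch below. The `cloud` client is a stand-in for whatever provider SDK you use, and the names and capacity numbers are illustrative:

```typescript
// Sketch of the up-scale/down-scale lifecycle described above.
interface CloudClient {
  createServer(spec: { instanceType: string; volumeGb: number }): Promise<{ id: string; ip: string }>;
  deleteServer(id: string): Promise<void>;
}

const ROOMS_PER_SERVER = 10;
const servers: { id: string; ip: string; rooms: number }[] = [];

async function onRoomRequested(cloud: CloudClient): Promise<string> {
  // Reuse an existing server that still has room capacity.
  let target = servers.find((s) => s.rooms < ROOMS_PER_SERVER);
  if (!target) {
    // Up-scale: ask the cloud for a new media server and wait until it is ready.
    const created = await cloud.createServer({ instanceType: "c5.xlarge", volumeGb: 20 });
    target = { ...created, rooms: 0 };
    servers.push(target);
  }
  target.rooms += 1;
  return target.ip; // the application points the new room at this media server
}

async function onRoomClosed(cloud: CloudClient, serverId: string): Promise<void> {
  const target = servers.find((s) => s.id === serverId);
  if (!target) return;
  target.rooms -= 1;
  // Down-scale: release servers that are no longer hosting any room.
  if (target.rooms === 0) {
    await cloud.deleteServer(target.id);
    servers.splice(servers.indexOf(target), 1);
  }
}
```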

WebRTC auto-scaling/load-balancing, the strategies

Now that we understand what DevOps is in brief, let us also understand the general strategies for doing this DevOps, especially for the video conferencing use case. It can broadly be divided into two scenarios based on the level of automation needed to satisfy one's business requirement. Though there can be many variations of automation, let me describe two strategies, for the sake of simplicity, that can satisfy the majority of business requirements.

Strategy-1: Cloud-agnostic semi-automatic load balancing

In this strategy, the point is to automate the load distribution mechanism effectively to up-scale and down-scale the media servers while keeping the media servers cloud-agnostic. Media server creation and deletion are not in the scope of load balancing here; servers can be created independently and registered with the load balancer in some manner so that there are always enough servers available to cater to a surge in demand.

Pros:

  • Multi-cloud strategy
  • Better command and control
  • Less complex to implement

Cons:

  • Lesser automation

Strategy-2: Uni-cloud fully automatic load balancing

In this strategy, the point is to automate the load distribution mechanism effectively to upscale and downscale, bringing in more automation at the cost of tight coupling to a single cloud provider.

Here, a cloud provider's APIs are integrated to create and destroy servers in a completely on-demand manner, without much manual intervention. In this approach, the load balancer can create servers on a specific cloud using APIs when there is an upscaling need and delete servers whenever the load decreases.

Pros:

  • Greater automation
  • Highly resource-efficient

Cons:

  • More complex to implement
  • Dependent on a single cloud vendor

There is no general rule that one should follow a specific load-balancing approach. It completely depends on the business requirement for which one needs load balancing. One should properly understand one’s business requirements and then decide the kind of load-balancing strategy that will be suitable. If you need help in deciding a good load-balancing strategy for your video infrastructure, feel free to have an instant meeting or a scheduled one with one of our core technical guys using this link.

Note: The load balancer mentioned in the above real-life incident is a WebRTC-specific stateful load balancer developed from scratch by us only for the purpose of auto-scaling WebRTC media servers. It is known as CWLB and more details about it can be found here.

Meet Meetnow, a unified communication platform for Startups, and Enterprises

A website is the basic identity of a product, a service, an individual, an enterprise, or just about anything else on the Internet. It provides information about the purpose for which the website was built, and it also provides mechanisms like contact forms, live chat, etc. as methods of inquiry to learn more about a specific product or service mentioned on the website. These methods of inquiry are passive: the visitor is contacted back after a certain time to start a conversation, and by then the context of the original inquiry is often lost. This is a loss for the website owner, as an important prospect might have bought a product or service if they had been able to have an instant conversation with the website owner.

Meetnow solves exactly this problem by enabling your website visitors to have an instant conversation either with you as the website owner or with your authorized representative. Integrating Meetnow is very easy and won't take more than 30 minutes of your time. As soon as you sign up here, your unique Meetnow link will be available on your dashboard. Copy the link using the copy button available there and place it in a prominent area of your website so that your visitors can find it easily.

Here are some important facts about Meetnow that are beneficial to know.

  • It costs only $0.005 for an hour-long one-to-one Meetnow call.
  • It is a pay-as-you-go service, which means you only pay for what you use, without any monthly/yearly commitments (except the Enterprise plan).
  • We are providing free credits worth $10 for trying out our platform. $10 is enough for 2000 hours of video calling using Meetnow!
  • As an organization dedicated to achieving customer delight, we are always there to listen to your inputs and find ways to enhance your experience.

Why is Meetnow unique?

Is Meetnow all about one-to-one video discussions? No, Meetnow is much more than that. Some of the things that are unique to Meetnow are described below.

Presence:

A unique Meetnow page is designed for each organization where all users of the organization can manage their availability. Here a user can make himself/herself available to take Meetnow calls without sharing any contact details with the visitor/guest. A carefully designed workflow makes the Meetnow call happen between the user and the visitor while saving the visitor's details, like email id, agenda for the discussion, etc. It also includes a mini CRM to view all visitor requests to date with all relevant details.

Rooms:

Rooms are the virtual meeting/discussion rooms that enable a Meetnow call. Rooms come in different sizes: one-to-one rooms are dedicated to focused one-to-one discussions, conferencing rooms host video conference meetings involving a group of users, and event rooms host large-scale virtual events with hundreds of participants.

Analytics:

The core value that Meetnow provides over others is detailed analytics for each discussion, irrespective of the format, i.e. one-to-one discussions, conferencing discussions, and virtual events. This is very helpful in understanding details like who attended the discussion, for how long, and from which kind of device, along with other details. It also shows the total credits used by each discussion to the organization admin, to add transparency.

Pay per use:

“Only pay for what you use” is our core philosophy behind Meetnow. It has been designed keeping in mind that one can start as small as $10 with Meetnow (which we provide as free credit!) and keep growing his/her business by using Meetnow to streamline video meetings, discussions, virtual events, etc. as needed. There are no monthly/yearly commitments at the beginning, as we understand that keeping costs low early on is super important for sustainability. For later stages we also have fixed-monthly-cost enterprise plans, which provide additional features that can help a business optimize costs further in ways that may not be possible in the pay-per-use plan.

As a Website Owner/ Business Owner / Startup / Enterprise, you can use Meetnow to streamline your

  • Inbound / Outbound sales calls with integrated presence management
  • Focused one to one calls
  • Team meetings / discussions
  • Occasional virtual events / roadshows / product demos

Our goal at Meetnow is to bring purpose to every meeting/discussion/event with workflows and analytics. Have we achieved that yet? No. We have just started on this journey by bringing uniqueness to video discussions, segregating discussions into various categories and creating a different workflow around each of them. With time and feedback from our esteemed customers, we aim to reach our goal in due course. If the goal we have set is important to you as well, do become one of our customers and provide us your valuable feedback so we can build the workflows and analytics your business needs.

We would love to provide a personal demo to anyone interested in knowing more about how Meetnow can help them streamline their internal and external communications. As a first step, please sign up here to test the platform. In case of any doubts/concerns/feedback, please feel free to have an instant video discussion with one of us, or, if none of us is available for an instant discussion, schedule a call at your convenience using this link.

The dilemma of build vs. buy from CPAAS in the world of Video Conferencing solutions

In today’s digital age, communication has become the lifeblood of businesses and individuals alike. As organizations strive to connect with their remote teams, engage with customers, and collaborate with partners worldwide, video communication has emerged as a powerful tool. When it comes to building a video communication app, there are two main options: building from scratch or integrating Communication Platform as a Service (CPaaS) video API providers. While the latter may seem like an attractive choice due to its convenience, there are several compelling reasons why building your own video communication app is a better long-term investment.

First and foremost, building a video communication app gives you full control over the user experience. By developing your own app, you have the freedom to customize every aspect of the platform to align with your brand identity and specific requirements. From the user interface to the features and functionalities, you can tailor the app to create a seamless and intuitive experience for your users. This level of control is crucial for building strong brand recognition and fostering user loyalty.

Secondly, building your own video communication app allows you to prioritize data privacy and security. With increasing concerns about data breaches and privacy issues, having control over the infrastructure and data handling processes becomes paramount. By building your own app, you can implement robust security measures, encryption protocols, and data storage practices to safeguard sensitive information. This not only protects your users but also builds trust and credibility in your brand, setting you apart from competitors who rely on third-party providers.

Moreover, building a video communication app provides scalability and flexibility. As your business grows and evolves, you have the freedom to add or modify features, scale up the infrastructure, and adapt the app to changing market demands. This agility is crucial for staying ahead in a rapidly evolving digital landscape. On the other hand, integrating a CPaaS video API provider might limit your ability to customize or scale the app according to your unique requirements, potentially hindering your growth potential.

Building your own video communication app also offers cost-effectiveness in the long run. While integrating CPaaS video API providers may seem like a quick and cost-efficient solution initially, the subscription fees and usage charges can add up significantly over time. By building your own app, you have the opportunity to make a one-time investment in development and infrastructure setup, reducing ongoing expenses in the form of API usage fees. This allows you to have better control over your budget and allocate resources more efficiently.

Last but not least, building a video communication app provides a competitive edge. In a market saturated with generic communication tools, having a unique and tailored app sets you apart from competitors. It allows you to differentiate your brand and offer a distinctive user experience that aligns with your specific value proposition. By investing in building your own app, you position yourself as an innovative and forward-thinking organization, attracting users and potential partners who value a premium communication experience.

Let’s take the example of a virtual events company to understand the numbers behind the build vs. buy decision.

A virtual events company hosts 100 events a month, with an average of 500 participants attending each event. Let's assume each event is 6-8 hours long, of which 4 hours of audio/video is used by the participants, and that the company uses a video CPaaS provider to provide audio/video sharing capabilities to its participants.

The maths for one event would look as below.

4 × 60 = 240 minutes × 200 participants × $0.004/video minute = $192 (considering 200 participants sharing their video and audio)

4 × 60 = 240 minutes × 300 participants × $0.0009/audio minute ≈ $65 (considering 300 participants sharing their audio only)

In total, the CPaaS provider charges approximately $257 for one event.

If 100 such events happen, the total CPaaS bill would be $25,700 for the month! If the virtual events company has been using the CPaaS service for the last 2 years while hosting a similar number of events each month, the total CPaaS bill would have been $616,800.

Now let's do some maths to find out what the cost would have been if the virtual events provider had decided to build it from day one.

Building a really scalable video back-end that can replace the CPaaS offering should take 8-12 months at a cost of approximately $150,000. The front-end and all other costs stay the same.

The next important cost is the server and the data transfer cost.

Let's calculate the data transfer cost for one event with 500 people and 4 hours of audio/video usage.

One participant consuming video and audio at 540 KB/s for 4 hours:

540 × 60 × 60 × 4 = 7,776,000 KB ≈ 7.41 GB of data over 4 hours

If 200 participants are using video, their total data consumption is 200 × 7.41 ≈ 1,483 GB.

One participant consuming audio only at 40 KB/s for 4 hours:

40 × 60 × 60 × 4 = 576,000 KB ≈ 0.55 GB of data over 4 hours

If 300 participants are using audio only, their total data consumption is 300 × 0.55 ≈ 164 GB.

Therefore, the total data transferred comes to about 1,648 GB.

With AWS, the data transfer cost becomes 1,648 GB × $0.08/GB ≈ $132.

With DigitalOcean, the data transfer cost becomes 1,648 GB × $0.01/GB ≈ $16.48.

Considering that the virtual events provider runs all its services on DigitalOcean because it is cheaper, the total data transfer cost per month for all 100 events would be about $1,648. The server cost can be considered as included in this figure, since we haven't counted the free data transfer DigitalOcean bundles with its servers.

A CPaaS cost of $25,700 vs. a self-managed infra cost of $1,648: the difference is $24,052 per month.

According to this calculation, the virtual events company can recover the development cost of the self-managed video back-end in roughly six to seven months and then keep saving at least $15,000 a month, assuming the people needed to manage and enhance the video back-end cost about $9,052 per month.
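Here is the same build-vs-buy arithmetic as a script, using the per-minute CPaaS prices, per-GB egress prices, and build cost assumed in this example:

```typescript
// Reproduces the build-vs-buy comparison above (all rates are this example's assumptions).
const MINUTES = 4 * 60;

// CPaaS cost per event and per month (100 events).
const cpaasPerEvent = MINUTES * 200 * 0.004 + MINUTES * 300 * 0.0009; // ≈ $257
const cpaasPerMonth = cpaasPerEvent * 100;                            // ≈ $25,700

// Self-hosted data transfer per event (KB/s rates, 1024-based conversion).
const videoGb = (540 * 3600 * 4) / 1024 ** 2; // ≈ 7.41 GB per video participant
const audioGb = (40 * 3600 * 4) / 1024 ** 2;  // ≈ 0.55 GB per audio-only participant
const gbPerEvent = 200 * videoGb + 300 * audioGb;   // ≈ 1,648 GB
const selfHostedPerMonth = gbPerEvent * 100 * 0.01; // DigitalOcean egress ≈ $1,648

// Payback on a ~$150,000 build, against the gross monthly difference.
const monthlyDifference = cpaasPerMonth - selfHostedPerMonth; // ≈ $24,000
const paybackMonths = 150_000 / monthlyDifference;            // ≈ 6.2 months

console.log({ cpaasPerMonth, selfHostedPerMonth, paybackMonths: paybackMonths.toFixed(1) });
```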

A similar analysis can be done for any segment based on the above example. While integrating CPaaS video API providers might offer convenience in the short term, building your own video communication app presents numerous advantages in terms of user experience, data security, scalability, cost-effectiveness, and competitive differentiation. It empowers you to create a platform that truly reflects your brand and caters to the unique needs of your users. In a world where effective communication is paramount, investing in building your own video communication app is a strategic decision that sets you on a path of long-term success and growth. We are here to help you build a cutting-edge video conferencing back-end/application for your unique use case. Feel free to drop us an email at hello@centedge.io or use this link to have an instant video meeting with us.

The mediasoup opensource projects, Choosing the right one for your next requirement

As a video conferencing application development company, we often get requests to help our clients choose the right open-source video conferencing stack as a base for developing custom video conferencing applications according to the client's business use case. This post discusses the three most popular video conferencing projects based on the mediasoup media server and how/when to use them. We have tried to include as many details about each project as possible, along with its suitability for certain types of business use cases.

The mediasoup open-source projects

Mediasoup-demo

As the name suggests, this project was developed by the original author of mediasoup to demonstrate the capabilities of mediasoup to the world. It can be considered a complete implementation with code examples for things like producers, consumers, forceTcp, forceVP9/H264, and Simulcast/SVC to showcase what mediasoup is capable of. It uses a protoo server/client as the signaling mechanism over WebSockets, designed by the author of mediasoup himself.

Link: https://github.com/versatica/mediasoup-demo

Author: IBC (original author of mediasoup)

Tech stack: Mediasoup, Nodejs, Reactjs, and WebSocket

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 905 stars
  • 555 forks
  • 54 watchers
  • MIT License
  • No Horizontal scaling
  • No Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

It can be considered a foundation for building a robust production-grade video application as it has all the components and code needed for building such an application. The developer/development team can make use of the existing code whenever needed to achieve the business use case. Though it can be used for any kind of business use case, it is advisable to spend some time understanding the code and the design process to make the best use of it.

Our take on this project

Choose this only if you have at least 1-2 months of extra time to play around with this application and build the necessary expertise before venturing out to build a production-grade application for yourself or your company. It is also advisable to understand the protoo signaling framework used by this application if you wish to keep it as your signaling mechanism. Alternatively, you can use socket.io if your use case doesn't need more than ~200 users in one room.

Edumeet

This project was developed as a fork of the original mediasoup-demo project. It is more mature and production-ready than the demo project. It has primarily been designed for educational purposes like running virtual classrooms and online teaching/learning, though it can be retrofitted to other use cases with minimal effort. The package is highly configurable through the config files provided by the authors, without the need for many code changes to use its functionality. The backend of this application is ready to use without any code changes; the front end obviously needs modification to suit your business use case.

Link: https://github.com/edumeet/edumeet

Author: Multiple authors

Tech stack: Mediasoup, Nodejs, Reactjs, and Socket-io

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 1.1k stars
  • 397 forks
  • 52 watchers
  • MIT License
  • No Horizontal scaling
  • Yes Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

It can be considered as a project with a ready backend for production usage along with a sample frontend which needs to be modified according to the business use case. The good part is that no coding experience is needed for the backend part as it can be fully configured by using the config files already provided in the project.

Our take on this project

Choose this if you have less time and your developers are not very experienced with mediasoup/WebRTC. The front end of this application can be modified by an average web developer with a little curiosity and a keen eye. It is a good open-source package for anybody who wants to build a production-ready video application without much expertise in building WebRTC backends.

Nettu-meet

This project can be considered a ready-to-use, self-hosted application for a virtual classroom / online teaching and learning use case. The title of the project itself is “Opensource video conferencing application for tutors”. It has a nice ready-to-use frontend as well as a ready-to-use backend. The UI looks polished and has all the necessary ingredients for an online education use case.

Link: https://github.com/fmeringdal/nettu-meet

Author: Fredrik Meringdal

Tech stack: Mediasoup, Nodejs, Reactjs, and Socket-io

Below are some interesting facts about this project. The below statistics are taken from GitHub as of the date of publication of this blog.

  • Opensource
  • 1.9k stars
  • 196 forks
  • 43 watchers
  • AGPL3 License
  • No Horizontal scaling
  • No Vertical scaling
  • One room in one media server only
  • Community support

The usefulness of the project

If your use case is online education, feel free to use this project as is. It has all the necessary things already built into the front end, like a whiteboard, file sharing, chat, etc. It may not be very suitable for other use cases, as the front end would need to be redesigned and redeveloped for anything other than online education. Telehealth could still reuse this front end, but with some modifications.

Our take on this project

Choose this if your use case is a self-hosted online education solution and you don’t need anything extra that has not been provided already. You will be ready with your own online education solution within a week’s time using this open-source package.

Samvyo (Commercial)

This project has been developed from scratch using mediasoup as the media server. It uses a dynamic hybrid load-balancing approach to provide versatility while keeping server usage and data transfer costs as low as possible. The load balancer is versatile enough to create media servers on its own when demand spikes and shut them down when demand lessens. It comes with a nice pre-built UI with all the latest features, including stage mode for virtual events, breakout rooms for focused discussions, virtual backgrounds, public/private chat, moderator controls, etc. A load-balanced server-side recording option is also available to record meetings effortlessly.

Link: https://www.samvyo.com

Author: Centedge Technologies

Tech stack: Mediasoup, Redis, Nodejs, Reactjs, and WebSocket

Below are some interesting facts about this project.

  • Commercial
  • Yes Hybrid scaling
  • Yes Horizontal scaling
  • Yes Vertical scaling
  • One room can be in multiple media servers
  • On-demand Paid Support

The usefulness of the project

This is useful for all kinds of use cases where load balancing is necessary to cater to a large concurrent user base. The service can be used by 10k/100k concurrent users without much issue, as the load balancer does all the heavy lifting of server creation/destruction and efficient resource allocation. The UI is ready for the majority of video conferencing / interactive live streaming use cases.

Our take on this project

This is our own in-house product, developed from scratch with 8+ years of working experience with WebRTC and its ecosystem. If you think your use case needs scaling and load balancing to cater to a large user base, this may be a good fit for you. It is also a good fit if you need consistent support for further enhancements from the team that originally developed it.

Feel free to set up a free 30-minute discussion with us using this link to discuss your business use case and find a suitable open-source package, either from this list or outside of it. We can help you do the requirement analysis, find a suitable open-source repository that is close to your requirement, and create a list of action points that can help you build a production-ready video application, all within a budget of < $1000. Drop us a mail with your requirements at hello@centedge.io to begin.

WebRTC Media servers, Why, When, and How to choose one for your next application

A media server in a WebRTC infrastructure plays a critical role in scaling a WebRTC call beyond 4 participants. Whenever you join a call that has 8-10 participants or more, know that a media server is doing the hard work behind the scenes to provide you with a smooth audio/video experience. If you need to build a WebRTC infrastructure and select a WebRTC media server for your use case, this post will give you enough information to make an informed decision.

Why and when is a WebRTC Media Server required?

A WebRTC media server is a critical piece of software that helps a WebRTC application distribute audio/video streams to all the participants of an audio/video meeting. Without one, creating a large audio/video call beyond 4 users would be a highly difficult task due to the nature of WebRTC calls. WebRTC calls are designed for real-time use cases (<1 second of delay between the sender and receiver of an audio/video stream). Without a media server, a user sending audio/video streams has to send those streams to every other participant in the conference so that a real conversation can happen in real time. Imagine a call with 10 people, where everybody sends his/her audio/video stream to the other 9 people so that they can view it in real time. Let's do some maths to find out some interesting details.

When a user joins an audio-video call that is running on WebRTC, he/she can share either audio/video/screen or all of them together.

If joined only with audio: ~40 Kbps of upload bandwidth is consumed

If joined with only video: ~500 Kbps of upload bandwidth is consumed

If joined with only screen share: ~800 Kbps of upload bandwidth is consumed

If all 3 are shared together: ~1340 Kbps, or ~1.3 Mbps, of upload bandwidth is consumed

If there are 10 people in the meeting, then 1.3 × 9 = 11.7 Mbps of upload bandwidth will be consumed! Remember that you need to send your audio/video/screen-share (or all of them together) to everybody except yourself. Anybody who doesn't have a consistent 11.7 Mbps of upload bandwidth can't join this meeting!

This also brings another challenge for the device the user joins the conference with. The CPU of the device has to work very hard to compress and encode the audio/video/screen-share streams to send them over the network as data packets. If the CPU has to spend 5% of its capacity to compress, encode, and send the user's streams to one other participant in the meeting, then it has to spend 9 × 5 = 45% of its capacity to do the same for the other 9 participants.
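The two penalties grow together, as a small sketch shows (the per-stream bitrate and the 5% CPU figure are the assumptions used above):

```typescript
// Back-of-the-envelope cost of a mesh call (no media server) for N participants.
const PER_STREAM_MBPS = 1.3;    // ≈ 40 + 500 + 800 Kbps, rounded as in the text
const CPU_PERCENT_PER_PEER = 5; // assumed encode/send cost per remote peer

function meshCost(participants: number) {
  const peers = participants - 1; // you send to everybody except yourself
  return {
    uploadMbps: PER_STREAM_MBPS * peers,      // 11.7 Mbps for 10 participants
    cpuPercent: CPU_PERCENT_PER_PEER * peers, // 45% for 10 participants
  };
}

console.log(meshCost(10));
```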

Is the CPU not wasting its effort by trying to do the exact same thing 9 times in this case? Can we not compress, encode, and send the user's audio/video/screen-share streams just once to the cloud, and let the cloud do some magic to replicate that user's streams and send them to everybody else present in the same meeting room?

Yes, we can do this magic, and the name of this magic is the media server!

Different kinds of WebRTC Media Servers, MCU vs. SFU

Primarily there are two kinds of media servers: one is an SFU (Selective Forwarding Unit) and the other is an MCU (Multipoint Control Unit).

According to the last example, now we know that we need a media server that can replicate and distribute the streams of a user to as many people as needed without wasting the user’s network and CPU capacity. Let’s take this example forward.

Consider a situation where the meeting needs to support various UI layouts with a good amount of configuration regarding who can view and listen to whom. It turns out this is going to be a virtual event with various UI layouts like stage, backstage, front-row seats, etc. Here the job of the media server is to replicate and distribute the streams to everybody except the sender himself/herself. So in this 10-user virtual event, every user sends his/her streams to the media server only once and receives the streams of everybody else as individual streams. This way, the event organizer can create multiple UI layouts for different users according to where they currently are, i.e. backstage / stage / front row. In this situation, the SFU helps by forwarding all the streams as individual audio/video streams without dictating how they should be displayed to an individual user. In an SFU, though a user sends only his/her own audio/video/screen-share streams, he/she receives the streams of everybody else individually, which consumes download bandwidth proportional to the number of participants: the more participants, the more download bandwidth is consumed!

Now let’s take a different situation: a team meeting of 10 users of an organization who don't need much dynamism in the UI and are happy with the usual grid layout of videos. In this situation, we can merge the audio and video streams of all participants (except the receiver himself/herself) on the server and create one combined audio/video stream, which is then sent to each participant. Here, every user sends his/her own audio/video stream and receives one combined stream from the server in a fixed layout, and the UI just shows that one video element. This is the MCU doing the job neatly. In this situation, the download bandwidth consumption stays constant irrespective of the number of users joining the meeting, as every user receives only one audio/video stream from the server. The two major downsides of this approach are that the number of servers needed to composite a combined video of all users is much higher than with the replicate-and-forward approach of an SFU, and that the UI layout is rigid, already decided by the server, with the client UI having no control over it.
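The bandwidth difference between the two can be summarized in a few lines; the composite-stream bitrate below is an assumed value for illustration:

```typescript
// Rough per-participant download bandwidth for an SFU vs an MCU.
// 540 Kbps = one remote participant's audio + video (figures used in this post);
// the MCU composite bitrate is an assumed value.
const PER_STREAM_KBPS = 540;
const MCU_COMPOSITE_KBPS = 1500;

function downloadKbps(participants: number) {
  return {
    sfu: PER_STREAM_KBPS * (participants - 1), // one stream per remote participant
    mcu: MCU_COMPOSITE_KBPS,                   // always a single mixed stream
  };
}

console.log(downloadKbps(10)); // { sfu: 4860, mcu: 1500 }
```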

Two of the largest global video conferencing services each use one of the approaches described above.

Google Meet: SFU

MS Teams: MCU

SFUs are slowly gaining more popularity due to the flexibility they provide in creating UI layouts, which is highly important for an engaging user experience, and because they require far fewer servers to cater to a large number of users compared to an MCU. We are going to discuss the most popular SFUs available today and how to choose one for your next WebRTC media server requirement.

How to Choose a WebRTC Media Server for your next requirement?

In this section, we are going to discuss the top open-source media servers currently available and how they perform against each other. I am going to discuss the media servers which use WebRTC/openRTC as their core implementation. I won't be covering the media servers built on Pion, the Go implementation of WebRTC, as that needs a different post.

We would be discussing some of the key things about the below media servers.

  1. Jitsi Video Bridge(JVB), Jitsi (SFU)
  2. Kurento (SFU + MCU)
  3. Janus (SFU)
  4. Medooze (SFU + MCU)
  5. Mediasoup(SFU)

We would primarily be discussing the performance of each media server along with its suitability for building a WebRTC infrastructure.

Jitsi Video Bridge(JVB), Jitsi

Jitsi is a very popular open-source video conferencing solution available out there today. It is so popular because it provides a complete package for building a video conferencing solution including a web & mobile UI, the media server component which is JVB along with some required add-ons like recording and horizontal scalability out of the box. It has very good documentation as well which makes it easy to configure it on a cloud like AWS.

Kurento

Kurento used to be the de facto standard for building WebRTC apps thanks to the promises it made to WebRTC developers with its versatility (SFU + MCU) and OpenCV integration for real-time video processing, way back in 2014. But after the acquisition of Kurento and its team by Twilio in 2017, development has stopped and it is now in maintenance mode. One indication that it is no longer a top choice is that the team currently maintaining Kurento has a freemium offering named OpenVidu, which uses mediasoup as its core media server!

Janus

Janus is one of the most performant SFUs available, with very good documentation. It has a clean architecture in which the Janus core does the routing and various modules do various jobs, including recording, bridging to SIP/PSTN, etc. It is updated regularly by its backers to keep it up to date with the latest WebRTC changes. This can be a choice for building a large-scale enterprise RTC application if you can invest a good amount of time and resources in building the solution, because it has its own way of architecting the application and can't be embedded as a module into a larger application the way mediasoup can.

Medooze

Medooze is better known for its MCU capabilities than its SFU capabilities, though its SFU is also capable. Though it is a performant media server, it lacks documentation, which is key for open-source adoption. It was acquired by Cosmo Software in 2020, and Cosmo Software has since been acquired by Dolby. This can be your choice if you are a WebRTC pro and know most of the stuff yourself. From GitHub commits it seems it is still in active development, but it still needs a good effort on the documentation side.

Mediasoup

Mediasoup is a highly performant SFU media server with detailed documentation, backed by a team of dedicated authors and a vibrant open-source community. The best part is that it can be integrated into a larger Node.js / Rust application as a module, doing its job as part of the larger application. It has a very low-level API, which lets developers use it however they need inside their application. Though it takes a good amount of understanding to build a production-ready application beyond the demo provided by the original authors, it is not that difficult to work with if one is passionate and dedicated to learning the details.
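To give a flavour of that low-level, embeddable API, here is a minimal sketch of starting mediasoup inside a Node.js process: one worker per CPU core (the usual vertical-scaling pattern) and a router with a basic Opus/VP8 codec set. The codec list is an illustrative choice, and signaling, transports, producers, and consumers are left out.

```typescript
import * as os from "os";
import * as mediasoup from "mediasoup";

// Illustrative codec set; use whatever your clients actually negotiate.
const mediaCodecs: mediasoup.types.RtpCodecCapability[] = [
  { kind: "audio", mimeType: "audio/opus", clockRate: 48000, channels: 2 },
  { kind: "video", mimeType: "video/VP8", clockRate: 90000 },
];

async function startMediasoup() {
  // One worker per CPU core is the usual way to scale mediasoup vertically.
  const workers: mediasoup.types.Worker[] = [];
  for (let i = 0; i < os.cpus().length; i++) {
    workers.push(await mediasoup.createWorker());
  }
  // A router lives inside one worker; a room maps to one or more routers.
  const router = await workers[0].createRouter({ mediaCodecs });
  return { workers, router };
}

startMediasoup().then(() => console.log("mediasoup workers and router ready"));
```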

Below is a set of exhaustive performance benchmarking tests done by the Cosmo Software team back in 2020, at the height of COVID, when WebRTC usage was going through the roof to keep the world running remotely. The important points from the test report are summarized below; the whole report is linked at the bottom of this post for anyone interested in knowing more.

Testing a WebRTC application is done with virtual users, which are actually cloud VMs joining a meeting room as test users performing certain tasks. In this case, the test users, aka cloud VMs, joined using the configuration mentioned below, and each of the above servers was hosted as a single-instance server on a VM as described below.

[Figure: VM configuration for SFU load testing]

Next are the load parameters used to test each of these media servers. The numbers are not the same for all media servers, as the peak load capacity (after which a media server fails!) differs between them. These peak load numbers were derived after a good number of dry runs.

[Figure: Load settings for SFU load testing]

The test results of the load test are shown below.

[Figure: Result of SFU load testing]
  • Page loaded: true if the page can load on the client side which is running on a cloud VM.
  • Sender video check: true if the video of the sender is displayed and is not a still or blank image.
  • All video check: true if all the videos received by the six clients from the SFU passed the video check, which means every virtual client's video can be viewed by all other virtual clients.

There are other important aspects of these media servers, like RTT (round-trip time), bitrate, and overall video quality.

[Figure: SFU RTT comparison]

The RTT is an important parameter that tells how fast media stream data, aka RTP packets, are delivered under real-world network conditions. The lower the RTT, the better.

[Figure: SFU bitrate comparison]

The bitrate is directly responsible for video quality. It simply indicates how much media data is being transmitted in real time. The higher the bitrate, the better the image quality, but also the higher the load on the network to transmit it and on the client-side CPU to decode it. Therefore, it is always a balancing act to send as high a bitrate as possible without congesting the network or overburdening the CPU. Here a good media server can play a big role with techniques like simulcast/SVC, personalising the bitrate for each individual receiver based on their network and CPU capacity.
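In mediasoup terms, for example, the server can switch a simulcast/SVC consumer between layers per receiver; a minimal sketch, assuming the consumer was created from a simulcast producer:

```typescript
import * as mediasoup from "mediasoup";

// Lower spatial/temporal layers mean a lower bitrate for constrained receivers.
async function adaptToReceiver(
  consumer: mediasoup.types.Consumer,
  congested: boolean
): Promise<void> {
  if (congested) {
    await consumer.setPreferredLayers({ spatialLayer: 0, temporalLayer: 0 });
  } else {
    await consumer.setPreferredLayers({ spatialLayer: 2, temporalLayer: 2 });
  }
}
```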

[Figure: SFU video quality comparison]

As the name suggests, this is the video quality transmitted by the media server under various load patterns. The higher the quality, the better.

I hope I was able to provide a brief description of each media server with enough data points so that you can make a good decision in choosing the media server for your next video project. Feel free to drop me an email at sp@centedge.io if you need any help with your selection process or with the video infrastructure development process. We have a ready-to-use cloud video infrastructure built with the mediasoup media server which can take care of your scalable video infra needs and let you focus on your application and business logic. You can have an instant or scheduled video call with me using this link to discuss anything related to WebRTC, media servers, video conferencing, live streaming, etc.

PS: Here is the link to the full test report if anybody is interested in reading the whole of it; it has a detailed description of this load test along with many interesting findings.