WebRTC Media servers, Why, When, and How to choose one for your next application

A media server plays a critical role in scaling a WebRTC call beyond 4 participants. Whenever you join a call with 8-10 participants or more, a media server is doing the hard work behind the scenes to give you a smooth audio/video experience. If you are building a WebRTC infrastructure and need to select a media server for your use case, this post should give you enough information to make an informed decision.

Why and When Is a WebRTC Media Server Required?

A WebRTC media server is a piece of software that helps a WebRTC application distribute audio/video streams to all the participants of a meeting. Without one, running a call with more than 4 users is very difficult because of how WebRTC calls work. WebRTC is designed for real-time use cases (less than 1 second of delay between the sender and the receiver of a stream). For a real conversation to happen, each user has to send his/her streams to every other participant so that they can view them in real time. Imagine a call with 10 people, where everybody sends his/her audio/video stream to the other 9. Let’s do some maths to find out some interesting details.

When a user joins an audio/video call that is running on WebRTC, he/she can share audio, video, screen, or all of them together.

If joined with only audio: ~40 Kbps of upload bandwidth is consumed

If joined with only video: ~500 Kbps of upload bandwidth is consumed

If joined with only screen share: ~800 Kbps of upload bandwidth is consumed

If all 3 are shared together: ~1340 Kbps, or about 1.3 Mbps, of upload bandwidth is consumed

If there are 10 people in the meeting, then 1.3 * 9 = 11.7 Mbps of upload bandwidth is needed, because you have to send your audio, video, and screen share to everybody except yourself. Anybody who doesn’t have a consistent 11.7 Mbps of upload bandwidth can’t join this meeting!
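
The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the bitrates are the approximate figures quoted in this post, and the function name `mesh_upload_kbps` is made up for the example. Note that the text rounds 1340 Kbps down to 1.3 Mbps, so the unrounded result for 10 participants is about 12.06 Mbps rather than 11.7 Mbps.

```python
# Approximate per-stream upload bitrates quoted in the text, in Kbps.
STREAM_KBPS = {"audio": 40, "video": 500, "screen": 800}


def mesh_upload_kbps(participants, streams=("audio", "video", "screen")):
    """Upload bitrate one user needs when sending directly to every other peer."""
    if participants < 1:
        raise ValueError("participants must be at least 1")
    per_peer = sum(STREAM_KBPS[name] for name in streams)
    # One copy of every shared stream goes to each of the other participants.
    return per_peer * (participants - 1)


for n in (2, 4, 10):
    print(f"{n} participants: {mesh_upload_kbps(n) / 1000:.2f} Mbps of upload")
```

The requirement grows linearly with the number of participants, which is why calls without a media server become impractical beyond a handful of users.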

This also brings another challenge for the device the user joins the conference with. The CPU of the device has to work very hard to compress and encode the audio/video/screen-share streams so that they can be sent over the network as data packets. If the CPU spends 5% of its capacity to compress and encode the user’s streams for one other participant, then it has to spend 9 * 5 = 45% of its capacity to compress, encode, and send the same streams to the other 9 participants.

Is the CPU not wasting its efforts by trying to do the exact same thing 9 times in this case?

Can we not compress, encode, and send the user’s audio/video/screen-share streams just once to the cloud, and let the cloud do some magic to replicate those streams and send them to everybody else in the same meeting room?

Yes, we can do this magic, and the name of this magic is a media server!
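
To make the saving concrete, here is a rough sketch that compares what one sender pays with and without a media server. The 1340 Kbps and 5% figures are the illustrative numbers from this post, not measurements, and `SenderCost` and `sender_cost` are hypothetical names used only for this example.

```python
from dataclasses import dataclass


@dataclass
class SenderCost:
    upload_kbps: int    # total upload bitrate for one sender
    cpu_percent: float  # share of CPU spent encoding and sending


def sender_cost(participants, via_media_server, stream_kbps=1340, cpu_per_copy=5.0):
    """Cost for one sender, using the post's illustrative figures as defaults."""
    # With a media server the sender uploads a single copy;
    # without one, it encodes and uploads one copy per other participant.
    copies = 1 if via_media_server else max(participants - 1, 0)
    return SenderCost(upload_kbps=stream_kbps * copies,
                      cpu_percent=cpu_per_copy * copies)


print(sender_cost(10, via_media_server=False))
print(sender_cost(10, via_media_server=True))
```

With a media server in the path, the sender’s cost no longer depends on how many people are in the room.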

Different kinds of WebRTC Media Servers, MCU vs. SFU

Primarily, there are 2 kinds of media servers. One is an SFU (Selective Forwarding Unit) and the other is an MCU (Multipoint Control Unit).

From the previous example, we know that we need a media server that can replicate and distribute the streams of a user to as many people as needed without wasting the user’s network and CPU capacity. Let’s take this example forward.

Consider a meeting that needs to support various UI layouts with plenty of configuration options regarding who can see and hear whom. Say it is a virtual event with layouts like stage, backstage, front-row seats, etc. Here the job of the media server is to replicate each user’s streams and distribute them to everybody else. In a 10-user virtual event, every user sends his/her streams to the media server only once and receives the streams of everybody else as individual streams. This way, the event organizer can create multiple UI layouts for different users according to where they currently are, i.e. backstage, stage, or front row. This is what an SFU does: it forwards all the streams as individual audio/video streams without dictating how they should be displayed to a user. The trade-off is that, although the user uploads his/her audio/video/screen-share streams only once, he/she receives the streams of everybody else individually, so download bandwidth grows with the number of participants. The more participants there are, the more download bandwidth is consumed!

Now let’s take a different situation: a team meeting of 10 users in an organization who don’t need much dynamism in the UI and are happy with the usual grid layout of videos. In this situation, the server can merge the audio and video streams of all the other participants into one combined audio/video stream for each user. All the users send their own audio/video stream and receive everybody else’s as a single combined stream, in a fixed layout created by the server. The UI just shows that one combined video. This is what an MCU does. Download bandwidth stays consistent irrespective of the number of users joining the meeting, as every user receives only one audio/video stream from the server. This approach has 2 major downsides: composing a combined video for all users takes far more server capacity than the replicate-and-forward approach of an SFU, and the UI layout is rigid, decided by the server without the UI having any control over it.
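
The difference in download bandwidth between the two approaches can be sketched as follows. This is a simplification under stated assumptions: every participant sends one 540 Kbps stream (audio + video from the earlier figures), and the MCU’s combined stream is assumed to have the same bitrate as a single participant’s stream, which real deployments may not match. The function name `download_kbps` is made up for the example.

```python
def download_kbps(participants, topology, stream_kbps=540):
    """Per-user download bitrate for an SFU or an MCU (illustrative only)."""
    others = max(participants - 1, 0)
    if topology == "sfu":
        # One individual stream per remote participant.
        return stream_kbps * others
    if topology == "mcu":
        # One combined stream, however many others are present (assumed bitrate).
        return stream_kbps if others else 0
    raise ValueError(f"unknown topology: {topology!r}")


for n in (4, 10, 50):
    print(n, "users:", download_kbps(n, "sfu"), "Kbps via SFU,",
          download_kbps(n, "mcu"), "Kbps via MCU")
```

In practice an SFU can reduce the per-receiver download by forwarding lower-quality layers, so the SFU figure here is closer to a worst case than a prediction.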

Two of the largest global video conferencing services each use one of the approaches described above.

Gmeet : SFU

MS Teams: MCU

SFUs are steadily gaining popularity because of the flexibility they provide in creating UI layouts, which is highly important for an engaging user experience, and because they need far fewer servers than an MCU to cater to a large number of users. We are going to discuss the most popular SFUs available today and how to choose one for your next WebRTC media server requirement.

How to Choose a WebRTC Media Server for your next requirement?

In this section, we are going to discuss the top open-source media servers currently available and how they perform against each other. Here, I am going to discuss the media servers that use WebRTC/openRTC as their core implementation. I won’t be covering the media servers built on Pion, the Go implementation of WebRTC, as that needs a separate post.

We will discuss some of the key things about the media servers below.

  1. Jitsi Videobridge (JVB), Jitsi (SFU)
  2. Kurento (SFU + MCU)
  3. Janus (SFU)
  4. Medooze (SFU + MCU)
  5. Mediasoup (SFU)

We will primarily discuss the performance of each media server along with its suitability for building a WebRTC infrastructure.

Jitsi Videobridge (JVB), Jitsi

Jitsi is a very popular open-source video conferencing solution. It is popular because it provides a complete package for building a video conferencing product: a web and mobile UI, the media server component (JVB), and required add-ons like recording and horizontal scalability out of the box. It also has very good documentation, which makes it easy to configure on a cloud like AWS.

Kurento

Kurento used to be the de facto standard for building WebRTC apps, thanks to the promises it made to WebRTC developers way back in 2014: versatility (SFU + MCU) and OpenCV integration for real-time video processing. But after Twilio acquired Kurento and its team in 2016, active development stopped and the project is now in maintenance mode. A telling sign is that the team currently maintaining Kurento has a freemium offering named OpenVidu, which uses mediasoup as its core media server!

Janus

Janus is one of the most performant SFUs available, with very good documentation. It has a clean architecture in which the Janus core does the routing and various modules handle jobs such as recording, bridging to SIP/PSTN, etc. It is updated regularly by its backer to keep up with the latest WebRTC changes. It can be a good choice for a large-scale enterprise RTC application where a good amount of time and resources can be invested in building the solution. The reason is that Janus has its own way of architecting the application and, unlike mediasoup, can’t be integrated as a module into a larger application.

Medooze

Medooze is better known for its MCU capabilities than for its SFU, though its SFU is also capable. It is a performant media server, but it is weak on documentation, which is key for open-source adoption. It was acquired by Cosmo Software in 2020, and Cosmo Software was later acquired by Dolby. It can be your choice if you are a WebRTC pro and can figure out most things yourself. Judging from its GitHub commits, it is still in active development, but the documentation still needs a good deal of work.

Mediasoup

Mediasoup is a highly performant SFU media server with detailed documentation, backed by a team of dedicated authors and a vibrant open-source community. The best part is that it can be integrated as a module into a larger Node.js or Rust application. It has a very low-level API, which lets developers use it in whatever way they need inside their application. Building a production-ready application that goes beyond the demo provided by the original authors requires a good understanding of its internals, but it is not that difficult to work with if one is dedicated to learning the details.

Below are the key points from an exhaustive set of performance benchmarking tests done by the Cosmo Software team back in 2020, at the height of COVID, when WebRTC usage was going through the roof to keep the world running remotely. The full test report is linked at the bottom of this post for those interested in knowing more.

Testing a WebRTC application needs to be done with virtual users, which are actually cloud VMs that join a meeting room as test users and perform certain tasks. In this test, the virtual users joined with the configuration mentioned below, and each of the media servers above was hosted as a single instance on a VM as described below.

VM Configuration for SFU load testing

Next are the load parameters used to test each of these media servers. The numbers are not the same for all of them, because the peak load capacity (the load beyond which a media server fails) differs from one media server to another. The peak load numbers for each media server were derived after a good number of dry runs.

Load settings SFU load testing

The results of the load test:

Result of SFU load testing
  • Page loaded: true if the page can load on the client side, which is running on a cloud VM.
  • Sender video check: true if the video of the sender is displayed and is not a still or blank image.
  • All video check: true if all the videos received by the six clients from the SFU passed the video check, which means every virtual client’s video can be viewed by all other virtual clients.

There are other important aspects of these media servers, like RTT (Round Trip Time), bitrate, and overall video quality.

SFU RTT comparison

The RTT is an important parameter that tells how quickly media stream data (RTP packets) is delivered under real network conditions. The lower the RTT, the better.

SFU bitrate comparison

The bitrate is directly responsible for video quality. It is the amount of media stream data transmitted per second. The higher the bitrate, the better the image quality, but also the higher the load on the network to transmit it and on the client-side CPU to decode it. Therefore, it is always a balancing act: sending as high a bitrate as possible without congesting the network or overburdening the CPU. A good media server can help here with techniques like Simulcast / SVC, which personalise the bitrate for each individual receiver based on their network and CPU capacity.
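
As a rough illustration of the per-receiver personalisation that Simulcast makes possible, the sketch below picks the highest layer that fits within a receiver’s estimated bandwidth. The layer names, their bitrates, and the 80% headroom factor are all assumptions made up for this example. Real media servers use their own bandwidth estimation and layer-switching logic.

```python
# Hypothetical simulcast layers as (name, bitrate in Kbps), lowest first.
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]


def pick_layer(available_kbps, headroom=0.8):
    """Return the highest layer that fits in a fraction of the receiver's bandwidth.

    Returns None when even the lowest layer does not fit.
    """
    budget = available_kbps * headroom  # leave some room for audio and bursts
    chosen = None
    for name, kbps in LAYERS:
        if kbps <= budget:
            chosen = name
    return chosen


for estimate in (100, 1000, 2000):
    print(estimate, "Kbps available ->", pick_layer(estimate))
```

The point of the sketch is that the sender encodes a few layers once, and the media server decides per receiver which one to forward.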

SFU video quality comparison

As the name suggests, this is the quality of the video transmitted by the media server under various load patterns. The higher the quality, the better.

I hope I was able to provide a brief description of each media server with enough data points for you to make a good decision when choosing the media server for your next video project. Feel free to drop me an email at sp@centedge.io if you need any help with your selection process or with video infrastructure development. We have a ready-to-use cloud video infrastructure built with the mediasoup media server, which can take care of your scalable video infra needs and let you focus on your application and business logic. You can have an instant or scheduled video call with me using this link to discuss anything related to WebRTC, media servers, video conferencing, live streaming, etc.

PS: Here is the link to the full test report, for anybody interested in reading the whole of it. It has a detailed description of this load test along with many interesting findings.

Enterprise Mediasoup, Empowering Enterprises with the Ultimate Media Stack

In today’s fast-paced world, effective communication is the lifeblood of any successful enterprise. As businesses continue to expand globally, the demand for a reliable, scalable, and secure communication system becomes paramount. This is where CWLB, our cutting-edge Media Stack, steps in as the ultimate solution for enterprise usage. In this article, we will explore the myriad benefits of our Media Stack and how, as a dedicated solution provider, we can help enterprises build an enterprise-grade communication system to meet their specific needs.

1. Unmatched Performance and Scalability:

Our Media Stack boasts exceptional performance, enabling real-time audio and video streaming without latency issues. With built-in load balancing and clustering capabilities, it can effortlessly scale to accommodate growing enterprise requirements, ensuring seamless communication across geographically dispersed teams.

2. Reliable and Secure Communication:

Security is of utmost importance for enterprises, especially when handling sensitive data. Our Media Stack is equipped with state-of-the-art encryption protocols, securing all communication channels and safeguarding against potential threats, ensuring confidential information remains private and protected.

3. Customization to Suit Enterprise Needs:

One of the key strengths of our Media Stack lies in its versatility. As a solution provider, we understand that each enterprise has unique requirements. With our expertise, we can tailor the Media Stack to meet specific needs, integrating it seamlessly with existing infrastructure and applications.

4. Seamless Integration with Communication Tools:

Our Media Stack is designed to effortlessly integrate with a wide array of communication tools, including Voice over Internet Protocol (VoIP), WebRTC, instant messaging, and more. This compatibility ensures that enterprises can leverage their existing tools while enjoying enhanced communication capabilities.

5. Enhanced Collaboration and Productivity:

Effective communication fosters collaboration, thereby boosting overall productivity. Our Media Stack facilitates crystal-clear audio and high-definition video conferencing, breaking down communication barriers and allowing teams to collaborate seamlessly.

6. Real-time Analytics and Monitoring:

Monitoring communication performance is crucial for enterprises to make informed decisions. Our Media Stack provides real-time analytics, enabling businesses to assess call quality, user engagement, and system health, ensuring optimal performance at all times.

7. Reduced Costs and Enhanced ROI:

By choosing our Media Stack, enterprises can benefit from cost savings due to its efficient resource utilization and scalability. Moreover, the enhanced communication system increases efficiency, delivering a higher return on investment (ROI).

8. Reliable Support and Maintenance:

As a solution provider, we take pride in our unwavering commitment to customer satisfaction. Our team of experts is always available to provide reliable support and timely maintenance, ensuring that our Media Stack operates at peak performance throughout its lifecycle.

9. Future-proof Solution:

With technology evolving rapidly, it is essential for enterprises to invest in future-proof solutions. Our Media Stack is built on cutting-edge technology, ensuring it remains relevant and adaptable to emerging trends and industry changes.

10. Seizing the Opportunity:

By partnering with us, enterprises can harness the power of our Media Stack to build a robust, secure, and scalable communication system. Our expert team will work closely with clients to design, implement, and maintain the ideal solution, tailored to their specific needs.

11. Enterprise Mediasoup:

As mediasoup is a highly popular open-source media server with an active community around it, we built CWLB on top of it, which makes CWLB rock solid and future-proof. CWLB as a Media Stack turns open-source mediasoup into Enterprise Mediasoup to provide unmatched performance with scalability, reliability, and security.

In conclusion, our Media Stack CWLB is the ultimate choice for enterprises seeking to elevate their communication system to new heights. With unmatched performance, security, scalability, and customization options, it empowers businesses to communicate seamlessly and collaborate effectively. As a solution provider, we are committed to helping enterprises harness this power to build an Enterprise-grade communication system that drives success and growth. By embracing CWLB, our Media Stack, businesses can forge ahead confidently into the future of communication technology.

Feel free to meet one of us for an instant meeting or a scheduled meeting using Meetnow. We are reachable at hello@centedge.io and we would be delighted to hear from you.

The curious case of WebRTC Signaling Servers and the crucial role it plays in a WebRTC application

WebRTC (Web Real-Time Communication) has revolutionized the way we communicate online by enabling real-time audio and video interactions directly in web browsers. Behind the scenes, a crucial component ensures seamless communication between peers: the signaling server. If you are building a WebRTC application, the first server you will need is a signaling server. In simple terms, a signaling server is a middleman that helps users who want to speak to each other discover one another by exchanging messages between them. Technically, it can implement the discovery and the message exchange in many different ways. It can also do much more than discovery and message exchange, including domain-specific business logic such as a whiteboard for e-learning, a prescription writing system for e-health, or a document verification system for e-banking, to mention a few. In short, the end-user experience of a video calling system depends directly on the robustness and scalability of the signaling server. Therefore, it is critical to design and develop your signaling server the right way from early on, so that it is easy to extend its functionality when the need arises.

WebRTC Signaling Servers:

Before diving into the signaling mechanisms, let’s understand the role of a signaling server in a WebRTC application. Unlike traditional communication systems, WebRTC does not define a signaling protocol. This means that applications need a mechanism to exchange information about session initiation, negotiation, and termination. The signaling server plays a pivotal role in facilitating this exchange of metadata between peers. In this blog post, we’ll explore the significance of a signaling server in a WebRTC application and analyze the pros and cons of different signaling mechanisms, including Socket.IO, WebSocket, and Session Initiation Protocol (SIP).
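
To make the signaling server’s job concrete, here is a minimal in-memory sketch of the relay logic: peers join a room, learn about each other, and exchange opaque messages such as offers, answers, and ICE candidates. It is not tied to any transport. The class `SignalingRelay`, its methods, and the message fields are all hypothetical names chosen for this example, and a real server would sit behind Socket.IO, WebSocket, or SIP and add authentication and error handling.

```python
from collections import defaultdict


class SignalingRelay:
    """In-memory stand-in for a signaling server: relays messages within a room."""

    def __init__(self):
        # room name -> {peer id -> callback that delivers a message to that peer}
        self._rooms = defaultdict(dict)

    def join(self, room, peer_id, on_message):
        # Discovery: tell peers already in the room about the newcomer.
        for handler in self._rooms[room].values():
            handler({"type": "peer-joined", "from": peer_id})
        self._rooms[room][peer_id] = on_message

    def send(self, room, sender, to, message):
        # Negotiation: forward an offer, answer, or ICE candidate untouched.
        handler = self._rooms[room].get(to)
        if handler is None:
            return False
        handler({**message, "from": sender})
        return True

    def leave(self, room, peer_id):
        # Termination: remove the peer and notify whoever is left.
        self._rooms[room].pop(peer_id, None)
        for handler in self._rooms[room].values():
            handler({"type": "peer-left", "from": peer_id})


# Usage: two peers whose "connections" are plain lists collecting messages.
alice_inbox, bob_inbox = [], []
relay = SignalingRelay()
relay.join("demo", "alice", alice_inbox.append)
relay.join("demo", "bob", bob_inbox.append)
relay.send("demo", "alice", "bob", {"type": "offer", "sdp": "<sdp placeholder>"})
relay.leave("demo", "bob")
print(alice_inbox)
print(bob_inbox)
```

The relay never inspects the session description it forwards, which is why the choice of transport is largely independent of WebRTC itself.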

  1. Socket.IO: Socket.IO is a widely used library that enables real-time, bidirectional communication between web clients and servers. It simplifies the development of applications that require real-time updates. In the context of WebRTC, Socket.IO can serve as a signaling mechanism to exchange session details between peers.
  • Ease of Use: Socket.IO is straightforward to implement, making it an excellent choice for developers, especially those familiar with JavaScript.
  • Bi-directional Communication: It allows for real-time, bidirectional communication, which is essential for WebRTC applications.
  • Implementation Difficulty: Easiest to implement with standard JavaScript knowledge.
  • Overhead: Socket.IO might introduce some overhead due to features that are not always necessary for signaling in WebRTC.
  • Limited Scalability: In large-scale applications, Socket.IO might face scalability challenges compared to other signaling mechanisms.
  • Disconnections: Due to some design limitations, it can face occasional to frequent disconnections even when the network is stable, which makes it difficult to use in large-scale WebRTC applications.
  2. WebSocket: WebSocket is a communication protocol that provides full-duplex communication channels over a single, long-lived connection. It is a low-latency solution, making it suitable for real-time applications like WebRTC.
  • Low Latency: WebSocket offers low-latency communication, making it ideal for real-time applications where timely data exchange is critical.
  • Widespread Support: WebSocket is supported by most modern browsers, ensuring broad compatibility for WebRTC applications.
  • Implementation Difficulty: Moderately difficult to implement with standard JavaScript knowledge, as some core real-time networking knowledge may be required to build a fault-tolerant, scalable solution.
  • Complexity: Implementing WebSocket can be more complex than Socket.IO, especially for developers less familiar with low-level networking concepts.
  • Firewall Issues: Some network configurations and firewalls might pose challenges for WebSocket connections, potentially affecting connectivity.
  3. Session Initiation Protocol (SIP): SIP is a signaling protocol widely used for initiating, maintaining, modifying, and terminating real-time sessions that involve video, voice, messaging, and other communications applications and services between two or more endpoints on IP networks.
  • Interoperability: SIP is a standardized protocol, ensuring interoperability across different platforms and devices.
  • Rich Features: SIP supports a wide range of features, making it suitable for diverse communication scenarios.
  • Implementation Difficulty: The most complex to implement among the 3 discussed here. If integration with a telephony network like the PSTN is not a requirement, it is better to avoid SIP for a normal browser-based WebRTC video calling application.
  • Complexity: SIP can be complex to implement, especially for simple WebRTC applications that do not require the full set of features it offers.
  • Overhead: Due to its extensive feature set, SIP might introduce unnecessary overhead for lightweight applications.

Conclusion:

In the realm of WebRTC, the signaling server is the unsung hero that ensures smooth communication between peers. Choosing the right signaling mechanism depends on the specific requirements and constraints of the application. Socket.IO, with its simplicity and bidirectional communication, is an excellent choice when the application is not very complex and does not need to scale very large. WebSocket, with its low latency, suits applications that need good fault tolerance and large-scale scalability, and that can do without the HTTP long-polling fallback that Socket.IO provides when a WebSocket connection is not available. SIP, with its rich feature set, shines in complex scenarios that require a standardized protocol for better interoperability between web-based clients and telephony infrastructure.
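
The rules of thumb in this conclusion can be written down as a tiny helper. This is only a sketch of the guidance given in this post, not a definitive rule: the function name `suggest_signaling` and its three yes/no inputs are made up for the example, and real projects will weigh more factors than these.

```python
def suggest_signaling(needs_telephony, large_scale, needs_polling_fallback=False):
    """Map this post's rules of thumb to a suggested signaling mechanism."""
    if needs_telephony:
        # Interoperability with telephony infrastructure such as the PSTN.
        return "SIP"
    if large_scale and not needs_polling_fallback:
        # Fault tolerance and scale, with no need for HTTP long polling.
        return "WebSocket"
    # Simpler applications, or ones that rely on the long-polling fallback.
    return "Socket.IO"


print(suggest_signaling(needs_telephony=False, large_scale=True))
```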

Understanding the pros and cons of each signaling mechanism empowers developers to make informed decisions based on the unique needs of their WebRTC applications. As the landscape of real-time communication continues to evolve, the role of signaling servers remains indispensable, shaping the future of seamless online interactions using WebRTC.

Feel free to schedule a free consultation using this link if your WebRTC signaling server or application is not working as it should, or if you wish to know the right option for your upcoming use case.