by admin | Nov 7, 2025 | WebRTC
The getUserMedia() API in WebRTC is primarily responsible for capturing the media streams currently available. The WebRTC standard provides this API for accessing cameras and microphones connected to the computer or smartphone. These devices are commonly referred to as media devices and can be accessed with JavaScript through the navigator.mediaDevices object, which implements the MediaDevices interface. From this object we can enumerate all connected devices, listen for device changes (when a device is connected or disconnected), and open a device to retrieve a MediaStream.
The most common way to use this is through the function getUserMedia(), which returns a promise that resolves to a MediaStream for the matching media devices. This function takes a single MediaStreamConstraints object that specifies our requirements. For instance, to simply open the default microphone and camera, we would do the following.
const openMediaDevices = async (constraints) => {
    return await navigator.mediaDevices.getUserMedia(constraints);
}

// (inside an async function, or a module with top-level await)
try {
    const stream = await openMediaDevices({'video': true, 'audio': true});
    console.log('Got MediaStream:', stream);
} catch (error) {
    console.error('Error while trying to access media devices.', error);
}
The call to getUserMedia() will trigger a permissions request. If the user accepts the permission, the promise is resolved with a MediaStream containing one video and one audio track. If the permission is denied, a NotAllowedError is thrown. In case there are no matching devices connected, a NotFoundError will be thrown.
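As a minimal sketch (assuming we are inside an async function), the two failure cases can be told apart by inspecting the name of the thrown error:

try {
    const stream = await navigator.mediaDevices.getUserMedia({'video': true, 'audio': true});
} catch (error) {
    if (error.name === 'NotAllowedError') {
        // The user (or a browser policy) denied the permission request.
        console.error('Permission denied:', error);
    } else if (error.name === 'NotFoundError') {
        // No camera or microphone matching the constraints is connected.
        console.error('No matching device found:', error);
    } else {
        console.error('Unexpected getUserMedia error:', error);
    }
}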
Media Constraints
As the snippet above shows, we pass media constraints when calling the getUserMedia API to access the video and audio streams (camera and mic) available to the browser. The constraints object, which must implement the MediaStreamConstraints interface, allows us to open a media device that matches a certain requirement. This requirement can be very loosely defined (audio and/or video) or very specific (a minimum camera resolution or an exact device ID). It is recommended that applications using the getUserMedia() API first check the existing devices and then specify a constraint that matches the exact device using the deviceId constraint. Devices will also, if possible, be configured according to the constraints: we can enable echo cancellation on microphones, or set a specific or minimum width, height and frame rate for the video from the camera. Below is a brief look at how to use media constraints in a more advanced way.
async function findConnectedDevices(type) {
    const devices = await navigator.mediaDevices.enumerateDevices();
    return devices.filter(device => device.kind === type);
}

async function openCamera(cameraId, minWidth, minHeight) {
    const constraints = {
        'audio': {'echoCancellation': true},
        'video': {
            'deviceId': cameraId,
            'width': {'min': minWidth},
            'height': {'min': minHeight},
            'frameRate': {'min': 10},
        }
    };
    return await navigator.mediaDevices.getUserMedia(constraints);
}

// Open the first available camera with a minimum resolution of 1280x720
const cameras = await findConnectedDevices('videoinput');
if (cameras && cameras.length > 0) {
    const stream = await openCamera(cameras[0].deviceId, 1280, 720);
}
The full documentation for the MediaStreamConstraints interface can be found in the MDN Web Docs.
Playing the video locally
Once a media device has been opened and we have a MediaStream available, we can assign it to a video or audio element to play the stream locally. The HTML for a typical video element used with getUserMedia() will usually have the autoplay and playsinline attributes. The autoplay attribute causes new streams assigned to the element to play automatically. The playsinline attribute allows video to play inline, instead of only in full screen, on certain mobile browsers. For live streams it is also recommended to omit the controls attribute (it is a boolean attribute, so controls="false" would still show the controls), unless the user should be able to pause them.
async function playVideoFromCamera() {
    try {
        const constraints = {'video': true, 'audio': true};
        const stream = await navigator.mediaDevices.getUserMedia(constraints);
        const videoElement = document.querySelector('video#localVideo');
        videoElement.srcObject = stream;
    } catch (error) {
        console.error('Error opening video camera.', error);
    }
}
<html>
  <head><title>Local video playback</title></head>
  <body>
    <video id="localVideo" autoplay playsinline></video>
  </body>
</html>
This post briefly described how to use the getUserMedia API, available in all modern browsers. To explore the advanced settings and configurations needed for building production-grade video conferencing applications, refer to this blog post.
If you are planning to build a simple P2P (peer-to-peer) video conferencing application, then also check this blog post to understand the other important API, RTCPeerConnection. One should be able to build a P2P video conferencing app using these two APIs.
If you have any questions related to getUserMedia or WebRTC as a whole, you can ask them and get prompt answers in this dedicated WebRTC forum.
If you want to learn WebRTC and build a sound understanding of it, along with all the technologies in its protocol stack such as ICE, STUN, TURN, DTLS, SRTP and SCTP, then check out our live online/onsite instructor-led WebRTC training programs here. If you wish to register for one of our upcoming training programs, you can do so using the registration form link provided there.
Here is an example public GitHub repo for creating a simple P2P video conferencing app built using these two APIs, with Node.js as the signalling server. Feel free to download the example and play around with it to understand the basics of WebRTC.
If you want to build serious production-grade video conferencing applications, then check out this open source GitHub repo for a production-grade WebRTC signalling server built with Node.js, which you can use to build and deploy your video calling app to any cloud. If this repo does not fulfil your need, feel free to visit our services page to learn more about our services.
If you want a custom video application without the pain of building it yourself, then check out our products page to learn more about our scalable, customizable and fully managed video conferencing / live streaming application as a service, along with custom branding.
Feel free to reach out to us at hello@centedge.io for any kind of help and support you need with WebRTC.
by admin | Nov 7, 2025 | WebRTC
With WebRTC, you can add real-time communication capabilities to your application on top of an open standard. It supports video, voice and generic data sent between peers, allowing developers to build powerful voice and video communication solutions. The technology is available in all modern browsers as well as in native clients for all major platforms. The technologies behind WebRTC are implemented as an open web standard and are available as regular JavaScript APIs in all major browsers. For native clients, like Android and iOS applications, a library is available that provides the same functionality. The WebRTC project is open source and supported by Apple, Google, Microsoft and Mozilla, amongst others.
What can WebRTC really do?
There are many different use cases for WebRTC, from basic web apps that use the camera or microphone to more advanced video-calling applications and screen sharing. WebRTC can be used to build almost anything, from a weekend side project such as a simple one-to-one video chat app to a complex enterprise-grade video conferencing app with security and other necessary features.
What exactly is WebRTC?
A WebRTC application usually goes through a common application flow. To keep it simple, this can be understood in four steps: accessing the media devices, opening peer connections, discovering peers, and starting to stream. WebRTC is a collection of APIs that facilitate these steps. Creating a new application based on WebRTC technologies can be overwhelming if one is unfamiliar with these APIs.
WebRTC APIs
The WebRTC standard covers, at a high level, two different technologies: media capture devices and peer-to-peer connectivity.
Media capture devices include video cameras and microphones, but also screen-capturing “devices”. For cameras and microphones, we use navigator.mediaDevices.getUserMedia() to capture MediaStreams. For screen recording, we use navigator.mediaDevices.getDisplayMedia() instead.
The peer-to-peer connectivity is handled by the RTCPeerConnection interface. This is the central point for establishing and controlling the connection between two peers in WebRTC.
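As a compressed sketch of these two halves side by side (the STUN URL is the public Google server also used later in this series; a real application adds constraints, signalling and error handling):

async function sketch() {
    // Media capture: camera/microphone vs. screen content.
    const cameraStream = await navigator.mediaDevices.getUserMedia({'video': true, 'audio': true});
    const screenStream = await navigator.mediaDevices.getDisplayMedia({'video': true});

    // Peer-to-peer connectivity: one RTCPeerConnection per remote peer.
    const peerConnection = new RTCPeerConnection({'iceServers': [{'urls': 'stun:stun.l.google.com:19302'}]});
    cameraStream.getTracks().forEach(track => peerConnection.addTrack(track, cameraStream));
}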
In the upcoming posts, these APIs will be elaborated with easy-to-understand and easy-to-follow examples. Stay tuned to learn WebRTC from scratch.
by admin | Nov 7, 2025 | WebRTC
For the last 5 months, the demand for video conferencing has skyrocketed. The majority of the human population on our planet has been locked up in their homes, and all the work is getting done through video conferencing. The most basic requirement for video conferencing is to share the camera and microphone, with occasional screen sharing, so that the individual can be seen, heard and understood properly. The majority of these video conferences nowadays run directly from a browser, without the need to install any external software or even a browser extension. The browsers these days have some magical powers to do everything related to cameras, microphones and screen sharing. In this post, we will explore these magical powers to share these things on demand, and the open secret behind them.
The open secret
The much-awaited open secret is the browser API named navigator.mediaDevices. This API provides functionality that includes getUserMedia to acquire the camera and microphone on request, enumerateDevices to list all available devices, and getDisplayMedia to capture a screen, application window or browser tab. These are the most commonly used APIs in a typical video conferencing application.
Video conferencing applications can retrieve the current list of connected devices and also listen for changes, since many cameras and microphones connect through USB and can be connected and disconnected during the lifecycle of the application. Since the state of a media device can change at any time, it is recommended that applications register for device changes using the navigator.mediaDevices API in order to handle changes properly, as shown below.
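A minimal sketch of such a registration (updateDeviceList() is a hypothetical UI helper, not a browser API):

// Re-enumerate the devices whenever one is plugged in or removed.
navigator.mediaDevices.addEventListener('devicechange', async () => {
    const devices = await navigator.mediaDevices.enumerateDevices();
    updateDeviceList(devices); // hypothetical helper that re-renders the device picker
});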
Media constraints
The next thing that needs discussion is media constraints, which define how one can access the camera, microphone or screen share while passing specific instructions to the browser.
Capture camera using getUserMedia
For example, if there are 3 cameras available to a browser, then a specific instruction can be given to the browser as a constraint to access one specific camera out of the available 3 for the video call.
The specific constraints are defined in a MediaTrackConstraints object, one for audio and one for video. The attributes in this object are of type ConstrainULong, ConstrainBoolean, ConstrainDouble or ConstrainDOMString. These can either be a specific value (e.g., a number, boolean or string), a range (a ULongRange or DoubleRange with a minimum and maximum value) or an object with either an ideal or exact definition (an ideal example is sketched after the snippets below). For a specific value, the browser will attempt to pick something as close to it as possible. For a range, the best value in that range will be used. When exact is specified, only media streams that exactly match that constraint will be returned.
// Camera with a resolution as close to 640x480 as possible
{
    "video": {
        "width": 640,
        "height": 480
    }
}

// Camera with a resolution in the range 640x480 to 1024x768
{
    "video": {
        "width": {
            "min": 640,
            "max": 1024
        },
        "height": {
            "min": 480,
            "max": 768
        }
    }
}

// Camera with the exact resolution of 1024x768
{
    "video": {
        "width": {
            "exact": 1024
        },
        "height": {
            "exact": 768
        }
    }
}
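The ideal form mentioned above is not shown in the snippets; as a sketch, it would look like this:

// Camera that prefers, but does not require, a resolution of 1280x720
{
    "video": {
        "width": {"ideal": 1280},
        "height": {"ideal": 720}
    }
}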
To determine the actual configuration a certain track of a media stream has, we can call MediaStreamTrack.getSettings(), which returns the MediaTrackSettings currently applied.
It is also possible to update the constraints of a track from a media device we have opened by calling applyConstraints() on the track, as sketched below. This lets an application re-configure a media device without first having to close the existing stream.
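A short sketch of both calls, assuming stream is a MediaStream obtained from getUserMedia() and that we are inside an async function:

const [videoTrack] = stream.getVideoTracks();

// Inspect the settings the browser actually applied.
const settings = videoTrack.getSettings();
console.log('Current resolution:', settings.width, 'x', settings.height);

// Re-configure the open device without closing the stream.
try {
    await videoTrack.applyConstraints({'frameRate': {'max': 25}});
} catch (error) {
    // The device cannot satisfy the new constraints.
    console.error('applyConstraints failed:', error);
}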
Capture screen using getDisplayMedia
An application that wants to perform screen capturing and recording must use the Display Media API. The function getDisplayMedia() (which is part of navigator.mediaDevices) is similar to getUserMedia() and is used to open the content of the display (or a portion of it, such as a window). The returned MediaStream works the same as the one returned by getUserMedia().
The constraints for getDisplayMedia() differ from the ones used for regular video or audio input.
{
    video: {
        cursor: 'always' | 'motion' | 'never',
        displaySurface: 'application' | 'browser' | 'monitor' | 'window'
    }
}
The code snippet above shows how the special constraints for screen recording work. Note that these might not be supported by all browsers that have display media support.
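A minimal usage sketch (the video#screenPreview element id is illustrative, not prescribed by the API):

async function startScreenCapture() {
    try {
        const stream = await navigator.mediaDevices.getDisplayMedia({
            'video': {'cursor': 'always'}
        });
        // Play the captured screen locally, the same way a camera stream is played.
        document.querySelector('video#screenPreview').srcObject = stream;
    } catch (error) {
        console.error('Error capturing the screen.', error);
    }
}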
Tips and tricks
A MediaStream represents a stream of media content, which consists of tracks (MediaStreamTrack) of audio and video. You can retrieve all the tracks from a MediaStream by calling MediaStream.getTracks(), which returns an array of MediaStreamTrack objects.
A MediaStreamTrack has a kind property that is either audio or video, indicating the kind of media it represents. Each track can be muted by toggling its enabled property. A track also has a Boolean property, remote, that indicates whether it is sourced by an RTCPeerConnection and coming from a remote peer.
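For example (assuming stream is an open MediaStream), muting the microphone by disabling every audio track looks like this:

for (const track of stream.getTracks()) {
    if (track.kind === 'audio') {
        track.enabled = false; // set back to true to unmute
    }
}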
by admin | Nov 7, 2025 | WebRTC
WebRTC's RTCPeerConnection is the API that deals with connecting two applications on different computers so that they can communicate using a peer-to-peer protocol. The communication between peers can be video, audio or arbitrary binary data (for clients supporting the RTCDataChannel API). To discover how two peers can connect, both clients need to connect to a common signalling server and also provide an ICE server configuration. The ICE server can be either a STUN or a TURN server, and its role is to provide ICE candidates to each client, which are then transferred to the remote peer. This transfer of ICE candidates is commonly called signalling. All these new terminologies may sound alien at the beginning, but they are the secret behind successfully connecting a video call between 2 computers using only browsers.
Signalling is needed in order for two peers to share how they should connect. Usually this is solved through a regular HTTP-based web API (i.e., a REST service or another RPC mechanism such as WebSocket) where web applications can relay the necessary information before the peer connection is initiated. Signalling can be implemented in many different ways, and the WebRTC specification does not mandate any specific solution.
Peer connection initiation
A peer connection is set up by instantiating an RTCPeerConnection object, as described in the code snippet below. The constructor for this class takes a single RTCConfiguration object as its parameter. This object defines how the peer connection is set up and should contain information about the ICE servers to use.
Once the RTCPeerConnection is created, we need to create an SDP offer or answer, depending on whether we are the calling peer or the receiving peer. Once the SDP offer or answer is created, it must be sent to the remote peer through a different channel. Passing SDP objects to remote peers is, to be specific, what is called signalling, and it is not covered by the WebRTC specification.
To initiate the peer connection setup from the calling side, we create an RTCPeerConnection object and then call createOffer() to create an RTCSessionDescription object. This session description is set as the local description using setLocalDescription() and is then sent over our signalling channel to the receiving side. We also set up a listener on our signalling channel for when an answer to our offered session description is received from the receiving side.
Simple signalling channel
// Set up an asynchronous communication channel that will be
// used during the peer connection setup
const signalingChannel = new SignalingChannel(remoteClientId);
signalingChannel.addEventListener('message', message => {
    // New message from remote client received
});

// Send an asynchronous message to the remote client
signalingChannel.send('Hello!');
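Note that SignalingChannel is not part of WebRTC; it stands for whatever transport the application chooses. As a sketch only, a minimal WebSocket-based implementation (the server URL and the message envelope are assumptions, not part of any standard) could look like this:

class SignalingChannel {
    constructor(remoteClientId) {
        this.remoteClientId = remoteClientId;
        this.listeners = [];
        this.socket = new WebSocket('wss://example.com/signalling'); // hypothetical relay server
        this.socket.addEventListener('message', event => {
            // Unwrap the envelope and hand the payload object straight to
            // listeners, matching how the snippets in this post consume it.
            const {payload} = JSON.parse(event.data);
            this.listeners.forEach(listener => listener(payload));
        });
    }

    addEventListener(type, listener) {
        if (type === 'message') {
            this.listeners.push(listener);
        }
    }

    send(message) {
        // Wrap the payload in a simple envelope; the relay server is assumed
        // to forward it to the client identified by 'to'.
        this.socket.send(JSON.stringify({'to': this.remoteClientId, 'payload': message}));
    }
}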
Initiating the call from browser A
async function makeCall() {
    const configuration = {'iceServers': [{'urls': 'stun:stun.l.google.com:19302'}]};
    const peerConnection = new RTCPeerConnection(configuration);
    signalingChannel.addEventListener('message', async message => {
        if (message.answer) {
            const remoteDesc = new RTCSessionDescription(message.answer);
            await peerConnection.setRemoteDescription(remoteDesc);
        }
    });
    const offer = await peerConnection.createOffer();
    await peerConnection.setLocalDescription(offer);
    signalingChannel.send({'offer': offer});
}
On the receiving side, we wait for an incoming offer before we create our RTCPeerConnection instance. Once that is done, we set the received offer using setRemoteDescription(). Next, we call createAnswer() to create an answer to the received offer. This answer is set as the local description using setLocalDescription() and then sent to the calling side over our signalling channel.
const peerConnection = new RTCPeerConnection(configuration);
signalingChannel.addEventListener('message', async message => {
    if (message.offer) {
        await peerConnection.setRemoteDescription(new RTCSessionDescription(message.offer));
        const answer = await peerConnection.createAnswer();
        await peerConnection.setLocalDescription(answer);
        signalingChannel.send({'answer': answer});
    }
});
Once the two peers have set both the local and remote session descriptions, they know the capabilities of their respective remote peer. This doesn't mean that the connection between the peers is established yet. For that to happen, we need to collect the ICE candidates at each peer and transfer them (over the signalling channel) to the other peer in order to establish the connection between them.
ICE Candidates
ICE stands for Interactive Connectivity Establishment. Before two peers can communicate using WebRTC, they need to exchange connectivity information. Since network conditions can vary depending on a number of factors, an external service is usually used for discovering the possible candidates for connecting to a peer. This service is called ICE and uses either a STUN or a TURN server. STUN stands for Session Traversal Utilities for NAT, and is usually used indirectly in most WebRTC applications.
TURN (Traversal Using Relays around NAT) is the more advanced solution that incorporates the STUN protocols, and most commercial WebRTC-based services use a TURN server for establishing connections between peers. The WebRTC API supports both STUN and TURN directly, gathered under the more complete term Interactive Connectivity Establishment. When creating a WebRTC connection, we usually provide one or several ICE servers in the configuration of the RTCPeerConnection object.
Trickle ICE
Trickle ICE is a technique used to reduce the call setup time between the two peers. Once an RTCPeerConnection object is created, the underlying framework uses the provided ICE servers to gather candidates for establishing connectivity. The icegatheringstatechange event on RTCPeerConnection signals what state the ICE gathering is in (new, gathering or complete).
While it is possible for a peer to wait until ICE gathering is complete, it is usually much more efficient to use this technique and transmit each ICE candidate to the remote peer as it gets discovered. This significantly reduces the setup time for peer connectivity and allows a video call to start with less delay.
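A short sketch of observing the gathering state (assuming peerConnection is the RTCPeerConnection created earlier):

peerConnection.addEventListener('icegatheringstatechange', () => {
    // One of 'new', 'gathering' or 'complete'.
    console.log('ICE gathering state:', peerConnection.iceGatheringState);
});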
To gather ICE candidates, simply add a listener for the icecandidate event. The RTCPeerConnectionIceEvent emitted to that listener contains a candidate property representing a new candidate that should be sent to the remote peer using the signalling mechanism described above.
// Listen for local ICE candidates on the local RTCPeerConnection
peerConnection.addEventListener('icecandidate', event => {
    if (event.candidate) {
        signalingChannel.send({'new-ice-candidate': event.candidate});
    }
});

// Listen for remote ICE candidates and add them to the local RTCPeerConnection
signalingChannel.addEventListener('message', async message => {
    if (message.iceCandidate) {
        try {
            await peerConnection.addIceCandidate(message.iceCandidate);
        } catch (e) {
            console.error('Error adding received ice candidate', e);
        }
    }
});
Once ICE candidates are being received, we should expect the state of our peer connection to eventually change to connected. To detect this, we add a listener to our RTCPeerConnection for connectionstatechange events.
// Listen for connectionstatechange on the local RTCPeerConnection
peerConnection.addEventListener('connectionstatechange', event => {
    if (peerConnection.connectionState === 'connected') {
        // Peers connected!
    }
});
by admin | Nov 7, 2025 | WebRTC
Are you planning to start building a WebRTC-based audio-video calling infrastructure for your upcoming use case? If yes, then be careful: there can be danger ahead if you don't select the correct media server for your use case today. We are here to help you select the correct media server or infrastructure based on your use case needs as well as the phase of project execution you are currently in.
The possible alternatives for your use case are Agora, TokBox, Twilio, Jitsi, Janus, mediasoup, Kurento, OpenVidu, etc. This list has many other names in it as well; here we have considered the most popular options out there to help you build your video infrastructure.
We primarily divide the above-mentioned options into 2 categories. The first comprises vendors who provide video APIs so that other applications can integrate them with very little effort. The second comprises open source media servers which one can use to build video applications on top of, but in this case the implementer (i.e., you) needs to take care of both the implementation and the infrastructure. Let's analyse both of these options more elaborately and provide specific suggestions for each.
Agora / Twilio / TokBox: These are good, dependable and scalable API services. Agora is my top pick among them. Use these if you are building an MVP for a demo, or if your video calling solution is not core to your business and your product's customer satisfaction index is not directly dependent on the solution you have built on top of them. The reason is that if they change their APIs overnight, it shouldn't break the core of the product. It is better to have fewer dependencies for the core of your product.
Jitsi: Jitsi is a really nice end-to-end solution for in-house video conferencing needs when you don't want to depend on Zoom or MS Teams for your team's video conferencing. You can deploy the whole solution quickly, run it on-premise or on a managed cloud, and use it without any issue. It is a good open source Zoom alternative, maybe the best one, but it is not very good if you want to integrate it with your product and build your solution on top of it, as it is less agile, less modular and customisation-unfriendly. Its architecture and its approach of getting things done in a certain way make it the best open source Zoom alternative, but not a great media server for your next WebRTC project.
Janus / mediasoup: The 2 best open source media servers available today, ready for integration in any use case. These are stable enough and scale nicely without many issues. Some amount of tweaking will be needed to align them to your use case. The only point to mention about mediasoup is that it is currently available only as an npm module, which means it can only be integrated with a Node.js backend server. Node.js is a great backend server in my opinion, and I use it for many use cases. I can assure you that Node.js + mediasoup work very well with each other and are a winning combination. Both can handle 100+ participants in one conference call if implemented properly on a scalable infrastructure. Use these if your use case is more focussed on large group calls.
Kurento / OpenVidu: Kurento is one of the most versatile open source media servers out there, with MCU and SFU capabilities based on configuration. It also has options to integrate different types of filters, such as face detection filters and chroma filters, on the live media stream. It also provides capabilities to record the streams to a file or an HTTP endpoint. Use this if you have a unique use case where you don't need a large group call and you have in-house Kurento expertise. OpenVidu is a signalling and application server combination built on top of Kurento which can be used as a Jitsi alternative.
BigBlueButton: BBB (BigBlueButton) is a good option for online classroom / learning use cases. If you are a school or college, or a business catering to the needs of schools and colleges, looking for a self-hosted or cloud-based online classroom solution on your own domain name with full control, then this is a good solution for you. It has all the options a teacher needs to control the students, including muting all students in the class.
Finally, there may be a question in your mind: should one own the video infrastructure or use it as a service? Our take on owning the WebRTC infrastructure is this: if the WebRTC infra is core to your business and your monetisation strategy is directly dependent on its performance, it is advisable to own the whole infrastructure; otherwise, go for a managed service or even outsource it.
In case you are looking to develop your next video project or adopt an open source one for your own use case, we will be delighted to support you in achieving a great result. Don't hesitate to drop us a mail at hello@centedge.io for a free first-round consultation. #Happytohelp.