Why we're excited about ORTC

November 18, 2014, by Philipp Hancke

Microsoft recently announced they will support Object RTC and now everyone is talking about ORTC and how they will support it.

What is this all about and what is ORTC anyway?

In essence, ORTC is an alternative API for WebRTC. It is object-oriented and protects developers from all that ugly Session Description Protocol (SDP) madness. Some people call it WebRTC 1.1, or maybe WebRTC 2.0.

So... will &yet (and Otalk) (and Talky) support ORTC? Of course!

Mostly because we have a use case where ORTC is better than the proprietary ways of solving certain problems with the WebRTC PeerConnection API. Instead of telling you how much we like ORTC, let me tell you about the problems we've experienced with WebRTC as it stands today.

SDP

ORTC gets rid of the SDP API surface used in the WebRTC PeerConnection API. Being XMPP people, we prefer Jingle over SDP. However, we rarely touch SDP at all, due to the magnificent sdp-jingle-json module that Lance Stout wrote.

This module transforms SDP into a JSON object and back. The object structure makes it somewhat easier to manipulate the description and change things. Still, you need to know the semantics of the things you are manipulating. Removing SDP is not something we strongly care about. We've hidden it well by burying it under layers of abstraction, and we are not using it on the wire.
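To give a feel for what "SDP as an object" means, here is a toy sketch. It is not the sdp-jingle-json API, and it only understands `m=` and `a=rtpmap:` lines; the names `parseMedia` and `serializeMedia` are made up for this illustration.

```typescript
// Toy illustration only: this is NOT the sdp-jingle-json API and ignores most of SDP.
interface Codec {
  payloadType: number;
  name: string;
  clockRate: number;
  channels?: number;
}

interface MediaSection {
  kind: string;   // "audio", "video", ...
  port: number;
  proto: string;  // e.g. "RTP/SAVPF"
  codecs: Codec[];
}

// Parse only "m=" lines and the "a=rtpmap:" lines that follow them.
function parseMedia(sdp: string): MediaSection[] {
  const sections: MediaSection[] = [];
  for (const raw of sdp.split(/\r?\n/)) {
    const line = raw.trim();
    if (line.indexOf("m=") === 0) {
      const parts = line.slice(2).split(" ");
      sections.push({ kind: parts[0], port: Number(parts[1]), proto: parts[2], codecs: [] });
      continue;
    }
    const match = /^a=rtpmap:(\d+) ([^\/]+)\/(\d+)(?:\/(\d+))?/.exec(line);
    if (match && sections.length > 0) {
      const codec: Codec = {
        payloadType: Number(match[1]),
        name: match[2],
        clockRate: Number(match[3]),
      };
      if (match[4]) codec.channels = Number(match[4]);
      sections[sections.length - 1].codecs.push(codec);
    }
  }
  return sections;
}

// Write the object back out; the payload type list on the "m=" line is derived from the codecs.
function serializeMedia(sections: MediaSection[]): string {
  const lines: string[] = [];
  for (const s of sections) {
    const payloadTypes = s.codecs.map(c => String(c.payloadType));
    lines.push(["m=" + s.kind, String(s.port), s.proto].concat(payloadTypes).join(" "));
    for (const c of s.codecs) {
      const channels = c.channels ? "/" + c.channels : "";
      lines.push("a=rtpmap:" + c.payloadType + " " + c.name + "/" + c.clockRate + channels);
    }
  }
  return lines.join("\r\n") + "\r\n";
}

// Dropping a codec becomes an array operation instead of string surgery.
const media = parseMedia(
  "m=audio 9 RTP/SAVPF 111 0\r\na=rtpmap:111 opus/48000/2\r\na=rtpmap:0 PCMU/8000\r\n"
);
media[0].codecs = media[0].codecs.filter(c => c.name !== "PCMU");
const rewritten = serializeMedia(media);
```

Even in this toy version you still have to know that removing a codec means touching both the `m=` line and its `a=rtpmap:` line, which is the "semantics" point made above.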

Capability and Parameter Negotiation

One of the most important aspects I learned about recently is that ORTC distinguishes between capabilities and negotiation through the RTPSender's static getCapabilities method.

Capabilities let us query what an implementation supports, e.g. which video codecs it offers, whether it can multiplex everything on a single UDP port pair, etc. Because the method is static, we can query those capabilities without creating an RTPSender. We can also figure out beforehand whether two clients would be compatible with each other.
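As a rough sketch of that compatibility check, assume each client has published the result of its capability query in a simplified form. The `ClientCapabilities` shape and the two example clients below are hypothetical; the objects a real implementation returns carry more fields.

```typescript
// Hypothetical, simplified capability shape (an assumption for this sketch).
interface CodecCapability {
  kind: "audio" | "video";
  name: string;
  clockRate: number;
}

interface ClientCapabilities {
  codecs: CodecCapability[];
}

// Two codec entries match if kind, name (case-insensitive) and clock rate agree.
function sameCodec(a: CodecCapability, b: CodecCapability): boolean {
  return a.kind === b.kind &&
    a.name.toLowerCase() === b.name.toLowerCase() &&
    a.clockRate === b.clockRate;
}

// Codecs both clients support, in the first client's order.
function commonCodecs(a: ClientCapabilities, b: ClientCapabilities): CodecCapability[] {
  return a.codecs.filter(ca => b.codecs.some(cb => sameCodec(ca, cb)));
}

// Can the two clients exchange media of the given kind at all?
function canTalk(a: ClientCapabilities, b: ClientCapabilities, kind: "audio" | "video"): boolean {
  return commonCodecs(a, b).some(c => c.kind === kind);
}

// Made-up clients: they share an audio codec but no video codec.
const clientA: ClientCapabilities = {
  codecs: [
    { kind: "audio", name: "opus", clockRate: 48000 },
    { kind: "video", name: "VP8", clockRate: 90000 },
  ],
};
const clientB: ClientCapabilities = {
  codecs: [
    { kind: "audio", name: "OPUS", clockRate: 48000 },
    { kind: "video", name: "H264", clockRate: 90000 },
  ],
};

const audioPossible = canTalk(clientA, clientB, "audio"); // true
const videoPossible = canTalk(clientA, clientB, "video"); // false
```

Nothing here needs a live session, which is the point: the check can run before either side creates a sender.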

Negotiation, on the other hand, means that two entities (who presumably have a common capability) decide to use it for a particular session. That is what the offer/answer model for SDP was all about. Your offer tells me what you support and want to use, and I answer with the subset I want to use.
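Stripped of all SDP detail, the offer/answer idea can be sketched in a few lines. The `SessionOffer` and `SessionAnswer` types are invented for this example and reduce a description to a list of codec names.

```typescript
// Simplified offer/answer: the offer lists codec names in the offerer's order of preference.
interface SessionOffer {
  codecs: string[];
}

interface SessionAnswer {
  codecs: string[];
}

// The answerer keeps only the offered codecs it also wants to use for this session.
function answerOffer(offer: SessionOffer, wanted: string[]): SessionAnswer {
  return { codecs: offer.codecs.filter(c => wanted.indexOf(c) !== -1) };
}

const sessionOffer: SessionOffer = { codecs: ["opus", "G722", "PCMU"] };
const sessionAnswer = answerOffer(sessionOffer, ["PCMU", "opus"]);
// sessionAnswer.codecs is ["opus", "PCMU"]: a subset of the offer, in the offerer's order.
```

An empty answer list would mean the two sides have nothing in common for this session, which is exactly what a capability check beforehand helps to avoid.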

Both capabilities and negotiation are useful and necessary. Capabilities are harder to determine in the PeerConnection API, even though it's not impossible. ORTC just makes the distinction clearer and lets us think about how that distinction influences the protocols we design. However, as we saw with Jingle, cool features like trickle ICE can be backported to SDP semantics.

Talky

We've previously written about the upcoming new version of Talky. It's a multiuser conferencing application that, like Jitsi Meet, uses the Jitsi Videobridge and XMPP. Currently it only works in Chrome (no worries, we're talking with Mozilla).

There are two different problems here:

multiparty conferencing and
upgrading a 1-1 call to a conference

Multiparty Conferencing

In terms of API usage, Talky is more complex than anything that was out there until Hangouts came about, doing all kinds of renegotiation and adding and removing remote streams for participants. Chrome enables this through an SDP variant known as Plan B, which did not get accepted by the IETF last year. That did not stop Chrome from implementing it, though.

Basically, to add or remove a local audio/video stream you need to do a setRemoteDescription call followed by a setLocalDescription call, which will trigger an onaddstream or onremovestream callback depending on whether streams were added or removed. If you want to know all the gory details, please refer to the webrtchacks article I wrote on how Hangouts uses this API. Not surprisingly, the features needed were already in Chrome because they were required for Hangouts.
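The order of calls in that dance can be sketched against a small stand-in object. `PeerConnectionLike`, `SessionDesc` and `FakePeerConnection` are hypothetical names for this example; they mimic the callback style of the API but are not the browser objects, and the onaddstream/onremovestream callbacks the browser fires along the way are not modeled.

```typescript
// Minimal stand-in for the parts of the callback-style PeerConnection API used here.
interface SessionDesc {
  type: "offer" | "answer";
  sdp: string;
}

interface PeerConnectionLike {
  setRemoteDescription(desc: SessionDesc, ok: () => void, fail: (e: Error) => void): void;
  createAnswer(ok: (answer: SessionDesc) => void, fail: (e: Error) => void): void;
  setLocalDescription(desc: SessionDesc, ok: () => void, fail: (e: Error) => void): void;
}

// One renegotiation round: apply the changed remote description, then answer it.
function applyRemoteOffer(
  pc: PeerConnectionLike,
  offer: SessionDesc,
  done: (answer: SessionDesc) => void,
  fail: (e: Error) => void
): void {
  pc.setRemoteDescription(offer, () => {
    pc.createAnswer(answer => {
      pc.setLocalDescription(answer, () => done(answer), fail);
    }, fail);
  }, fail);
}

// Fake that only records the call order and always succeeds.
class FakePeerConnection implements PeerConnectionLike {
  calls: string[] = [];
  setRemoteDescription(desc: SessionDesc, ok: () => void): void {
    this.calls.push("setRemoteDescription:" + desc.type);
    ok();
  }
  createAnswer(ok: (answer: SessionDesc) => void): void {
    this.calls.push("createAnswer");
    ok({ type: "answer", sdp: "(answer sdp)" });
  }
  setLocalDescription(desc: SessionDesc, ok: () => void): void {
    this.calls.push("setLocalDescription:" + desc.type);
    ok();
  }
}

const fakePc = new FakePeerConnection();
let finalAnswer: SessionDesc | null = null;
applyRemoteOffer(
  fakePc,
  { type: "offer", sdp: "(offer sdp with one more participant)" },
  answer => { finalAnswer = answer; },
  error => { throw error; }
);
```

Every participant joining or leaving means another full round like this, which is where the hoop-jumping feeling comes from.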

Hangouts also uses some advanced features like simulcast (i.e., sending different resolutions of the same video) which are activated by adding some special lines to the SDP. That's currently completely undocumented and basically black magic. What is also lacking is a way to prioritize streams when several are competing for bandwidth.

Implementing Talky with the current API is possible. However, one should note that "the current API" here includes a number of non-standard proprietary features. And using it felt like jumping through hoops.

How will ORTC change this?

Well, first we don't need to do a setRemoteDescription-setLocalDescription dance. Instead of getting streams (typically consisting of an audio and a video track) pushed to us, we can use the RTPReceiver API to pull audio or video tracks from the peer connection after setting them up with certain parameters (we want to associate those tracks with the participants in the chatroom). There is also a mode for detecting unhandled RTP streams, which potentially allows us to get rid of signaling for individual participants.
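The bookkeeping this implies on the application side might look roughly like the sketch below. `ReceiverRegistry` is a hypothetical helper, not an ORTC interface, and it assumes incoming RTP streams are identified by their SSRC.

```typescript
// Hypothetical bookkeeping helper; ReceiverRegistry is not part of ORTC.
class ReceiverRegistry {
  private participantBySsrc: { [ssrc: number]: string } = {};

  // Invoked when a stream arrives that nobody has been registered for.
  onunhandled: ((ssrc: number) => void) | null = null;

  // Called when we know (e.g. from signaling) which participant owns this SSRC.
  register(ssrc: number, participant: string): void {
    this.participantBySsrc[ssrc] = participant;
  }

  // Returns the participant for an incoming stream, or reports it as unhandled.
  route(ssrc: number): string | null {
    if (Object.prototype.hasOwnProperty.call(this.participantBySsrc, String(ssrc))) {
      return this.participantBySsrc[ssrc];
    }
    if (this.onunhandled) this.onunhandled(ssrc);
    return null;
  }
}

const registry = new ReceiverRegistry();
const unhandledSsrcs: number[] = [];
registry.onunhandled = ssrc => { unhandledSsrcs.push(ssrc); };

registry.register(1001, "alice");
const knownOwner = registry.route(1001);   // "alice"
const unknownOwner = registry.route(2002); // null, and 2002 is reported as unhandled
```

With the unhandled callback, the application could set up a receiver for a new participant when media shows up, instead of waiting for a signaling message per participant.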

The RTPSender objects allow for better control and prioritization of streams. Note that these RTPSender objects are now part of the "1.0" WebRTC API as well and Mozilla has already implemented them. So you can solve that problem there, too.

Upgrading a 1-1 Call to a Multiparty Conference

Quite a sizable portion of the current Talky usage is for 1-1 sessions which are peer-to-peer (with a small percentage being relayed through TURN servers). We do not want to route those sessions via the Jitsi Videobridge for a number of reasons. First, it costs us more money. Second, it decrypts the call and we don't really want to have access to your private conversations. Third, it increases the latency, which affects the quality of the user experience.

So what we actually want to do is to have your 1-1 sessions in peer-to-peer mode and upgrade to a call relayed by the bridge as necessary. In theory, the current PeerConnection API allows this by doing something called an "ICE restart". You open a new media path to the bridge and switch over once it's connected. Turns out that this is currently not implemented by Chrome.

How will ORTC change this?

Well, in ORTC this scenario is easier to describe thanks to the better vocabulary. To do a switch like this, you set up another RTCIceTransport and RTCDtlsTransport object, wait for the connection to become active (by waiting for an RTCDtlsTransportStateChanged event on both sides) and then attach your RTPSender to that new transport.
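The control flow of that switch can be sketched with stand-in objects. `FakeTransport` and `FakeSender` below are hypothetical and only model a state plus a state-change callback; they are not the ORTC classes, and the sketch shows one side of the call only.

```typescript
// Stand-ins for a transport and a sender (assumptions for this sketch, not ORTC objects).
type FakeTransportState = "new" | "connecting" | "connected" | "closed";

class FakeTransport {
  state: FakeTransportState = "new";
  onstatechange: ((state: FakeTransportState) => void) | null = null;

  constructor(public label: string) {}

  // Simulates the underlying connection changing state.
  setState(next: FakeTransportState): void {
    this.state = next;
    if (this.onstatechange) this.onstatechange(next);
  }
}

class FakeSender {
  constructor(public transport: FakeTransport) {}

  setTransport(next: FakeTransport): void {
    this.transport = next;
  }
}

// Keep sending on the old transport until the new one is connected, then move over.
function switchWhenConnected(sender: FakeSender, next: FakeTransport): void {
  if (next.state === "connected") {
    sender.setTransport(next);
    return;
  }
  next.onstatechange = state => {
    if (state === "connected") sender.setTransport(next);
  };
}

const p2p = new FakeTransport("peer-to-peer");
p2p.setState("connected");
const bridge = new FakeTransport("bridge");
const sender = new FakeSender(p2p);

switchWhenConnected(sender, bridge);
bridge.setState("connecting"); // media still flows peer-to-peer
bridge.setState("connected");  // sender.transport is now the bridge transport
```

The 1-1 call keeps running on the peer-to-peer path the whole time; only once the bridge path is up does the sender move, so there is no gap in the media.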

Just having the right vocabulary to talk about this makes ORTC worthwhile.

Bugs

When you implement applications on top of the PeerConnection API in Chrome or Firefox you will notice some bugs (if not, you're probably only doing boring stuff). I ended up reporting more than 50 Chrome bugs (and a few Firefox ones) in the last two years.

How will ORTC change this?

Well, with the Microsoft announcement I look forward to filing bugs against a third browser. 50% more fun!

When I tried the plugin Microsoft Open Tech released earlier this year, it took half an hour to find two bugs.

Once Google adds ORTC as an API surface there will be more bugs there, and they will have two API surfaces to support. If anything, this is going to slow them down. And according to the roadmap, we should already have several ORTC elements in the current Chrome 38 that are not even in Canary yet.

TL;DR

ORTC will make some applications easier. Although it's not a magic bullet that will somehow fix all the problems that exist with the PeerConnection API, it looks good on paper and we're excited about playing with running code once it ships in the browsers (especially IE).
