Mixing Stereo

How to mix the head-locked stereo track with Ambisonic content

    State of play

    One of the questions asked most frequently in the Facebook Spatial Audio group is the following: mixes created with the FB360 workstation consist of a mandatory 8-channel Ambisonic WAV file in TBE format plus an optional stereo track for music, so-called head-locked audio, which, unlike the spatial audio file, does not rotate with head movements. Head-tracked therefore means that the sound changes with the direction of gaze, while head-locked audio stays the same regardless of where the viewer looks during playback. If you are new to Ambisonics, I recommend reading my short introductory article first: Ambisonics for Virtual Reality Video.

    So far, so good: working with Facebook’s own TBE format on Facebook itself is no problem. Its Audio 360 encoder takes the 360° video, the multi-channel spatial audio file and the static head-locked track as input and outputs a final video file with the 8 + 2 audio channels.

    YouTube, however, has also supported 360° sound for some time, but only as 4-channel audio: first-order Ambisonics in ambiX format. If you try to select “YouTube Video (with 1st order ambiX)” instead of “Facebook 360 Video” as the output format in the Facebook encoder, the “Head-Locked Stereo” option is greyed out.

    Because this is very unsatisfactory and causes quite a stir on said social media channels, I wanted to remedy the situation with this article.

    Back to the topic: the fact that the encoder refuses to cooperate here is not even its fault, because YouTube simply does not support Facebook’s hybrid format. All you can do is have your 8-channel TBE file automatically transcoded into 4-channel ambiX; the static stereo track is left out.

    But what do you do with music? Rather than opening a fundamental discussion about whether music should be spatial or static (I have already written a little about that here), let’s get straight down to the solutions:

    Workarounds for head-locked stereo

    Make it mono

    The simplest solution is to sum your head-locked stereo file down to mono and route it to the first channel of the Ambisonics file. This channel is a mono-compatible, omnidirectional stream, regardless of whether TBE or ambiX is used.

    This solution is therefore best suited to voice-overs, which are usually mono anyway and are meant to be localised inside the head.

    Stereo music, however, will sound significantly worse because all side components are lost. In general, make sure the first channel does not get too loud: even if the level stays below 0 dBFS, it can cause distortion later when the mix is decoded for binaural playback.
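
    If you want to prepare this step outside the DAW, the following is a minimal sketch of the mono workaround in Python, assuming the soundfile package; the file names and the 6 dB safety trim are placeholders, and a 4-channel ambiX bed is used as the example target.

        import soundfile as sf  # any WAV I/O library will do

        # Load the head-locked stereo music and the spatial bed (file names are placeholders).
        stereo, sr = sf.read("headlocked_stereo.wav")       # shape: (samples, 2)
        ambix, sr2 = sf.read("scene_ambix_1st_order.wav")   # shape: (samples, 4), ambiX = ACN/SN3D: W, Y, Z, X
        assert sr == sr2 and len(stereo) == len(ambix)

        # Sum to mono and trim the level to leave headroom for the later
        # binaural decode (the -6 dB figure is just a starting point).
        mono = 0.5 * (stereo[:, 0] + stereo[:, 1])
        mono *= 10 ** (-6 / 20)

        # Route the mono sum into the first channel, the omnidirectional component.
        ambix[:, 0] += mono
        sf.write("scene_with_music.wav", ambix, sr)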

    MS – Mid/Side Stereo

    A similar approach that retains the side components works as follows. The head-locked stereo track, which normally consists of two channels, i.e. left and right, is now MS-encoded: the first channel carries the mid (mono) component, (left + right) × 0.5, and the second channel carries the side component, (left - right) × 0.5. The mid channel is then routed to the first channel of the spatial audio mix, as described in the mono workaround.

    The second channel, the side signal, can now be routed to one of the other axes. The height axis (Z) is recommended, because most of the head rotation in a VR headset happens around this vertical axis (yaw), and rotating around it leaves the Z channel untouched. The side signal would therefore only change if the viewer tilts their head up or down, or rolls it to the left or right, for whatever reason.
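
    Offline, this could look like the following sketch (again assuming the soundfile package, placeholder file names and an illustrative headroom trim): the mid goes into the first channel (W) and the side into the third channel (Z, the height component) of an ambiX file.

        import soundfile as sf  # any WAV I/O library will do

        stereo, sr = sf.read("headlocked_stereo.wav")     # (samples, 2): left, right
        ambix, _ = sf.read("scene_ambix_1st_order.wav")   # (samples, 4), ambiX = ACN/SN3D: W, Y, Z, X

        # MS encode: mid carries the mono content, side the stereo difference.
        mid = 0.5 * (stereo[:, 0] + stereo[:, 1])    # (left + right) x 0.5
        side = 0.5 * (stereo[:, 0] - stereo[:, 1])   # (left - right) x 0.5

        gain = 10 ** (-6 / 20)      # illustrative headroom trim
        ambix[:, 0] += gain * mid   # W: omnidirectional, effectively head-locked
        ambix[:, 2] += gain * side  # Z: height axis, unaffected by yaw rotation

        sf.write("scene_ms_workaround.wav", ambix, sr)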

    Quad layout

    Another approach is as follows. Here, the two channels of the head-locked stereo are actually placed in the room, e.g. at +45° and -45° azimuth. At this point you have achieved exactly what you do not want, music that audibly turns with the head, but we are not finished yet: you simply duplicate the music and place channels 3 and 4 at -135° and +135° azimuth.

    You have now effectively created a quadraphonic layout, comparable to a 5.1 set-up without centre and LFE and with a symmetrical angle spacing of 90°. It is advisable to experiment with the “Spread” parameter so that the four objects do not become clearly localisable during head-tracking; with a higher spread they become more diffuse and blend into each other somewhat.

    You can fine-tune further by using a total of 6 objects instead of 4, spaced 60° apart. This makes the head-tracking, i.e. the apparent movement of the music, less obvious, but also moves the result closer to the sound of the mono workaround. The same applies to the variation of not feeding the 4 objects 100% left, right, left, right, but instead mixing, for example, 50% of the opposite channel into each of them.
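
    In the DAW this placement is done with spatialiser objects and their Spread parameter, but the basic idea can also be approximated offline. The sketch below encodes plain first-order point sources without any spread (assuming numpy and soundfile, with placeholder file names); which rear angle each duplicate gets is a judgement call, here each channel stays on its own side, and azimuth sign conventions differ between tools.

        import numpy as np
        import soundfile as sf  # any WAV I/O library will do

        def encode_ambix(signal, azimuth_deg):
            """Encode a mono signal at the given azimuth (elevation 0) into
            first-order ambiX (ACN/SN3D channel order W, Y, Z, X)."""
            az = np.radians(azimuth_deg)
            gains = np.array([1.0, np.sin(az), 0.0, np.cos(az)])
            return signal[:, None] * gains[None, :]

        stereo, sr = sf.read("headlocked_stereo.wav")   # (samples, 2): left, right
        left, right = stereo[:, 0], stereo[:, 1]

        # Front pair at +/-45 degrees, rear duplicates at +/-135 degrees; each copy
        # is scaled by 0.5 so the quad sums back to roughly the original level.
        quad = 0.5 * (encode_ambix(left, +45.0) + encode_ambix(left, +135.0)
                      + encode_ambix(right, -45.0) + encode_ambix(right, -135.0))

        # Mix the quad bed into the existing first-order scene (lengths trimmed to match).
        ambix, _ = sf.read("scene_ambix_1st_order.wav")   # (samples, 4)
        n = min(len(ambix), len(quad))
        sf.write("scene_quad_workaround.wav", ambix[:n] + quad[:n], sr)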

    Conclusion

    There are many possibilities, but none of them is perfect. Nevertheless, with a little experience it is possible to bend things so far that a layperson will not notice the difference. A lot depends on the content of the head-locked track: classical music, for example, differs so much from electronic music that a different workaround makes sense in each case.

    In combination with the 3D audio stream, the difference may be barely audible, for example with typical film sound. With delicate music material, however, it can quickly become a problem.

    It would be desirable for the Audio 360 encoder of the FB360 workstation to apply one of these workarounds automatically; otherwise a new mix with the music baked into the spatial audio has to be created every time, which quickly becomes annoying for mere testing. Even better would be native support for a head-locked stereo track on all virtual reality platforms, or in MPEG-H, for example.

    I hope this article has been helpful. If you have any suggestions, please send me an e-mail.
