Welcome to the fascinating universe of spatial audio and the world of 3D sound! These are terms you’ve likely heard before. While most sound experts often talk only about Dolby Atmos, for me, that’s just a small piece of the possibilities. So, let me give you a brief but comprehensive overview of what’s yet to be sonically explored out there. I’ll not only hype everything up but also critically examine the downsides.
For over half a decade, I’ve been exclusively working in the realm of immersive media applications. In the German-speaking world, this has arguably made me one of the most enthusiastic individuals on this topic, and also one of the greatest critics of 3D sound.
At its core, 3D audio describes an audio technology that precisely locates and reproduces sound in a three-dimensional space. This concept is closely related to the term spatial audio, which encompasses various formats like stereo audio, surround sound, binaural audio, and ambisonics. Unlike traditional stereo or surround sound formats, 3D audio allows sounds to be perceived not only in front, behind, left, and right but also above and below the listener.
Why could this be a significant advancement? Well, human auditory perception is naturally three-dimensional. We recognize and interpret sounds in our everyday lives based on their spatial positioning. If someone stands behind us and speaks, we can tell where they are from the sound alone, without needing to see the person.
Technologically, it has become possible to artificially recreate this natural spatial sound impression—a crucial step in audio reproduction that enables a more immersive and realistic acoustic experience.
Immersive audio, often referred to as spatial audio, is a cutting-edge technology that creates a three-dimensional sound experience, enveloping the listener in a “sound sphere.” Unlike traditional stereo or even surround sound, which primarily operates on a horizontal plane, immersive audio adds a vertical dimension, allowing sounds to come from above and below as well as from all around.
This creates a more realistic and engaging auditory experience, making listeners feel as if they are truly inside the sound environment.
The key to immersive audio lies in its use of audio objects. Instead of being confined to specific audio channels, sounds are treated as individual objects that can be placed anywhere in a 3D space. This allows for precise localization and movement of sounds, enhancing the sense of immersion.
Technologies like Dolby Atmos and Sony 360 Reality Audio are prime examples of immersive audio formats that utilize this object-based approach.
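To make the idea of an audio object a bit more tangible, here is a minimal sketch in Python. It is not how Dolby Atmos or Sony 360 Reality Audio store their metadata; the `AudioObject` class, its field names, and the axis convention (x = front, y = left, z = up) are assumptions made purely for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AudioObject:
    """A hypothetical audio object: a sound plus a position, with no fixed channel."""
    name: str
    azimuth_deg: float        # 0 = straight ahead, positive = to the left
    elevation_deg: float      # 0 = ear level, 90 = directly above
    distance_m: float = 1.0

    def to_cartesian(self):
        """Return (x, y, z) with x = front, y = left, z = up (assumed convention)."""
        az = math.radians(self.azimuth_deg)
        el = math.radians(self.elevation_deg)
        x = self.distance_m * math.cos(el) * math.cos(az)
        y = self.distance_m * math.cos(el) * math.sin(az)
        z = self.distance_m * math.sin(el)
        return (x, y, z)


# Two objects that a channel-based stereo mix could not place:
overhead = AudioObject("bird", azimuth_deg=0.0, elevation_deg=90.0)
right_high = AudioObject("helicopter", azimuth_deg=-90.0, elevation_deg=30.0, distance_m=50.0)
```

The point is only that the position travels with the sound as data. Which speakers or headphone filters end up reproducing it is decided later, at playback.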
Immersive audio is not just about adding more speakers or channels; it’s about creating a more natural and lifelike listening experience. Whether it’s the subtle rustling of leaves overhead or the dramatic roar of a jet passing by, immersive audio brings a new level of depth and realism to sound design.
This makes it a powerful tool for audio engineers, offering endless creative possibilities in fields ranging from music production to virtual reality.
The journey of surround sound and immersive audio is a fascinating tale of technological innovation and creative exploration. It all began in the 1940s with the introduction of multi-channel audio in cinemas, aiming to create a more engaging auditory experience.
However, it wasn’t until the 1970s that surround sound started to gain traction in home entertainment systems, with the advent of quadraphonic sound, which used four channels to create a more enveloping sound field.
The real breakthrough came in the 1990s with the introduction of Dolby Digital 5.1, which became the standard for home theater systems. This format used six audio channels: five full-range channels (front left, front right, center, rear left, and rear right) and one low-frequency effects channel (subwoofer). This setup provided a more immersive sound experience, making viewers feel as if they were part of the action.
The evolution continued with the development of 7.1 surround sound, which added two additional rear channels for even greater spatial accuracy. However, the most significant leap came with the introduction of Dolby Atmos in 2012. Dolby Atmos revolutionized the concept of surround sound by introducing audio objects and height channels, allowing sounds to move freely in a three-dimensional space.
This marked the beginning of the immersive audio era, where the focus shifted from channel-based audio to object-based audio.
Today, immersive audio technologies like Dolby Atmos, DTS:X, and Sony 360 Reality Audio are pushing the boundaries of what’s possible in sound design. These technologies are not only enhancing the audio quality of movies and music but are also transforming the way we experience virtual reality, gaming, and live events.
The journey from simple surround sound to fully immersive audio has been a remarkable one, and it continues to evolve, offering new and exciting possibilities for the future.
One reason why 3D audio technologies are only slowly reaching mainstream adoption is that creating an authentic 3D auditory experience can be challenging. There are two common ways to experience it: through speakers and headphones.
Let’s start with speaker configurations. Although 5.1 surround sound is considered a common system, it hasn’t entirely penetrated home environments. This surround sound setup requires five speakers positioned front left, front right, center, rear left, and rear right, complemented by a sixth speaker in the form of a subwoofer for low frequencies like explosions or deep bass, creating an immersive audio experience.
However, 3D audio requires even more speakers, such as the emerging standard format 7.1.4 in sound studios. This means studios must install at least twelve speakers on walls or ceilings to enable spatial sound reproduction.
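The speaker count follows directly from the layout label: the first number counts the ear-level speakers, the second the subwoofers, and the third the height speakers. A small sketch of that arithmetic, with `count_speakers` being a made-up helper name:

```python
def count_speakers(layout: str) -> dict:
    """Split a layout label such as '7.1.4' into its speaker groups."""
    parts = [int(p) for p in layout.split(".")]
    if len(parts) not in (2, 3):
        raise ValueError(f"unexpected layout label: {layout!r}")
    ear_level, subwoofers = parts[0], parts[1]
    # Labels like '5.1' have no third number, meaning no height speakers.
    height = parts[2] if len(parts) == 3 else 0
    return {
        "ear_level": ear_level,
        "subwoofers": subwoofers,
        "height": height,
        "total": ear_level + subwoofers + height,
    }


print(count_speakers("5.1")["total"])    # 6
print(count_speakers("7.1.4")["total"])  # 12
```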
To offer a more comfortable option for consumers, soundbars exist. These elongated speakers are often placed under or in front of the TV and use various speakers to project sound in different directions – frontward, sideways, or even upward. Soundbars utilize room reflection by bouncing sound off walls or ceilings to create an immersive sound experience. However, reproducing sounds from behind remains challenging.
As an alternative to speaker systems, headphones provide a more personal solution for 3D audio. The widespread use of headphones allows for binaural playback, and special headphones are not necessarily required, although manufacturers would love to sell you a new pair.
The crucial requirement for playing 3D audio content is that it should either be binaural or the playback device should convert it into two channels in real-time, as happens with Dolby Atmos, for example. However, not all content marketed as “3D audio” genuinely offers an authentic 3D sound experience. Some formats are better suited for specific genres or applications, but more on that later.
For the optimal listening experience, I recommend over-ear headphones, which produce sound further from the eardrum. Models with a linear frequency response color the audio content the least. Some headphones also offer head tracking, which adjusts the sound field in real time to the listener’s head movements. This solves the problem of distinguishing binaural sounds from the front and back.
On platforms like Apple Music, for example, you can already listen to Dolby Atmos mixes with dynamic head tracking. Of course, you’ll need the matching AirPods for this. But the issue here is that none of these mixes were made for head tracking; they were mixed for speakers. So, I came up with the idea of creating an overview of how one can structure the various applications, formats, and playback media:
Admittedly, there are already several articles for beginners claiming to cover “everything one needs to know.” But these drive me crazy for one reason: they nicely tell the story with Mono, Stereo, Surround, and the new thing: Dolby Atmos. In my world of immersive audio, however, we are just getting started with audio productions that cannot even be depicted with AC-4.
My 3D audio matrix offers a more comprehensive approach, considering not only where I can place audio objects but also whether I should only look forward (0DoF), rotate my head (3DoF), or even move in space (6DoF). DoF stands for “degrees of freedom.”
Let’s summarize where I can place my sound:
In all these formats, one typically stares straight ahead (as in movies). There’s no movement involved. With music, it’s not precisely defined where we should look, but in essence, no interaction is intended. They are linear media where you press play and, at some point, they come to an end.
However, there’s 3D audio, where this movement is desired. I can reveal that this is where the greatest value of 3D audio lies, as one can quickly find themselves in 3D worlds. So, when people talk about 3D audio, most are usually referring to 0DoF, but one should distinguish what complexity one actually has.
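One way to picture the three DoF levels is as a filter on the listener’s tracked pose: which parts of it is the renderer allowed to use? The following is only a sketch of that idea; the `DoF`, `Pose`, and `pose_for_renderer` names are invented for this example and do not come from any particular engine.

```python
from dataclasses import dataclass
from enum import Enum


class DoF(Enum):
    ZERO = 0    # fixed view, e.g. film or music
    THREE = 3   # head rotation only, e.g. 360 video
    SIX = 6     # rotation plus movement, e.g. games and VR


@dataclass
class Pose:
    yaw: float = 0.0     # rotation in degrees
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0       # position in metres
    y: float = 0.0
    z: float = 0.0


def pose_for_renderer(tracked: Pose, dof: DoF) -> Pose:
    """Keep only the pose components that the given DoF level may use."""
    if dof is DoF.ZERO:
        return Pose()  # the listener is treated as facing forward, always
    if dof is DoF.THREE:
        return Pose(yaw=tracked.yaw, pitch=tracked.pitch, roll=tracked.roll)
    return tracked     # 6DoF: rotation and position both count
```

With 0DoF the tracked pose is simply ignored, which is exactly why such content can be mixed once and played back as a linear file.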
I must admit, I’m a fan of headphones. Everyone has them at home and theoretically can experience 3D audio with them. Whereas with speakers, one needs many or a specific soundbar. Perhaps a few thoughts on this differentiation because most tools, like the Dolby Atmos Production Suite, always focus first on speakers and then on headphones.
However, in reality, it’s not as simple as pressing a switch labeled “Binaural” and automatically experiencing a great cinema-like encounter through headphones. In implementation, there are numerous parameters and challenges in converting the spatial impression from many speakers to two-channel stereo sound.
However, this is actually the principle of object-based audio, the technology behind Dolby Atmos (AC-4), Fraunhofer’s MPEG-H (or Sony 360 Reality Audio). Ideally, one creates a multichannel mix that sounds good on every device, whether it’s a smartphone, TV, soundbar, or headphones. During playback, the production is optimized in real-time for the respective medium.
This is one of the reasons why the adoption of Atmos and similar technologies hasn’t been as convincing. I believe that currently, it’s worthwhile to specifically mix for headphones rather than hope that everything will sound okay somehow. Headphones have a property that speakers don’t possess: so-called in-head localization. Yet, there are attempts to eliminate it, unjustly, in my opinion.
When listening via headphones, mono and stereo are mostly perceived inside the head. While stereo allows movement of sound between left and right, it lacks the ability to differentiate the sound spatially—front, back, top, and bottom. Binaural stereo solves this problem, providing a feeling that the sound happens around you, not just in or on the ears/head.
This seems to bring us closer to our natural auditory experience. After all, speakers can never produce in-head localization, since they aren’t placed directly on our ears. Nevertheless, this psychoacoustic phenomenon offers us decisive advantages, especially as we navigate the world of 3D audio in more complex ways.
Still, sounds a bit abstract? Don’t worry, let’s move into practice now.
0 Degrees of Freedom (0DoF) in the world of 3D audio is actually quite normal for us, as it typically implies a predetermined direction for the listener. In other words, this means that concerning the audio content, the listener has no additional freedom of movement and remains in a fixed position.
But what I want to emphasize is this: it’s not only that people can’t move with the content, they actually shouldn’t. Here, the keyword is “immersion,” as it’s more about creating an immersive soundstage that truly captivates the listeners, quite literally keeping them on the edge of their seats.
Applications of 0DoF 3D Audio can be found in various fields, including movies, music, podcasts, and even live events. In movies, this type of 3D audio enables a linear audio experience. The sound can be spatial, but the intention is for people not to be distracted by noises behind them. Whether a helicopter flies overhead or not doesn’t change the story.
The content would actually work fine in stereo or even mono. This is where the multichannel sound of Dolby originates, and despite my reservations, it’s somewhat fun to feel “immersed in the movie.”
Someone thought of utilizing this audio technology for music production. It’s marketed for live sound, immersion, and being closer to the musicians. However, most mixed tracks acquire a spatiality that doesn’t suit every genre. Furthermore, it often loses the impact found in radio-friendly stereo sound.
The only advantage is that during the mix, fewer compromises need to be made. Instead of just left and right, one can now move objects along the Y or Z-axis. This allows our brains to perceive them more distinctly and potentially reduces the need for depth layering and EQ work.
Now, using this for podcasts is a dreadful idea, even to me. Typical conversational podcasts work perfectly fine in stereo. Often, even mono is sufficient. You can’t just artificially add spatiality to this audio format and expect it to become immersive audio.
3D Audio for live events is currently in its exploration phase. Several providers are presenting their own solutions. But why should one hear sounds from behind when the stage is in front? Simply put, you shouldn’t! Initially, the goal is more about solving challenges with unstable phantom sound sources using multiple line arrays. In the second phase, one can go wild and indeed integrate it into live operations.
For me, there isn’t that much added value here. The strength of object-based audio, for me, lies primarily in its personalization. How cool would it be to watch football and mute the commentator while being enveloped in 3D stadium atmosphere? I see absolute potential in this technology and hope that MPEG-H will prevail here because the Dolby Atmos Renderer is severely limited in functionality.
When implementing 0 Degrees of Freedom (0DoF) 3D Audio, differences in required investments become evident depending on the playback method. For entry through headphones, the barrier is comparatively low. Only headphones are necessary, and the higher the quality, the better. These headphones don’t necessarily have to be specifically designed for 3D audio.
There are also enough free plugins like DearVR Micro for purely binaural work. For those interested in multi-channel audio, my recommendation is the IEM Plugin Suite in combination with the DAW Reaper.
However, even Dolby Atmos doesn’t demand tens of thousands of euros in software and hardware anymore. The Dolby Atmos Production Suite is available for a few hundred euros and is now standard in Logic. Sony 360 Reality Audio costs a similar amount, while the MPEG-H Authoring Suite delivers more functionality and is even free.
In contrast, implementing 0DoF 3D Audio through speakers requires more hardware and a more careful configuration. For precise reproduction, setting up an acoustically optimized room with multiple speakers, subwoofers, bass traps, and acoustic treatments is necessary. With interfaces, cables, etc., costs can quickly reach a five-figure sum.
Most audio engineers hope to make this investment to then demand higher prices for their services and studio. Unfortunately, the reality is that you can’t just mount speakers on the ceiling and expect customers to come flocking with money in hand.
In 3 Degrees of Freedom (3DoF), the listener can not only perceive the direction of sound but also rotate their head freely around three axes. This allows the listener to change their viewing direction, but not their position, to influence the auditory experience. However, applications are generally linear, maintaining fixed starting and ending points.
3DoF 3D Audio is frequently used in 360-degree video content where looking around and listening to the environment are explicitly desired. Often, this is also discussed in the context of Virtual Reality, although that arrives with 6 degrees of freedom. This means that the perception of sound synchronizes with the listener’s movement in real-time, creating a certain level of immersion and interactivity.
This resolves one of the issues we encounter in binaural audio: differentiating between sounds in front and behind. For instance, if an object is positioned to the left in front of us, it reaches our two ears with practically the same volume difference and delay as if it were positioned to the left behind us.
Only our ear shapes can aid in this determination. However, a slight head movement is much more effective, and our brain immediately identifies the location of each sound in the scenario.
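The front-back confusion and its fix through head movement can be reproduced with a few lines of arithmetic. The sketch below uses a simple sine-law approximation for the time delay between the ears and ignores volume and ear shape entirely. The head width of 0.18 m and the function name `itd_seconds` are assumptions for illustration, not measured values.

```python
import math

HEAD_WIDTH_M = 0.18       # assumed ear-to-ear distance
SPEED_OF_SOUND = 343.0    # m/s in air at about 20 degrees Celsius


def itd_seconds(source_azimuth_deg: float, head_yaw_deg: float = 0.0) -> float:
    """Interaural time difference under a simple sine-law model.

    Azimuth 0 is straight ahead, positive angles are to the left.
    """
    relative = math.radians(source_azimuth_deg - head_yaw_deg)
    return HEAD_WIDTH_M / SPEED_OF_SOUND * math.sin(relative)


front_left, back_left = 30.0, 150.0

# Head still: both positions produce the same delay, so they are ambiguous.
ambiguous = math.isclose(itd_seconds(front_left), itd_seconds(back_left))

# Head turned 10 degrees to the left: the two delays move in opposite directions.
front_after = itd_seconds(front_left, head_yaw_deg=10.0)
back_after = itd_seconds(back_left, head_yaw_deg=10.0)
```

After the small turn, the delay for the front source shrinks while the delay for the rear source grows. In this simplified model, that opposite movement is the cue that head tracking gives back to the listener.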
Can this be done already with Apple products? Yes, for instance, by accessing Apple Music and enabling 3D sound on AirPods. The significant problem: The content wasn’t mixed with the intention for listeners to turn around at all. Everything is meant to be 3D, just like with speakers. However, it would make total sense to have the option to make objects head-locked.
Such a head-locked stereo track has been standard in the 360VR domain for years, but nobody talks about it in the realm of “immersive music production.”
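The difference between a head-locked and a world-locked object comes down to whether the head rotation is subtracted before rendering. A minimal sketch, with `perceived_azimuth` as a hypothetical helper and the same angle convention as before (0 = ahead, positive = left):

```python
def perceived_azimuth(object_azimuth_deg: float, head_yaw_deg: float, head_locked: bool) -> float:
    """Azimuth of an object relative to the listener's nose, wrapped to [-180, 180)."""
    if head_locked:
        # Head-locked: the object turns with the head, like a normal stereo track.
        relative = object_azimuth_deg
    else:
        # World-locked: the object stays put in the room while the head turns.
        relative = object_azimuth_deg - head_yaw_deg
    return (relative + 180.0) % 360.0 - 180.0


# The listener turns 90 degrees to the left while a voice sits straight ahead.
print(perceived_azimuth(0.0, 90.0, head_locked=False))  # -90.0, now at the right ear
print(perceived_azimuth(0.0, 90.0, head_locked=True))   # 0.0, still in front
```

A narrator or a lead vocal is a typical candidate for the head-locked case, while ambience usually makes more sense world-locked.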
According to Dolby and others, one should obediently look “forward.” Hence, I make the distinction between 0DoF and 3DoF because theoretically, you can also listen to 5.1 surround sound with head tracking. The issue, however, lies in the fact that traditionally, a significant amount is mixed in the center and front, while very little happens at the back. Remember: people shouldn’t turn around.
The speaker-side counterpart to this is dome cinemas. Usually, they don’t fill out a sphere but only a half-sphere on the vertical plane. Additionally, on the horizontal plane, they usually cover “only” 270° instead of 360°. Moreover, due to the seating arrangement, one cannot turn backward. However, particularly since the introduction of the Sphere in Las Vegas, despite immense technical demands, this topic has gained more momentum.
Those seeking to listen to 360 videos with 3D Audio can find them for free on platforms like YouTube and Facebook. The associated plugins, like the FB360 Workstation, were once available for free but are currently not supported. Granted, the hype around spherical videos has diminished, but for me, it still absolutely has its validity.
Too rarely has the potential been utilized correctly, and too often things have been produced that would have worked just as well as regular 16:9 content. Besides, experiencing this on a Head-Mounted Display is much more enjoyable. There was the Oculus Go, a very affordable device for €200, and now there’s competition from China with the Pico G3.
Such devices are referred to as having 3DoF because while you can move your head in VR, you can’t move your body.
Dynamic head tracking also exists without video content. However, this typically requires a specific pair of headphones or hardware to track acceleration. Apple has implemented this admirably. You immediately understand whether you’re listening to multi-channel content or just stereo mixes with added spatiality.
Head tracking is cool, but unfortunately, for the mentioned reasons, it doesn’t answer the question: Is the content suitable for it?
6 Degrees of Freedom (6DoF) 3D Audio offers the highest spatial freedom and interactivity within the 3D audio spectrum. This level allows the listener to move freely within the three-dimensional sound space and experience various auditory perspectives – moving away from the sweet spot.
Within the 6DoF concept, it’s no longer a linear file being played back. Instead, multiple individual sounds are delivered, either looped or triggered through interaction. This often leads to game engines like Unity or Unreal, which are quite rudimentary on the audio side. Hence, such projects often incorporate middleware like Wwise or FMOD.
This degree of freedom opens new dimensions for immersive audio, particularly in virtual reality applications and interactive audiovisual environments – a concept familiar to most from gaming.
However, the content here is crucial. A two-dimensional strategy game cannot benefit as much from immersive audio as an FPS game can. Gamers know the importance of not only seeing opponents in time but also hearing them beforehand.
Similar considerations apply to immersive media like VR, AR, or what’s newly termed as the Metaverse or Spatial Computing. Let’s not fixate on terminology but rather on the fact that audio production here is significantly more complex. There are numerous parameters for shaping the sound, and setting them well becomes crucial.
For instance, one must configure how quickly an audio object fades when one moves away from it. At what angle does it emit sound when turning away? Are there occlusion effects if a wall separates us? Etc.
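These three questions map onto three gain factors that get multiplied together. The sketch below shows one possible way to model them. It is not the implementation of Unity, Unreal, Wwise, or FMOD, and all function names and default values (rolloff, cone angles, occlusion gain) are assumptions chosen for illustration.

```python
def distance_gain(distance_m, min_distance_m=1.0, rolloff=1.0):
    """Inverse-distance attenuation: full level up to min_distance, then falling off."""
    d = max(distance_m, min_distance_m)
    return min_distance_m / (min_distance_m + rolloff * (d - min_distance_m))


def cone_gain(angle_off_axis_deg, inner_deg=60.0, outer_deg=180.0, outer_gain=0.25):
    """Directivity: full level inside the inner cone, outer_gain beyond the outer cone."""
    half_inner, half_outer = inner_deg / 2.0, outer_deg / 2.0
    a = abs(angle_off_axis_deg)
    if a <= half_inner:
        return 1.0
    if a >= half_outer:
        return outer_gain
    # Linear blend between the two cones.
    t = (a - half_inner) / (half_outer - half_inner)
    return 1.0 + t * (outer_gain - 1.0)


def object_gain(distance_m, angle_off_axis_deg, occluded, occlusion_gain=0.3):
    """Combine distance, directivity, and a flat occlusion factor into one gain."""
    gain = distance_gain(distance_m) * cone_gain(angle_off_axis_deg)
    return gain * occlusion_gain if occluded else gain


# A source 2 m away, facing the listener, with and without a wall in between.
print(object_gain(2.0, 0.0, occluded=False))  # 0.5
```

Real engines usually go further, for example by low-pass filtering occluded sounds instead of only turning them down, but the principle of per-object parameters evaluated at runtime is the same.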
Venturing into the realm of immersive and interactive audio requires a significant investment, both financially and technically. The first step is acquiring the right hardware. For immersive audio, high-quality headphones are essential, especially those with head-tracking capabilities to fully experience spatial audio.
Over-ear headphones with a linear frequency response are recommended for the best audio quality. For those looking to set up a surround sound system, multiple speakers, subwoofers, and an acoustically optimized room are necessary. This setup can quickly escalate in cost, reaching into the five-figure range.
On the software side, tools like the Dolby Atmos Production Suite, Sony 360 Reality Audio, and the MPEG-H Authoring Suite are indispensable. While some of these tools are available for a few hundred euros, others might require more substantial investment.
Additionally, game engines like Unity or Unreal, along with middleware such as Wwise or FMOD, are crucial for creating interactive audio experiences. These tools allow for precise control over audio objects and their behavior in a 3D space.
Beyond hardware and software, investing in education and training is vital. Understanding the complexities of immersive audio production requires a solid foundation in sound design principles and familiarity with the latest technologies. Online courses, workshops, and tutorials can provide valuable insights and hands-on experience.
For those serious about mastering this field, continuous learning and staying updated with industry trends are essential.
In summary, while the initial investment for immersive and interactive audio can be substantial, the potential rewards in terms of creative possibilities and enhanced audio quality are immense. Whether you’re an aspiring audio engineer or a seasoned professional, embracing this technology can open up new avenues for innovation and artistic expression.
Even though this field primarily involves headphones, it still carries specific costs and investments. Hardware is crucial: powerful computers and suitable VR headsets are indispensable. As a Mac user, certain compatibility issues arose for me, necessitating the use of a Windows-based system as well.
While the costs of VR headsets have decreased significantly, they also age rapidly. Constant tool and software updates are also a rather bothersome cost factor.
Moreover, implementing 6DoF 3D Audio typically requires a certain understanding of programming and development work. Sound engineers might need to invest in further education and training to acquire this expertise.
Overall, 3D Audio presents a fascinating realm of acoustic experiences that surpass traditional sound formats. From mono to Dolby Atmos, this technology unlocks possibilities that can revolutionize our auditory experiences in music, movies, and gaming.
For beginners, entry via headphones is simpler and more cost-effective compared to speaker systems. Although initial investments vary, education in this field is essential to grasp the full scope of 3D audio production.
My video course provides an optimal platform to gain in-depth knowledge and practical skills. On my blog, you’ll find a lot of content, and I regularly update the community.
The continuous evolution and integration of 3D Audio into various domains promise an exciting future that has only just begun. These were the basics, so feel free to let me know what else you’d like to learn about!