Spatial audio refers to the creative realm of sound recording, processing, and design in the context of three-dimensional acoustic perception. There are many related terms, often used interchangeably: immersive audio, surround sound, binaural, Ambisonics, 3D sound, Dolby Atmos, 360° sound, and more. Whilst there are technical differences between them, they all share the same goal: to create audio with a deeper sense of space than standard 2-channel stereo.
You may be wondering whether spatial audio would be a useful skill to learn for your music or audio production. In short, it is a powerful tool that expands your creative potential, and it also opens many other doors for everybody working with sound, from musicians to sound designers and beyond.
In this article, we explore the tools, techniques, and science of spatial audio, and provide you with the resources to successfully implement it in your own audio productions.
The field of spatial audio is rapidly becoming more relevant, as platforms and popular media formats increasingly rely on immersive audio to enhance their experiences. Most recently, it has even found its way into mainstream productions: Netflix*, for example, offers AMBEO 2-Channel Spatial Audio as a standard audio configuration.
You also hear the terms spatial audio and 3D audio more and more often in connection with Apple* devices. It is worth pointing out that spatial audio is not an Apple invention; rather, the company has recognized its importance and supports 3D audio with its own Apple spatial audio format.
Last but not least, Dolby* has developed its own spatial audio standard, Dolby Atmos, which continues to make its way out of movie theaters and into consumers' living rooms.
In most situations, audio is presented in a stereo format, meaning the sound only originates from a single linear dimension (left-right). Whilst this suffices for basic listening, it falls well short of real-world realism and leaves much of the human hearing system's spatial ability unused.
Surround sound builds on the stereo format by adding a second dimension (front-back). Whilst this expands on the single dimension of stereo, it still does not offer complete immersion.
This is where spatial audio (aka 3D audio, 360° audio, …) comes into play: it adds a third dimension (above-below) to create a fully immersive, all-encompassing soundstage. In this way, spatial audio can create sound images that closely resemble how we hear in the real world. This form of audio has many uses, particularly in movie theatres, video games, and VR, although it can be used for all kinds of sound, such as music, podcasts, and more.
As you learned above, the term "spatial audio" refers to any audio format that lets you hear sounds in three dimensions, rather than only the two offered by formats such as surround sound. With stereo, you can pick up sounds from the left and right, but you get little sense of depth and no sense of sounds coming from above or below you. Thanks to its added height dimension, spatial audio lets you pinpoint the position of sound sources all around you far more precisely.
The key relevance of binaural audio
Binaural audio is another core format in the spatial field. It refers to recording and processing techniques that mimic how the human hearing system perceives sound in real space.
Human spatial hearing relies on several psychoacoustic cues to localize sounds in the real world. A core cue is the Interaural Time Difference (ITD): when a sound comes from anywhere other than directly in front of, above, or behind the listener, it reaches one ear slightly earlier than the other, and the brain uses this time difference to work out its direction. In addition, the head itself blocks and absorbs part of the sound energy on its way to the far ear (the so-called head shadow), so the two ears also receive different levels. This is known as the Interaural Level Difference (ILD).
ITD is the dominant cue at lower frequencies, while ILD dominates at higher frequencies, where the head shadows the shorter wavelengths more strongly. Together, the two cues work very well for judging the left-right direction of a sound. By mimicking ITD and ILD, digital audio can be spatialized with a convincing level of realism.
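To get a feel for how small these time differences are, here is a minimal Python sketch of Woodworth's classic spherical-head approximation, ITD ≈ (a / c)(θ + sin θ), where a is the head radius, c the speed of sound, and θ the source azimuth. The head radius of 8.75 cm and speed of sound of 343 m/s are assumed typical values, and the model ignores the outer ears, torso, and frequency dependence, so treat it as a rough estimate rather than how any particular plugin computes ITD. For a source directly to one side it gives roughly 0.66 ms, or about 31 samples at 48 kHz.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, air at roughly 20 °C
HEAD_RADIUS = 0.0875    # m, a commonly used average adult value (assumption)


def woodworth_itd(azimuth_deg: float,
                  head_radius: float = HEAD_RADIUS,
                  c: float = SPEED_OF_SOUND) -> float:
    """Approximate ITD in seconds using Woodworth's spherical-head formula.

    Azimuth: 0 = straight ahead, +90 = hard right, -90 = hard left, 180 = behind.
    A positive result means the sound reaches the right ear first.
    """
    # Wrap the angle into [-180, 180).
    az = (azimuth_deg + 180.0) % 360.0 - 180.0
    # A sphere has no outer ears, so a source behind the head produces the same
    # ITD as its mirror image in front: fold rear angles onto the front half.
    if az > 90.0:
        az = 180.0 - az
    elif az < -90.0:
        az = -180.0 - az
    theta = math.radians(az)
    return (head_radius / c) * (theta + math.sin(theta))


if __name__ == "__main__":
    fs = 48_000  # sample rate in Hz
    for az in (0, 30, 60, 90):
        itd = woodworth_itd(az)
        print(f"{az:>3} deg: ITD = {itd * 1e6:6.1f} us = {itd * fs:5.1f} samples at {fs} Hz")
```

The folding step is a reminder that ITD alone cannot tell front from back; that ambiguity is resolved by the direction-dependent filtering of the head and ears.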
The binaural format makes a huge difference to the immersion of audio productions and offers a far higher level of realism than standard stereo formats. Dear Reality's plugins use state-of-the-art binaural processing to create incredibly realistic and transparent (natural-sounding) spatial audio. We have researched binaural technology in great depth to enable our software to fully harness its power.
Head-related transfer functions (HRTFs)
HRTFs are a key component of binaural processing. This type of filtering is applied to binaural sounds to mimic the acoustic effects that the human head, outer ears (pinnae), and torso have on sound waves from the environment before they enter the ear canals. These effects change with the direction of the sound, and a large part of how we localize sounds, especially front-back and above-below, depends on them. Binaural microphones (or algorithms) recreate this effect to create a more realistic perception of space and direction.
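In the time domain, an HRTF for one source direction is a pair of impulse responses (HRIRs), one per ear, and binaural rendering of a static source boils down to convolving the mono signal with each of them. The Python sketch below shows that core step. The two HRIRs are made-up placeholders, not measured data; real systems use measured or modeled HRIR sets (for example, stored in the AES69 SOFA file format), interpolate between directions, and use FFT-based convolution for speed.

```python
from typing import List, Tuple


def convolve(signal: List[float], ir: List[float]) -> List[float]:
    """Full linear convolution: output length is len(signal) + len(ir) - 1."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += x * h
    return out


def render_binaural(mono: List[float],
                    hrir_left: List[float],
                    hrir_right: List[float]) -> Tuple[List[float], List[float]]:
    """Filter one mono source with the left- and right-ear impulse responses
    for a single source direction; returns (left_ear, right_ear)."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Placeholder HRIRs for a source on the listener's right. These are NOT measured
# data, just a crude stand-in: the right ear receives the sound first and at full
# level, while the left ear receives it three samples later, quieter and smeared,
# imitating an interaural time difference and a head-shadow level difference.
HRIR_RIGHT = [1.0, 0.3, 0.0, 0.0, 0.0]
HRIR_LEFT = [0.0, 0.0, 0.0, 0.5, 0.2]

if __name__ == "__main__":
    click = [1.0, 0.0, 0.0, 0.0]
    left, right = render_binaural(click, HRIR_LEFT, HRIR_RIGHT)
    print("left: ", left)
    print("right:", right)
```

Played over headphones, the two outputs form the binaural signal; swapping in a different HRIR pair moves the source to a different direction.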
One way to create immersive audio is with multiple speakers. The level of surround complexity is dictated by the number of speakers in the array. Typical surround sound systems (usually either 5.1 or 7.1) include left and right speakers at the front and at the back (7.1 adds a further surround pair), plus a center front speaker that mainly carries dialogue for added clarity. The ‘.1’ refers to a low-frequency effects (LFE) channel, reproduced by a subwoofer that delivers the deep bass. This type of surround system enables the placement of sounds anywhere around the listener's head, although only on a single flat plane, without any vertical projection (above/below).
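A common way to place a source between the speakers of such a ring is pairwise amplitude panning: find the two speakers on either side of the desired direction and split the signal between them with constant-power (sine/cosine) gains, so the perceived loudness stays steady as the source moves. The Python sketch below assumes a 5.0 layout with angles in line with ITU-R BS.775 (front pair at ±30°, surrounds at ±110°); it is a simplified illustration, not how any specific renderer pans.

```python
import math
from typing import Dict

# Nominal 5.0 layout (the LFE ".1" channel is omitted because it carries no
# direction). Azimuth in degrees: 0 = front center, positive = to the right.
SPEAKERS_5_0: Dict[str, float] = {"C": 0.0, "R": 30.0, "Rs": 110.0, "Ls": -110.0, "L": -30.0}


def pan_pairwise(azimuth_deg: float, speakers: Dict[str, float]) -> Dict[str, float]:
    """Constant-power pan of one source between the two speakers that enclose it."""
    if len(speakers) < 2:
        raise ValueError("need at least two speakers")
    # Order speakers clockwise around the listener, with angles mapped to [0, 360).
    ring = sorted(speakers.items(), key=lambda item: item[1] % 360.0)
    az = azimuth_deg % 360.0
    gains = {name: 0.0 for name in speakers}
    for i, (name_a, angle_a) in enumerate(ring):
        name_b, angle_b = ring[(i + 1) % len(ring)]
        span = (angle_b - angle_a) % 360.0          # arc from A clockwise to B
        offset = (az - (angle_a % 360.0)) % 360.0   # source position along that arc
        if span > 0.0 and offset <= span:
            frac = offset / span                    # 0 at speaker A, 1 at speaker B
            gains[name_a] = math.cos(frac * math.pi / 2.0)
            gains[name_b] = math.sin(frac * math.pi / 2.0)
            return gains
    raise ValueError("speaker azimuths must be distinct")


if __name__ == "__main__":
    for az in (0, 15, 70, 180, -15):
        g = pan_pairwise(az, SPEAKERS_5_0)
        print(az, {name: round(v, 3) for name, v in g.items()})
```

Because the gains of the active pair always satisfy g_a² + g_b² = 1, the total acoustic power stays constant wherever the source sits on the ring. Extending the same idea to speaker triplets with height is the basis of 3D amplitude-panning methods.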
For fully 360° audio experiences, a more complex multi-channel speaker array with height channels is needed, so the system can convey a sense of audio coming from above and below. There are a few different ways to achieve this with speakers. One approach simply adds an extra pair (or four) of speakers mounted above the listener, for example the Auro 10.1 system. Other formats like Atmos can use a large number of speakers (several dozen in large cinemas), although setups on this scale are far more common in commercial movie theatres than in home theatres.
The height effect can still be achieved in smaller rooms, even without overhead mounting. Some speaker systems use upward-firing speakers aimed at the ceiling; the sound reflects off the ceiling and reaches the listener from above, creating the impression of sound originating overhead.
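The geometry behind this trick can be estimated with the image-source method: mirror the speaker in the ceiling plane, and the reflected path becomes a straight line from that mirror image to the listener. This Python sketch assumes a flat, perfectly reflecting ceiling and a single specular bounce, and the room dimensions in the example are hypothetical. It returns the reflected path length, the elevation from which the reflection arrives, and how much later it arrives than the direct sound.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND = 343.0  # m/s


@dataclass
class CeilingReflection:
    path_length_m: float   # speaker -> ceiling -> ear
    elevation_deg: float   # apparent elevation of the reflected sound at the ear
    extra_delay_ms: float  # arrival time after the direct path


def ceiling_reflection(distance_m: float, speaker_height_m: float,
                       ear_height_m: float, ceiling_height_m: float,
                       c: float = SPEED_OF_SOUND) -> CeilingReflection:
    """Image-source estimate of one specular reflection from a flat ceiling.

    distance_m is the horizontal distance between speaker and listener.
    """
    if not (0.0 < speaker_height_m < ceiling_height_m and 0.0 < ear_height_m < ceiling_height_m):
        raise ValueError("speaker and ears must be between floor and ceiling")
    # Mirror the speaker in the ceiling plane: the reflected path has the same
    # length as a straight line from this image source to the ear.
    image_height = 2.0 * ceiling_height_m - speaker_height_m
    rise = image_height - ear_height_m
    reflected = math.hypot(distance_m, rise)
    direct = math.hypot(distance_m, speaker_height_m - ear_height_m)
    return CeilingReflection(
        path_length_m=reflected,
        elevation_deg=math.degrees(math.atan2(rise, distance_m)),
        extra_delay_ms=(reflected - direct) / c * 1000.0,
    )


if __name__ == "__main__":
    # Hypothetical room: 2.5 m ceiling, up-firing driver at 1.0 m, ears at 1.2 m, 3 m away.
    print(ceiling_reflection(3.0, 1.0, 1.2, 2.5))
```

In this example the reflection arrives from roughly 43° above ear level, a few milliseconds after any direct sound, which is why up-firing designs try to keep the direct path as quiet as possible.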
Using Headphones
Understandably, not everybody has access to a spatial audio mixing room. However, if you are mixing for this format, you need to be able to check how your mix performs in an environment similar to its destination. Our dearVR MONITOR plugin simulates a variety of rooms and speaker setups, so you can test your mix with precision on headphones, without needing access to these setups!
Even a humble pair of stereo headphones is capable of reproducing fully immersive spatial audio. Because the brain works out spatial position from the cues arriving at each ear, and headphones deliver a separate, controlled signal to each ear, they are highly effective at creating a realistic impression of spatial sound. All it requires is some clever filtering and processing, such as the HRTF-based binaural processing described above.
Soundbars are an affordable technology that can offer close to 3D audio by using crosstalk cancellation and multiple planes of projection, although they are not as effective as multi-channel speaker arrays or headphones. Crosstalk is the acoustic process of each ear receiving audio from the opposite speaker: the left ear hears spill from the right speaker and vice versa. By canceling out this crosstalk with advanced filtering, soundbars can deliver binaural signals to the listener's ears. However, the cancellation filters only work for a particular head position, so they need to be updated constantly as the listener moves. This requires head tracking, which isn't as easy to achieve as with headphones. There is a lot of innovation in this space, and the same crosstalk-cancellation process can even be applied to a regular pair of stereo speakers.
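The core of crosstalk cancellation can be written as a 2×2 matrix problem per frequency: if H describes the paths from the two speakers to the two ears, feeding the binaural signal through the inverse of H means each ear ends up receiving only its own channel. The Python sketch below checks this at a few frequencies using a toy, hypothetical speaker-to-ear model (a fixed 0.7 gain and 0.25 ms delay on the cross paths). Real systems derive H from measured or modeled HRTFs, apply regularization because the inverse can demand large boosts at some frequencies, and must recompute H when the listener moves, which is exactly why head tracking matters.

```python
import cmath
import math
from typing import List

Matrix2 = List[List[complex]]


def invert_2x2(h: Matrix2) -> Matrix2:
    """Inverse of a 2x2 complex matrix (one frequency bin)."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < 1e-9:
        raise ValueError("transfer matrix is (nearly) singular at this frequency")
    return [[d / det, -b / det], [-c / det, a / det]]


def matmul_2x2(x: Matrix2, y: Matrix2) -> Matrix2:
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def toy_speaker_to_ear_paths(freq_hz: float, cross_gain: float = 0.7,
                             cross_delay_s: float = 0.00025) -> Matrix2:
    """Hypothetical free-field model for a listener centered between two speakers.

    h[ear][speaker], with index 0 = left and 1 = right. Each ear hears its own
    speaker with unit gain and the opposite speaker attenuated and delayed
    (the crosstalk). cross_gain and cross_delay_s are illustrative values only.
    """
    x = cross_gain * cmath.exp(-2j * math.pi * freq_hz * cross_delay_s)
    return [[1.0 + 0j, x], [x, 1.0 + 0j]]


if __name__ == "__main__":
    for f in (200.0, 1000.0, 4000.0):
        h = toy_speaker_to_ear_paths(f)
        canceller = invert_2x2(h)
        # Speaker feeds = canceller @ binaural; ear signals = h @ speaker feeds.
        # If h @ canceller is the identity, each ear receives only its own channel.
        ears = matmul_2x2(h, canceller)
        print(f, [[round(abs(v), 6) for v in row] for row in ears])
```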
There are two main methods available for creating spatial and immersive audio: recording and processing. Often, both methods are combined for the best results. A range of recording techniques and technologies exist for capturing spatial audio.
Binaural Head: The Neumann binaural head (aka dummy head) is an industry standard for capturing binaural audio. It uses two omnidirectional mic capsules built into the ears to capture an incredibly realistic, ‘human’-sounding recording. The dummy head gives a much more accurate representation of the original listening environment than a standard stereo microphone setup. However, the listening orientation is fixed to the direction the head was facing during recording (known as head-locked audio), so it doesn't allow for experiences where the user can change their position or rotation in the listening scene.
Ambisonic Microphones: A unique type of microphone with four cardioid capsules arranged in a tetrahedral array. Using our software decoder dearVR AMBI MICRO, the recordings are decoded into individual channels for further processing. These microphones are easy to use, although they require four audio interface inputs with matched gain. Audio is captured in Ambisonics A-format and then converted to B-format. The resulting B-format audio consists of four channels, three directional and one omnidirectional:
X = Depth axis (Front-Back)
Y = Width axis (Left-Right)
Z = Height axis (Up-Down)
W = Pressure/Omni channel (all directions with equal gain and phase)
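The A-format to B-format step can be sketched as sums and differences of the four capsule signals. This Python example assumes the conventional tetrahedral capsule naming (FLU, FRD, BLD, BRU for front-left-up, front-right-down, back-left-down, back-right-up) and the Ambisonic convention in which positive Y points left. It shows only the idealized matrix; production converters typically also apply capsule-matching filters and a specific gain normalization.

```python
from typing import Dict, List


def a_to_b_format(flu: List[float], frd: List[float],
                  bld: List[float], bru: List[float]) -> Dict[str, List[float]]:
    """Idealized A-format -> first-order B-format conversion for a tetrahedral mic.

    Capsules: FLU = front-left-up, FRD = front-right-down,
              BLD = back-left-down, BRU = back-right-up.
    Capsule-matching filters and W/X/Y/Z gain scaling are omitted here.
    """
    if not (len(flu) == len(frd) == len(bld) == len(bru)):
        raise ValueError("all four capsule signals must have the same length")
    quads = list(zip(flu, frd, bld, bru))
    return {
        "W": [a + b + c + d for a, b, c, d in quads],  # omni: sum of all capsules
        "X": [a + b - c - d for a, b, c, d in quads],  # front minus back
        "Y": [a - b + c - d for a, b, c, d in quads],  # left minus right
        "Z": [a - b - c + d for a, b, c, d in quads],  # up minus down
    }


if __name__ == "__main__":
    # A sound hitting only the two front capsules equally: X is strong, Y and Z cancel.
    print(a_to_b_format([1.0], [1.0], [0.0], [0.0]))
```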
In-Ear microphones: These are an effective way of capturing audio with a realistic, head-related character, and are particularly useful when combined with head-mounted cameras (such as a GoPro). They are a more affordable way to capture spatial audio than a binaural head, although they come with the disadvantage of also picking up some of the noise made by the wearer.
Multi-Microphone Array: If you do not have access to dedicated spatial mics, you can build an effective multi-microphone array to capture sound from multiple directions. This technique uses a selection of standard mono microphones (or custom spatial mics) arranged in different patterns to capture the space as intended. This method provides more flexibility, although it requires some practice and additional post-processing for the best results.
Point and Pan (aka point-source recording): Even a single mono microphone can yield spatial results with the right processing. This technique involves capturing sounds up close with a mono mic, then using spatial processing software (such as dearVR MUSIC) to digitally spatialize the audio. This works well for channel-based or object-based audio and can be adapted for playback on any kind of spatial, surround, or 3D speaker setup. This method is ideal for beginners or those with a limited budget who want to make immersive audio with a DIY approach. That being said, it is still used in professional settings and can produce highly realistic results.
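One way to "pan" a point-source recording in software is to encode it into first-order Ambisonics, which can then be decoded to any speaker layout or rendered binaurally. The Python sketch below applies the classic first-order encoding gains in the traditional FuMa W/X/Y/Z convention (W scaled by 1/√2; azimuth measured counterclockwise, so +90° is to the left). Other conventions, such as AmbiX, order and normalize the channels differently, and this is an illustration of the technique, not the processing used inside dearVR MUSIC.

```python
import math
from typing import Dict, List


def encode_foa_fuma(mono: List[float], azimuth_deg: float,
                    elevation_deg: float) -> Dict[str, List[float]]:
    """Encode a mono point source into first-order B-format (FuMa W/X/Y/Z).

    Ambisonic angle convention: azimuth 0 = front, +90 = left;
    elevation +90 = straight up.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    gains = {
        "W": 1.0 / math.sqrt(2.0),         # omni, scaled by 1/sqrt(2) in FuMa
        "X": math.cos(az) * math.cos(el),  # front-back
        "Y": math.sin(az) * math.cos(el),  # left-right
        "Z": math.sin(el),                 # up-down
    }
    return {ch: [g * s for s in mono] for ch, g in gains.items()}


if __name__ == "__main__":
    # A close-miked mono clip (placeholder samples) placed 45 degrees left, 30 degrees up.
    clip = [0.0, 0.5, 1.0, 0.5, 0.0]
    bformat = encode_foa_fuma(clip, azimuth_deg=45.0, elevation_deg=30.0)
    for channel, samples in bformat.items():
        print(channel, [round(s, 3) for s in samples])
```

Moving the source over time is a matter of updating the azimuth and elevation per block (with smoothing to avoid zipper noise), while the recording itself stays a simple mono file.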
Even mono and stereo recordings can be processed with spatial audio technology to create a more immersive sense of space and directionality.
At Dear Reality, we develop professional tools for spatial audio creators. We have built a range of products that give engineers more creative freedom, streamline spatial workflows, and overcome common issues.
dearVR PRO 2
One of the best tools for creating spatial audio is dearVR PRO 2.
This plugin is a spatial audio powerhouse, giving you next-level control over your spatial mixes, whether you’re working in binaural, Ambisonic, or multi-channel speaker formats. dearVR PRO 2 gives creators the ability to direct audio in a spatial environment with unprecedented detail, accuracy, and flexibility.
dearVR MONITOR
Not everybody has access to accurate mixing environments, let alone with multi-channel speaker arrays.
This is where a virtual mix room like dearVR MONITOR can be a lifesaver. It makes the spatial mixing workflow easier, quicker, and more effective. Now you can test your mixes in a range of acoustic environments and speaker formats without having to leave your chair!
-
Get it right at the source.
Spatial recordings are even harder to fix in post-production than recordings in standard formats. Always make sure your recordings are right before packing up a session.
-
Be distance conscious.
Record at a realistic distance from the source, particularly if the audio needs to match visuals. Try to find the right balance of directionality and ambiance for the scene.
-
Practice plenty.
Practice the technique before the actual recording: these systems can take a while to master, and it is better to get things right at the source than to fix them in post.
-
Consistency is key.
Keep the same spatial recording and processing approach throughout a project. Switching between types can be distracting and break immersion, so always use the same microphones and processing to prevent inconsistencies.
-
Do not mix microphone types.
Unless done for creative effect, make sure you use the same type of recording device for all audio in a project. Using different microphones gives sounds a detached, uncohesive feeling, often resulting in a jarring and disjointed listening experience.
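When you position sources digitally, two rules of thumb help keep distance realistic (see "Be distance conscious"). For a point source in free field, the direct sound drops about 6 dB for every doubling of distance (the inverse distance law), while in a room the reverberant level stays roughly constant, so the direct-to-reverberant ratio falls as the source moves away and reaches 0 dB at the so-called critical distance. The Python sketch below expresses both ideas; the 1.5 m critical distance is a hypothetical value that in reality depends on the room's size and reverberation time, and the model ignores air absorption and near-field effects.

```python
import math


def distance_gain(distance_m: float, reference_m: float = 1.0,
                  min_distance_m: float = 0.1) -> float:
    """Inverse-distance (1/r) amplitude gain relative to reference_m.

    The minimum distance avoids an infinite gain as the source approaches 0 m.
    """
    return reference_m / max(distance_m, min_distance_m)


def gain_to_db(gain: float) -> float:
    return 20.0 * math.log10(gain)


def direct_to_reverberant_db(distance_m: float, critical_distance_m: float) -> float:
    """Idealized diffuse-field estimate: direct level falls 6 dB per doubling of
    distance, reverberant level stays constant, and the two are equal at the
    critical distance."""
    if distance_m <= 0.0:
        raise ValueError("distance must be positive")
    return 20.0 * math.log10(critical_distance_m / distance_m)


if __name__ == "__main__":
    CRITICAL_DISTANCE_M = 1.5  # hypothetical room value; depends on room volume and reverb time
    for d in (0.5, 1.0, 2.0, 4.0):
        print(f"{d:3.1f} m: direct {gain_to_db(distance_gain(d)):+6.1f} dB, "
              f"direct/reverb {direct_to_reverberant_db(d, CRITICAL_DISTANCE_M):+5.1f} dB")
```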
We hope this article has helped you understand the possibilities and practices of spatial audio. If you are new to the scene, why not dive in and try it out yourself? Integrating spatial audio into your creative process can be a massive advantage, as it provides whole new dimensions of expression and potential.
Not only can mastering these techniques open doors for your artwork, it also gives you a competitive edge in the industry in terms of careers and commissions. Spatial audio is becoming a huge part of many media formats and industries, so being able to use this technology puts you at a distinct advantage.