This is the first article of my research blog. Here I will post (hopefully on a regular basis) the outcomes of my Master's Thesis, provisionally titled Real-Time Tools for 3D Audio Spatialization.
For the thesis, I will be using the spatial audio system at Barcelona Media, which has up to 24 speakers in a non-regular setup. They have been working on Ambisonics, and this is the spatialization technique I will mainly use.
Ambisonics (http://en.wikipedia.org/wiki/Ambisonics) is an audio recording and reproduction system developed by M. Gerzon and P. Fellgett in the 1970s. It is based on the decomposition of the sound wave into spherical harmonics. For several reasons (economic, audio standards, computational power...) it has never been widely known or used, but its mathematical basis is very solid, and nowadays it can be used successfully as a spatialization method. The most complete source of information about Ambisonics is the PhD thesis of J. Daniel, Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia ("Representation of acoustic fields, with application to the transmission and reproduction of complex sound scenes in a multimedia context"; unfortunately, only in French). It can be found here: http://pcfarina.eng.unipr.it/Public/phd-thesis/jd-these-original-version.pdf
One of its main features is that the channel assignment does not depend on the speakers. Traditionally, one would have one channel per speaker; the problem with this paradigm is that audio encoded for one speaker layout can only be perfectly reconstructed using exactly the same layout. This compatibility problem is widely known, and has brought about a number of audio standards (5.1, 7.1, 10.2, etc.), none of which could compete with the de facto standard: stereo.
Ambisonics, conversely, works in a spatial channel encoding paradigm. In other words, each channel streams information about one spatial direction. Before the speakers, there is a decoding stage, which splits the audio according to the current speaker layout. Working this way, we can reproduce spatial audio on any system, independently of its layout.
There are infinitely many Ambisonics orders, corresponding to the number of terms used in the spherical harmonic series expansion. Traditionally, the most used is First Order Ambisonics (FOA), which has a direct physical interpretation. In FOA, we encode the audio using 4 channels: one for each spatial axis (X, Y, Z) and one for the pressure (W). This encoding is equivalent to using three figure-of-eight microphones (one for each direction) and an omnidirectional one for W. However, FOA does not have good spatial resolution, and therefore other techniques (such as VBAP) can achieve more accuracy.
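The FOA encoding of a point source can be sketched in a few lines of Python. This is an illustrative example, not the code from my repository; note that the exact channel weighting depends on the normalization convention (here I use the traditional FuMa-style 1/√2 factor on W):

```python
import math

def foa_encode(sample, azimuth, elevation):
    """Encode a mono sample into first-order B-format (W, X, Y, Z).

    W carries the omnidirectional pressure (scaled by 1/sqrt(2),
    FuMa-style); X, Y, Z are figure-of-eight responses along each
    spatial axis. Angles are in radians.
    """
    w = sample * (1.0 / math.sqrt(2.0))
    x = sample * math.cos(azimuth) * math.cos(elevation)
    y = sample * math.sin(azimuth) * math.cos(elevation)
    z = sample * math.sin(elevation)
    return w, x, y, z

# A source straight ahead (azimuth 0, elevation 0) excites only W and X:
print(foa_encode(1.0, 0.0, 0.0))  # ≈ (0.707, 1.0, 0.0, 0.0)
```

Applied per sample (or per block), this turns any mono stream plus a direction into the four B-format channels.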
However, since computational power now allows it, we can extend the order arbitrarily (limited only by the number of speakers), increasing the spatial resolution and the overall sensation. For a regular speaker layout, we need at least one speaker per channel for a good soundfield reconstruction.
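To make the decoding stage concrete, here is a minimal first-order "sampling" (projection) decoder sketch: each speaker gain is the B-format signal projected onto that speaker's direction. This is an assumption-laden toy (regular layout, FuMa-style W weighting, no psychoacoustic optimization); real decoders for irregular layouts like the one in the studio are typically derived by mode matching:

```python
import math

def foa_decode(w, x, y, z, speaker_angles):
    """Project first-order B-format onto a list of speaker directions.

    speaker_angles: (azimuth, elevation) pairs in radians, assumed
    roughly regular. sqrt(2) undoes the FuMa-style W weighting; 1/N
    normalizes over the number of speakers.
    """
    n = len(speaker_angles)
    gains = []
    for az, el in speaker_angles:
        g = (math.sqrt(2.0) * w
             + x * math.cos(az) * math.cos(el)
             + y * math.sin(az) * math.cos(el)
             + z * math.sin(el)) / n
        gains.append(g)
    return gains

# Square horizontal layout; a source encoded straight ahead comes out
# loudest from the front speaker and silent from the rear one:
layout = [(0.0, 0.0), (math.pi / 2, 0.0), (math.pi, 0.0), (3 * math.pi / 2, 0.0)]
w, x, y, z = 1.0 / math.sqrt(2.0), 1.0, 0.0, 0.0  # source at azimuth 0
print(foa_decode(w, x, y, z, layout))  # front ≈ 0.5, sides ≈ 0.25, rear ≈ 0.0
```

The same projection idea extends to higher orders by adding the corresponding spherical harmonic terms to each speaker's sum.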
For a given order n, we will have 2n + 1 different channels (above order 1 we lose the physical relationship with our 3D space), corresponding to the 2n + 1 spherical harmonics of that order. Since the orders are cumulative, the total number of harmonics (and channels) up to order n will be (n+1)².
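The channel counts above are easy to verify in a couple of lines (function names are mine, just for illustration):

```python
def harmonics_in_order(n):
    """Number of spherical harmonics of degree exactly n."""
    return 2 * n + 1

def total_channels(n):
    """Total Ambisonics channels up to order n: sum of (2k + 1) for
    k = 0..n, which telescopes to (n + 1) ** 2."""
    return (n + 1) ** 2

for n in range(4):
    print(n, harmonics_in_order(n), total_channels(n))
# order 0: 1 channel; order 1: 4 (FOA); order 2: 9; order 3: 16
```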
Here we can observe a nice representation of the spherical harmonics
Since we have 24 speakers in the studio, and they are not regularly distributed, we will be able to work up to n = 3 (Third Order Ambisonics), which requires (3+1)² = 16 channels.
The Ambisonics spatial channel encoding is calculated through the formula found in Gerzon, chapter 3.1.2, p. 150. I have coded some Python functions that plot the spherical harmonics and the encoding of a point source. They can be found in my thesis Git repository: https://github.com/andresperezlopez/rt-spatialization
In the coming days I will extract some animations from these functions and share them here.