VIRUS – VLC Iterative Recommender Under SoundCloud

For the Audio and Music Processing Lab, I developed a social recommendation system for music, loosely inspired by the idea of six degrees of separation (http://en.wikipedia.org/wiki/Six_degrees_of_separation). The main idea is this: if somebody likes your music, then you will probably find other music that this person likes interesting.

Copyright Daniel' - CC BY SA 3.0. Image from http://en.wikipedia.org/wiki/File:Six_degrees_of_separation.svg

For the implementation, I chose SoundCloud, which hosts my songs for the Vagabundo Barbudo project and also provides a nice Python wrapper for its API. Based on how the site is structured, I found two basic ways of implementing the system:

– Favorites: a user can mark specific songs as “favorites”. If user U1 favorites both songs S1 and S2, then these two songs are first-degree connected. If another user, U2, favorites song S2 and the new hit S3, then songs S1 and S3 are second-degree connected. And so on…

– Followers: a user can also “follow” another user, mainly because they are friends or because the first one likes the music of the second. It is a one-directional relationship: if user U1 follows U2, then U2 is followed by U1 (although U2 can, of course, follow U1 back). Here we consider that, if U1 follows U2 and U2 follows U3, then U1 and U3 are second-degree connected.

The idea is that, given one of these connection methods and the desired degree of proximity, one can enjoy a playlist of music chosen at random from one’s friends’ friends. It is probably not the ultimate recommendation system, but at least one can have some fun and discover the musical ecosystem of one’s immediate surroundings.
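The "followers" variant can be sketched as a breadth-first search over the follow graph. This is only an illustration under stated assumptions, not the code from the repository: the graph and the track lists are plain in-memory dictionaries with made-up data (no SoundCloud API call is shown), and the names `users_within_degree` and `random_playlist` are hypothetical.

```python
import random
from collections import deque

def users_within_degree(follows, start, max_degree):
    """Map each user reachable from `start` in at most `max_degree`
    follow steps to its degree of connection (breadth-first search)."""
    degrees = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if degrees[user] == max_degree:
            continue  # do not expand beyond the requested degree
        for followed in follows.get(user, ()):
            if followed not in degrees:
                degrees[followed] = degrees[user] + 1
                queue.append(followed)
    del degrees[start]  # the starting user is not a recommendation source
    return degrees

def random_playlist(follows, tracks, start, max_degree, size, rng=random):
    """Pick up to `size` random tracks uploaded by users within `max_degree`."""
    reachable = users_within_degree(follows, start, max_degree)
    candidates = [t for user in sorted(reachable) for t in tracks.get(user, ())]
    return rng.sample(candidates, min(size, len(candidates)))

# Hypothetical example data: U1 follows U2, U2 follows U3 and U1, U3 follows U4.
follows = {"U1": ["U2"], "U2": ["U3", "U1"], "U3": ["U4"]}
tracks = {"U2": ["S1"], "U3": ["S2", "S3"], "U4": ["S4"]}

print(users_within_degree(follows, "U1", 2))
print(random_playlist(follows, tracks, "U1", 2, size=2))
```

The "favorites" variant would follow the same pattern, with the search alternating between users and the songs they favorited.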

Once the songs are chosen, they are streamed from SoundCloud into the VLC player. Hence the name of the system: VIRUS (VLC Iterative Recommender Under SoundCloud).

Schema of the VIRUS system

The code is already available in my repo: https://github.com/andresperezlopez/VIRUS

As future work, it would be nice to have a tool for measuring the distance between two artists, say Vagabundo Barbudo and Justin Bieber.

PS – For the assignment, I also created an AutoTagger trained with the MajorMiner dataset (from MIREX), and applied it to my tracks. The result was more or less correct, but far less interesting than the current tags.

Left columns: actual tags. Right columns: auto-tags classified from the MajorMiner dataset.

Posted in sound and music computing

Ambisonics Encoding

As I mentioned last time, I am going to use Ambisonics for sound spatialization. In the last post I presented the basic concepts of Ambisonics. Today I will go a little deeper into encoding.

We can expand the solutions of the wave equation as a series of spherical harmonics. Since the series is infinite, we must truncate it at some point for any practical application. The point at which we truncate gives the Ambisonics order. For practical reasons, I will use order 3. The following plot shows what the harmonics look like, represented in the horizontal plane.

Plot of the spherical harmonics up to 3rd order. Positive regions in blue, negative in red. N3D coefficients.

We can draw a physical comparison between microphone polar patterns and spherical harmonics. The plot in the first row corresponds to an omnidirectional microphone (it receives signal equally from all angles). The second row shows the first-order components (FOA): they correspond to figure-of-eight microphones, one pointing along each of the three spatial dimensions. Above first order, the analogy with physical devices becomes vaguer.

Different normalization coefficients can be applied to the harmonics. As stated in the figure caption, we are using normalized 3D coefficients (N3D), which follow directly from the orthonormal basis, but other conventions have been developed and used. See Daniel01, table 3.1, for the different coefficients, and chapter 3.1.2 in general for the encoding.

Using the spherical harmonics, the simplest thing we can do is encode a point sound source. The encoding equations are very similar to those of the spherical harmonics themselves, so what happens is that, as the source position changes, each harmonic dynamically changes its shape between its original one and its opposite.

If we set aside the amplitude/distance issue for now, this is the gain information that we will encode in each of the channels. From top to bottom and left to right, we have the W (zeroth order), Y, Z and X (FOA), V, T, R, S, U (2nd order) and Q, O, M, K, L, N, P (3rd order) channels.
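As a minimal sketch of these per-channel gains, restricted to the first four channels, the function below evaluates the zeroth- and first-order harmonics at the source direction. The function name `encode_first_order` and the angle convention (azimuth measured counterclockwise from the front, elevation measured from the horizontal plane, both in radians) are assumptions made for the example; higher orders would need the general spherical harmonic formulas.

```python
import math

def encode_first_order(azimuth, elevation, normalization="N3D"):
    """Gains (W, Y, Z, X) for a point source in the given direction.

    Angles are in radians. Assumed convention: azimuth counterclockwise
    from the front (X axis), elevation upwards from the horizontal plane.
    """
    if normalization == "N3D":
        k = math.sqrt(3.0)   # first-order N3D factor, sqrt(2m + 1) with m = 1
    elif normalization == "SN3D":
        k = 1.0              # SN3D leaves the first-order terms unscaled
    else:
        raise ValueError("normalization must be 'N3D' or 'SN3D'")
    w = 1.0                                          # omnidirectional
    y = k * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = k * math.sin(elevation)                      # up-down
    x = k * math.cos(azimuth) * math.cos(elevation)  # front-back
    return w, y, z, x

# A source on the horizontal plane, 90 degrees to the left.
print(encode_first_order(math.radians(90), 0.0))
```

Multiplying a mono signal by each of these four gains gives the first-order encoded channels for that source.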

We can go on to simulate how the source will be reconstructed after our encoding. The following videos show the overall shape obtained from all the channels of a given order (i.e. without adding the channels of lower orders).

We can observe three main things. First, the total radiation pattern points at the sound source with the same peak gain for every source angle. Second, the gain peak increases with the order; with the N3D normalization it equals 2m+1 (which is also the number of spherical harmonics of that order). Third, as the order increases, the directivity of the main lobe increases, which means that the encoding is more precise regarding the position of the point sound source.
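The 2m+1 peak can be read off the addition theorem for spherical harmonics. The notation below is an assumption made for this note: Y_{mn} are the real N3D harmonics of order m, n indexes the harmonics within that order, u_s is the source direction, u the observation direction, γ the angle between them, and P_m the Legendre polynomial of degree m.

```latex
\sum_{n=-m}^{m} Y_{mn}(\mathbf{u}_s)\, Y_{mn}(\mathbf{u})
  = (2m+1)\, P_m(\cos\gamma),
\qquad
P_m(1) = 1
\;\Longrightarrow\;
\text{peak gain at } \gamma = 0 \text{ is } 2m+1 .
```

Since the right-hand side depends only on γ, the pattern has the same shape wherever the source is. With SN3D, each harmonic is divided by \sqrt{2m+1}, so the same sum would reduce to P_m(\cos\gamma), with a peak of 1 for every order.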

These plots took into account only the components of a given order. If we represent all the components up to a given order, as Ambisonics theory prescribes, the increase in directivity is easier to appreciate. In this animation, all gain peaks are normalized to 1, in order to focus on the directivity.
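The patterns described here can be reproduced numerically without plotting. The sketch below is not the matplotlib code from the repository; it assumes N3D normalization, in which the components of order m combine into (2m+1)·P_m(cos γ), where γ is the angle away from the source and P_m is a Legendre polynomial. The function names are hypothetical.

```python
import math

def legendre(m, x):
    """Legendre polynomial P_m(x), computed with Bonnet's recurrence."""
    if m == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, m):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def order_pattern(m, gamma):
    """Gain at angle `gamma` (radians) from the source, order m only (N3D)."""
    return (2 * m + 1) * legendre(m, math.cos(gamma))

def cumulative_pattern(max_order, gamma, normalize=True):
    """Gain at angle `gamma` using all orders from 0 up to `max_order`."""
    total = sum(order_pattern(m, gamma) for m in range(max_order + 1))
    # The peak (gamma = 0) is the sum of 2m + 1, i.e. (max_order + 1) ** 2.
    return total / (max_order + 1) ** 2 if normalize else total

# Normalized gain 45 degrees away from the source, for orders 1 to 3.
for order in (1, 2, 3):
    print(order, cumulative_pattern(order, math.radians(45)))
```

The printed values shrink as the order grows, which is the narrowing of the main lobe seen in the animation.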

Another static representation of the directivity across orders:

Directivity plot from different ambisonics orders, up to m=3. Peak gains normalized to 1 for comparison

All plots and animations were created with Python's matplotlib. As usual, the code is available at https://github.com/andresperezlopez/rt-spatialization/

Posted in rt spatialization

Master Thesis – Real-Time Tools for 3D Audio Spatialization

This is the first article in my research blog. Here I will post (hopefully) on a regular basis about the outcomes of my Master Thesis, which has the provisional title Real-Time Tools for 3D Audio Spatialization.

For the thesis, I will be using the spatial audio system at Barcelona Media, which has up to 24 speakers in a non-regular setup. They have been working on Ambisonics, and I will mainly use this spatialization technique.

Ambisonics (http://en.wikipedia.org/wiki/Ambisonics) is an audio recording and reproduction system developed by M. Gerzon and P. Fellgett in the 1970s. It is based on the decomposition of the sound field into spherical harmonics. For several reasons (economics, audio standards, computational power...) it has never been widely known or used, but its mathematical basis is very solid, and nowadays it can be used successfully as a spatialization method. The most complete source of information about Ambisonics is the PhD thesis of J. Daniel, Représentation de champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia (unfortunately only in French). It can be found here: http://pcfarina.eng.unipr.it/Public/phd-thesis/jd-these-original-version.pdf

One of its main features is that the channel assignment does not depend on the speakers. Traditionally, one would have one channel per speaker; the problem with this paradigm is that audio encoded for one speaker layout can only be reconstructed perfectly by using exactly the same speaker layout. This compatibility problem is widely known, and has led to a number of audio standards (5.1, 7.1, 10.2, etc.), none of which has been able to compete with the de facto standard (stereo).

Ambisonics, conversely, works in a spatial channel encoding paradigm. In other words, each channel carries information about one spatial direction. Before the speakers, there must be a decoding stage, which distributes the audio according to the current speaker layout. Working this way, we can reproduce spatial audio on any system, independently of its layout.

There are infinitely many Ambisonics orders, corresponding to the number of terms used in the spherical harmonic series expansion. Traditionally, the most widely used is First Order Ambisonics (FOA), which has a direct physical interpretation. In FOA, we encode the audio using 4 channels: one for each spatial axis (X, Y, Z) and one for the pressure (W). This encoding is equivalent to using three figure-of-eight microphones (one for each direction) and an omnidirectional one for W. However, FOA does not have good spatial resolution, and therefore other techniques (such as VBAP) can be more accurate.

Nevertheless, since computational power now allows it, we can increase the order arbitrarily (limited only by the number of speakers), improving the spatial resolution and the overall sensation. For a regular speaker layout, we need at least one speaker per channel for a good sound field reconstruction.
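To make the decoding stage concrete, here is a minimal first-order sketch of one of the simplest strategies, a sampling (projection) decoder: each speaker feed is the encoded signal projected onto the harmonics evaluated at that speaker's direction. This is an illustration under stated assumptions, not the decoder used in the studio: it assumes N3D normalization, W, Y, Z, X channel order, a regular layout (irregular layouts generally call for more careful decoder designs), and the function names are hypothetical.

```python
import math

def sh_first_order(azimuth, elevation):
    """N3D first-order harmonics (W, Y, Z, X) for a direction in radians."""
    c = math.sqrt(3.0)
    return (1.0,
            c * math.sin(azimuth) * math.cos(elevation),
            c * math.sin(elevation),
            c * math.cos(azimuth) * math.cos(elevation))

def decode(b_channels, speakers):
    """Sampling decoder: one gain per speaker from the encoded channels.

    `speakers` is a list of (azimuth, elevation) pairs in radians.
    """
    n = len(speakers)
    return [sum(y * b for y, b in zip(sh_first_order(az, el), b_channels)) / n
            for az, el in speakers]

# Encode a source at 45 degrees azimuth, then decode the SAME channels
# for two different horizontal layouts.
source = sh_first_order(math.radians(45), 0.0)
square = [(math.radians(a), 0.0) for a in (45, 135, 225, 315)]
hexagon = [(math.radians(a), 0.0) for a in range(0, 360, 60)]

print(decode(source, square))
print(decode(source, hexagon))
```

In both layouts the largest gain goes to the speaker closest to the source, even though the encoded channels never referred to any speaker position.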

For a given order n, we have 2n + 1 new channels (above order 1 we lose the direct physical relationship with our 3D space), corresponding to the 2n + 1 spherical harmonics of that order. Since they are cumulative, the total number of harmonics (and channels) up to a given order n is (n+1)^2.
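The channel arithmetic is easy to check in a few lines. The helper names below are hypothetical, and `max_order_for_speakers` simply applies the "at least one speaker per channel" rule of thumb mentioned above for regular layouts.

```python
import math

def channels_in_order(n):
    """Number of spherical harmonics (channels) belonging to order n alone."""
    return 2 * n + 1

def total_channels(n):
    """Number of channels needed for all orders from 0 up to n."""
    return (n + 1) ** 2

def max_order_for_speakers(num_speakers):
    """Highest order whose channel count does not exceed the speaker count."""
    return math.isqrt(num_speakers) - 1

for n in range(4):
    print(n, channels_in_order(n), total_channels(n))

print(max_order_for_speakers(24))
```

With 24 speakers, third order (16 channels) fits, while fourth order (25 channels) would need one more speaker.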

Here we can see a nice representation of the spherical harmonics:

Spherical harmonics up to 3rd order. From Sarxos, wikipedia, http://en.wikipedia.org/wiki/File:Harmoniki.png

Since we have 24 speakers in the studio, and they are not regularly distributed, we will be able to work up to n=3 (third order ambisonics).

The Ambisonics spatial channel encoding is calculated with the formula found in Gerzon, chapter 3.1.2, p. 150. I wrote some Python functions that plot the spherical harmonics and the encoding of a point source. They can be found in my thesis git repository: https://github.com/andresperezlopez/rt-spatialization

In the coming days I will extract some animations from these functions and share them here.

Posted in rt spatialization