3D audio made with Clam

January 25, 2008

While it is true that the clam-devel mailing-list an irc channel have been a little quiet recently –specially compared with the summer period (well it was called “summer of code” for a good reason!–, this doesn’t mean that we recently had a low development activity. (Being an open-source project the commits say it all)

The quietness is related to David and me being now involved with the acoustics group of the Fundació Barcelona Media, where we work in a more traditional –and so less distributed– fashion collaborating with people who actually sit together. Further, I enjoy very much working with such skilled and interdisciplinary team (half are physicists and half computer scientists), and also assessing that Clam is very useful in these 3D-audio projects. These latest developments on 3D audio rendering where mostly driven, by the IP-RACINE European project aiming to enhance the digital cinema.

The kind of development we do in Clam also changed since last summer. Instead of improving the general infrastructure (for example the multi-rate data-flow system or the NetworkEditor) or improving the existing signal processing algorithms, what we’ve done is… writing plugins. Among many other things the new plugins feature a new lightweight spectrum and fft, and efficient low-latency convolutions.

And this feels good. Not only because the code-compile cycle is sooo fast, but because it means that the framework infrastructure is sufficiently mature and its extension mechanisms are very useful in practice. Further, rewriting the core spectral processing classes allowed us to do a lot of simplifications in the new code and its dependencies. Therefore, the new plugins only depends on the infrastructure, which I’d dare to say is the more polished part of Clam.

And now that IP-RACINE final project demos have been successfully passed, it is a great time to show some results here.

Flamencos in a virtual loft

Download and watch the video in the preferred format:


Listen to it carefully through the headphones (yes, it will only work with headphones!) You should be able to hear as if you were actually moving in the scene, identifying the direction and distance of each source. It is not made by just automating panning and volumes: but modeling the room so it takes into account how the sound rebounds into all the surfaces of the room. This is done with ray-tracing and impulse-responses techniques.

This stereo version has been made using 10 HRTF filters. However, our main target exhibition set up was 5.0 surround, which gives a better immersive sensation than the stereo version. So, try it if you have a surround equipment around:

Credits: Images rendered by Brainstorm Multimedia and audio rendered by Barcelona Media. An music performed by “Artelotú”

Well, the flamenco musicians in the video should be real actors. Ah! Wouldn’t have been nice?

What was planned

The IP-Racine final testbed was all about integration work-flows among different technological partners. All the audio work-flow is very well explained in this video (Toni Mateos speaking, and briefly featuring me playing with NetworkEditor.)

So, one of the project outcomes was this augmented reality flamencos video in a high-definition digital cinema format. To that end a chroma set was set up (as shows the picture below), and it was to be shoot with a hi-end prototype video camera with position and zoom tracking. The tracking meta-data stream fed both the video and audio rendering, which took place in real-time — all quite impressive!

The shouting of the flameco group “Artelotú” in a chroma set

Unfortunately, at the very last moment a little demon jumped in: the electric power got unstable for moment and some integrated circuits of the hi-end camera literally burned.

That’s why the flamencos are motionless pictures. Also, in absence of a camera with position tracking mechanism we choose to freely define the listener path with a 3D modelling tool.

How we did it

In our approach, a database of pressure and velocities impulse-responses (IRs) is computed offline for each (architectural) environment using physically based ray-tracing techniques. During playback, the real-time system retrieves IRs corresponding to the sources and target positions, performs a low-latency partitioned convolution and smoothes IR transitions with cross-fades. Finally, the system is flexible enough to decode to any surround exhibition setup.


The audio rendering (both real-time and offline) is done with Clam, while the offline IR calculation and 3D navigation are done with other tools.

The big thanks

This work is a collaborative effort, so I’d like to mention all the FBM acoustics/audio group: Toni Mateos, Adan Garriga, Jaume Durany, Jordi Arques, Carles Spa, David García and Pau Arumí. And of course we are thankful to whoever has contributed to Clam.

And last but not least, we’d like to thank “Artelotú” to the flamenco group that put the duende in such a technical demo.

Lessons for Clam

To conclude, this is my quick list of lessons learnt during the realization of this project using Clam.

  • The highly modular and flexible approach of Clam was very suited for this kind of research-while-developing. The multi-rate capability and data type plugins, where specially relevant.
  • The data-flow and visual infrastructure is sufficiently mature.
  • Prototyping and visual feedback is very important while developing new components. The NetworkEditor data monitors and controls were the most valuable debugging aids.
  • Everybody seems to like plugins!