3D audio made with Clam

January 25, 2008

While it is true that the clam-devel mailing-list and IRC channel have been a little quiet recently (especially compared with the summer period; well, it was called "summer of code" for a good reason!), this doesn't mean that we have had low development activity lately. (Being an open-source project, the commits say it all.)

The quietness is related to David and me now being involved with the acoustics group of the Fundació Barcelona Media, where we work in a more traditional (and so less distributed) fashion, collaborating with people who actually sit together. I enjoy very much working with such a skilled and interdisciplinary team (half physicists and half computer scientists), and also seeing that Clam is very useful in these 3D-audio projects. These latest developments on 3D audio rendering were mostly driven by the IP-RACINE European project, which aims to enhance digital cinema.

The kind of development we do in Clam has also changed since last summer. Instead of improving the general infrastructure (for example, the multi-rate data-flow system or the NetworkEditor) or improving the existing signal processing algorithms, what we've done is… writing plugins. Among many other things, the new plugins feature a new lightweight spectrum and FFT, and efficient low-latency convolutions.

And this feels good. Not only because the code-compile cycle is sooo fast, but because it means that the framework infrastructure is sufficiently mature and its extension mechanisms are very useful in practice. Further, rewriting the core spectral processing classes allowed us to do a lot of simplifications in the new code and its dependencies. Therefore, the new plugins only depend on the infrastructure, which I'd dare to say is the most polished part of Clam.

And now that the IP-RACINE final project demos have been successfully passed, it is a great time to show some results here.

Flamencos in a virtual loft

Download and watch the video in your preferred format:

[Video: demo_ipracine_flamencos-small.jpg]

Listen to it carefully through headphones (yes, it will only work with headphones!). You should be able to hear the scene as if you were actually moving through it, identifying the direction and distance of each source. It is not made by just automating panning and volumes: the room is modeled so that the rendering takes into account how the sound rebounds off all the surfaces of the room. This is done with ray-tracing and impulse-response techniques.

This stereo version has been made using 10 HRTF filters. However, our main target exhibition setup was 5.0 surround, which gives a better immersive sensation than the stereo version. So, try it if you have surround equipment around:

Credits: Images rendered by Brainstorm Multimedia and audio rendered by Barcelona Media. Music performed by "Artelotú".

Well, the flamenco musicians in the video were meant to be real actors. Ah! Wouldn't that have been nice?

What was planned

The IP-RACINE final testbed was all about integration work-flows among the different technological partners. The whole audio work-flow is very well explained in this video (Toni Mateos speaking, and briefly featuring me playing with NetworkEditor).

So, one of the project outcomes was this augmented-reality flamencos video in a high-definition digital cinema format. To that end a chroma set was set up (as the picture below shows), and it was to be shot with a high-end prototype video camera with position and zoom tracking. The tracking meta-data stream fed both the video and audio rendering, which took place in real-time. All quite impressive!

[Photo: flamencos_croma-small.jpg]
The shooting of the flamenco group "Artelotú" in a chroma set

Unfortunately, at the very last moment a little demon jumped in: the electric power got unstable for a moment and some integrated circuits of the high-end camera literally burned.

That's why the flamencos are motionless pictures. Also, in the absence of a camera with a position tracking mechanism, we chose to freely define the listener path with a 3D modelling tool.

How we did it

In our approach, a database of pressure and velocity impulse-responses (IRs) is computed offline for each (architectural) environment using physically based ray-tracing techniques. During playback, the real-time system retrieves the IRs corresponding to the positions of the sources and the listener, performs a low-latency partitioned convolution, and smooths IR transitions with cross-fades. Finally, the system is flexible enough to decode to any surround exhibition setup.
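To make the transition smoothing concrete, here is a minimal sketch of the cross-fade idea (my illustration, not the actual Clam plugin code): it renders one block with both the previous and the newly retrieved IR and fades linearly between them. It uses naive direct-form convolution for readability, whereas the real-time system performs these convolutions with the low-latency partitioned algorithm instead.

#include <cstddef>
#include <vector>

// Direct-form convolution of one audio block against an IR.
// 'history' holds the last (ir.size()-1) input samples followed by
// the current block, so history.size() == ir.size()-1 + out.size().
// O(block*ir) cost: fine for a sketch, too slow for long room IRs.
static void convolveBlock(const std::vector<float> & history,
		const std::vector<float> & ir,
		std::vector<float> & out)
{
	const std::size_t offset = ir.size() - 1; // first sample of the current block
	for (std::size_t n = 0; n < out.size(); ++n)
	{
		float acc = 0.f;
		for (std::size_t k = 0; k < ir.size(); ++k)
			acc += ir[k] * history[offset + n - k];
		out[n] = acc;
	}
}

// When a new IR is fetched from the database (a source or the listener
// moved), render the block with both the previous and the new IR and
// cross-fade linearly between them so the transition produces no clicks.
// Both IRs are assumed to have the same length here.
void renderTransition(const std::vector<float> & history,
		const std::vector<float> & previousIr,
		const std::vector<float> & currentIr,
		std::vector<float> & out)
{
	std::vector<float> fromOld(out.size());
	std::vector<float> fromNew(out.size());
	convolveBlock(history, previousIr, fromOld);
	convolveBlock(history, currentIr, fromNew);
	for (std::size_t n = 0; n < out.size(); ++n)
	{
		const float fade = float(n + 1) / out.size(); // ramps up to 1
		out[n] = (1.f - fade) * fromOld[n] + fade * fromNew[n];
	}
}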

[Diagrams: complete-surround-with-crossfade.png, big_surround_network.png]

The audio rendering (both real-time and offline) is done with Clam, while the offline IR calculation and 3D navigation are done with other tools.

The big thanks

This work is a collaborative effort, so I'd like to mention the whole FBM acoustics/audio group: Toni Mateos, Adan Garriga, Jaume Durany, Jordi Arques, Carles Spa, David García and Pau Arumí. And of course, we are thankful to everyone who has contributed to Clam.

And last but not least, we'd like to thank "Artelotú", the flamenco group that put the duende in such a technical demo.

Lessons for Clam

To conclude, this is my quick list of lessons learnt while carrying out this project with Clam.

  • The highly modular and flexible approach of Clam was very well suited to this kind of research-while-developing. The multi-rate capability and the data-type plugins were especially relevant.
  • The data-flow and visual infrastructure is sufficiently mature.
  • Prototyping and visual feedback are very important while developing new components. The NetworkEditor data monitors and controls were the most valuable debugging aids.
  • Everybody seems to like plugins!

When I upgraded Ubuntu from Feisty to Gutsy, the newer freebob audio-firewire driver broke support for the Focusrite Saffire Pro. I have not tested the non-pro versions (Saffire and Saffire LE), but I guess it also applies to them.

The freebob/ffado main developer quickly identified the problem and proposed a provisional patch, which is what I'm going to explain in this post. For a more definitive solution we'll probably have to wait for the next ffado release. Hopefully soon.

First, the symptoms: this is how the Gutsy freebob complains when starting jack with a Saffire Pro.

$ jackd -d freebob
jackd 0.103.0
Copyright 2001-2005 Paul Davis and others.
jackd comes with ABSOLUTELY NO WARRANTY
This is free software, and you are welcome to redistribute it
under certain conditions; see the file COPYING for details
JACK compiled with System V SHM support.
loading driver ..
Freebob using Firewire port 0, node -1
unexpected response received (0x9)
Error (bebob_light/bebob_light_avdevice.cpp)[1679] setSamplingFrequencyPlug: setSampleRate: Could not set sample rate 48000 to IsoStreamInput plug 0
Error (bebob_light/bebob_light_avdevice.cpp)[1696] setSamplingFrequency: setSampleRate: Setting sample rate failed
FreeBoB ERR: FREEBOB: Error creating virtual device
cannot load driver module freebob

The problem is related to the sample rate interface. The quick solution is to not use that interface, which is just a matter of commenting out a piece of code.

Update, 9th March:
Pieter Palmer pointed out to me that, from version 1.0.9 on, there is a ./configure switch that does what the patch below does. So you can safely skip the part of this post about downloading version 1.0.7 and applying the patch.

Instead you should download the svn trunk

svn co https://freebob.svn.sourceforge.net/svnroot/freebob/trunk/libfreebob libfreebob

and then use --disable-samplerate at the ./configure step.

Begin deprecated:

Apply the following patch. If you don't know how to do it, follow these steps: copy-paste the patch below into a file (e.g. /tmp/saffire.patch).


Index: src/libfreebobstreaming/freebob_streaming.c
===================================================================
--- src/libfreebobstreaming/freebob_streaming.c	(revision 449)
+++ src/libfreebobstreaming/freebob_streaming.c	(working copy)
@@ -154,7 +154,7 @@
 	 * This should be done after device discovery
	 * but before reading the bus description as the device capabilities can change
	 */
-
+#if 0 //disabled for Focusrite Saffire
    if(options.node_id > -1) {
        if (freebob_set_samplerate(dev->fb_handle, options.node_id, options.sample_rate) != 0) {
            freebob_destroy_handle(dev->fb_handle);
@@ -178,8 +178,8 @@
			return NULL;
		}
	    }
-
	}
+#endif

	/* Read the connection specification
 	 */

Then change into the source directory and apply the patch:

cd libfreebob-1.0.7
patch -p0 < /tmp/saffire.patch

End deprecated:

Now, the building phase. Get the build dependencies.

sudo apt-get build-dep libfreebob0

There is a problem here. For some reason, libtool (or some package that includes it) is missing from the debian/control file. This seems to me like a packaging bug, and it makes autoreconf give errors like this one:

possibly undefined macro: AC_ENABLE_STATIC

So, install libtool:

sudo apt-get install libtool

And build:

autoreconf -f -i -s 
./configure
make
sudo make install

To make sure that jack picks up the new libfreebob (in /usr/local), we will hide the current freebob libs from jackd, I mean the ones installed by the Debian package.

cd /usr/lib
sudo mkdir freebob_hidden
sudo mv libfreebob.* freebob_hidden/

That's all.
As always, check that the raw1394 module is ready. If not, do this:

sudo modprobe raw1394
sudo chmod a+rw /dev/raw1394

Now jack should work without complaining:

 jackd -d freebob

How to compile freebob with optimizations

CFLAGS="-march=core2" ./configure --enable-optimize --enable-mmx --enable-sse

Or use this for 64-bit capable CPUs:

CFLAGS="-march=nocona" ./configure --enable-optimize --enable-mmx --enable-sse


I'm very thankful to Pieter Palmer for the quick help on the #ffado IRC channel.

My next posts will talk about the reason I needed firewire audio in Linux: it was related to a real-time 3D audio exhibition developed with CLAM. It just happened yesterday and I'm very happy that it all worked really well.

Like the other how-tos in this series, this one is also a side-effect of the project I've been working on recently.

Libsndfile is a very popular library for reading and writing lossless audio files, written by Erik de Castro Lopo. We use it in Clam and I've used it in other small applications.

This time I wanted to use the recently added C++ wrapper (the sndfile.hh header) and, since I couldn't find an example of its use, well, time to post mine here.

I like the C++ API a lot better than the C one. See also the C sndfile API documentation.

#include <sndfile.hh>

int main()
{
	const int format=SF_FORMAT_WAV | SF_FORMAT_PCM_16;
	// const int format=SF_FORMAT_WAV | SF_FORMAT_FLOAT;
	const int channels=1;
	const int sampleRate=48000;
	const char* outfilename="foo.wav";

	SndfileHandle outfile(outfilename, SFM_WRITE, format, channels, sampleRate);
	if (not outfile) return -1;

	// prepare a 3 seconds buffer and write it
	const int size = sampleRate*3;
	float sample[size];
	float current=0.;
	for (int i=0; i<size; i++)
		sample[i] = (current += .001); // fill with a slow ramp
	outfile.write(sample, size);
	return 0;
}

You'll find a complete reference of the available formats in the sndfile API doc. But these are the typical subformats of the WAV format. As in the example above, put them after the SF_FORMAT_WAV | portion:

Signed 16, 24, 32 bit data:

SF_FORMAT_PCM_16
SF_FORMAT_PCM_24
SF_FORMAT_PCM_32

Float 32, 64 bit data:

SF_FORMAT_FLOAT 
SF_FORMAT_DOUBLE
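
By the way, reading with the wrapper is just as terse. Here is a minimal read sketch of mine (not part of the original example), assuming the foo.wav file we just wrote:

#include <iostream>
#include <vector>
#include <sndfile.hh>

int main()
{
	// open for reading; the format is deduced from the file itself
	SndfileHandle infile("foo.wav");
	if (not infile) return -1;

	std::cout << "rate: " << infile.samplerate()
		<< " channels: " << infile.channels()
		<< " frames: " << infile.frames() << std::endl;

	// read the whole file as interleaved float frames
	std::vector<float> samples(infile.frames()*infile.channels());
	infile.readf(&samples[0], infile.frames());
	return 0;
}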

I've been longing for an N700 or N800 for a long time, so this short message that just popped into my inbox made my day ;)

Subject: N810 maemo submission accepted
Date: Fri, 9 Nov 2007 18:17:31 +0200 (EET) (17:17 CET)

Congratulations! You have been accepted to the N810 maemo
device program. We will send your discount and instructions
as soon as the device is available in your selected shop (soon).

maemo team – http://maemo.org

The N810 maemo device program aims to offer the new Nokia N810 Internet Tablet at a low price (99€) to active contributors of the maemo community: open source programmers, designers, bloggers and the like.
I'm eager to get my hands on it. We'll see how hard it is to port Clam and other Linux audio apps to it.

Since the change from CVS to SVN in Clam (approximately a year ago) we do not explicitly tag releases, and that's fine because SVN revision numbers are global. On the other hand, SVN can indeed create tags, but they are dangerous because a tag is exactly the same thing as a branch, so SVN doesn't prevent commits to a tag.

Our tagging approach is very simple and has proved useful: just write the revision number of each release in the CHANGES files. This simplifies the release process and also the way to do diffs, since you always use the same syntax.

Now an example: let's say we want to see the changes in the SMSTranspose files since the last stable release (1.1.0):

1) Look for the svn revision corresponding to a stable version in the CHANGES file


NetworkEditor$ head CHANGES

2007-??-?? NetworkEditor 1.1.1 SVN $Revision: 10220 $
''
*

2007-06-08 NetworkEditor 1.1.0
'More eye-candy, please'
* SVN Revision: 10216
* Using CLAM 1.1.0
* New examples
* genderChange: fully working now and with an interface

So version 1.1.0 is revision 10216.
By the way, you may be curious about the $Revision: 10220 $ part of the first line. This is an SVN keyword (expanded because the file has the svn:keywords property set): each time you commit the CHANGES file, the number gets updated to the current revision. That means that we never actually write revision numbers in the CHANGES files; we only have to remove the "$" signs when we decide to tag the release.

2) Now diff the files of interest between that revision and HEAD


NetworkEditor$ svn diff -r 10216:HEAD src/processing/SMSTranspose.*xx
Index: src/processing/SMSTranspose.cxx
===================================================================
--- src/processing/SMSTranspose.cxx (revision 10216)
+++ src/processing/SMSTranspose.cxx (revision 10281)
@@ -20,11 +20,15 @@
*/
#include "SMSTranspose.hxx"
-#include
+#include
#include
...

Last, a quick tip for gvim users: pipe the diff result to gvim using the -R and - options:
$ svn diff | gvim -R -
And take advantage of vim's syntax highlighting and quick navigation.

This post is just a quick note. I’ll extend on it later.

Imagine you have a data-flow graph with nodes A, B, C, D, G. Of these nodes, A and B are stream sources and G is a sink. Now, say, the static scheduler gives you this iterative list of firings: B, A, C, B, A, G, B, D, D, G, D, A, G, C, B, G

Now, your application is real-time and, as usual, it is callback based (the hardware wakes up a process on a regular basis, providing and consuming data).

Data-flow scheduling is all very interesting so far, but a question arises: in practice, how can you drive this schedule from the callback process?

Actually, the obvious solution I can think of is not good at all: doing a callback adaptation of a blocking interface. That implies leaving the callback process to just do buffering, and having another thread execute the data-flow, where the source and sink nodes (A, B and G) block when data is not available. But in most cases you want to minimize the buffering at the inputs and outputs, so you want a scheduler that collects input as regularly as possible. This is an admitted limitation of the model (SDF scheduling paper, section V.C).
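To make the buffering idea concrete, here is a minimal sketch of that callback adaptation, written with standard C++ threading primitives for brevity. The runScheduleIteration() function is a hypothetical stand-in for firing the scheduled list above; the real-time callback only moves samples between the hardware and two FIFOs, while a separate thread blocks inside the source nodes:

#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

// A thread-safe FIFO of samples: the glue between the callback
// and the data-flow thread.
class SampleFifo
{
	std::deque<float> _queue;
	std::mutex _mutex;
	std::condition_variable _available;
public:
	void push(const float* data, std::size_t size)
	{
		std::lock_guard<std::mutex> lock(_mutex);
		_queue.insert(_queue.end(), data, data+size);
		_available.notify_one();
	}
	// Used by the source nodes (A, B): blocks until enough data arrives.
	void blockingPop(float* data, std::size_t size)
	{
		std::unique_lock<std::mutex> lock(_mutex);
		_available.wait(lock, [&]{ return _queue.size() >= size; });
		std::copy(_queue.begin(), _queue.begin()+size, data);
		_queue.erase(_queue.begin(), _queue.begin()+size);
	}
	// Used by the callback: never blocks, pads with silence on underrun.
	void popOrSilence(float* data, std::size_t size)
	{
		std::lock_guard<std::mutex> lock(_mutex);
		const std::size_t got = std::min(size, _queue.size());
		std::copy(_queue.begin(), _queue.begin()+got, data);
		_queue.erase(_queue.begin(), _queue.begin()+got);
		std::fill(data+got, data+size, 0.f);
	}
};

SampleFifo capturedSamples; // hardware -> sources (A, B)
SampleFifo renderedSamples; // sink (G) -> hardware

// Hypothetical stand-in: fires one iteration of the static schedule
// (B, A, C, B, A, G, ...); A and B call capturedSamples.blockingPop()
// and G calls renderedSamples.push().
void runScheduleIteration();

// The real-time callback just does buffering, as described above.
void audioCallback(const float* input, float* output, std::size_t size)
{
	capturedSamples.push(input, size);
	renderedSamples.popOrSilence(output, size);
}

// Meanwhile, a separate thread executes the data-flow, blocking inside
// the source nodes whenever the callback has not yet delivered data.
void dataFlowThread()
{
	for (;;) runScheduleIteration();
}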

I haven't found any research focusing on that problem yet (though I admit I still have a bunch of related papers to read), so any feedback on this will be greatly appreciated. I do have some ideas on what the solution could look like; I'll post about them later.

In CLAM-related papers we used to define a synchronous flow as a flow where messages get produced and consumed at a predictable (if not fixed) rate. This definition relates very well to my intuitive idea of synchronicity. However, having seen examples of multi-rate data flows and having implemented dynamic schedules, I didn't feel very comfortable with it. The main concern is that a node's firings do not necessarily need to be predictable.

Recently, reading some papers by Edward Lee and others, I saw the light:

A synchronous data flow is a graph of synchronous nodes, where a node is synchronous if we can specify a priori the number of input samples consumed on each input and the number of output samples produced on each output each time the node is fired. (For instance, a windowing node that always consumes 512 samples and produces one frame is synchronous; a silence detector that emits an event only from time to time is not.)

This is an extremely simple property of the graph itself, and it is independent of the scheduling strategy. But the point is that it is not so intuitively related to the synchronicity idea. This great paper by Lee and Messerschmitt shows (very formally) how to know whether an iterative schedule exists or not. The condition is simple: the rank of the graph's incidence (topology) matrix must be the number of nodes minus one. Also, if such a schedule exists, there is an algorithm that finds it. The algorithm itself is simple to understand but not so simple to implement, since it involves solving linear equations and an exponential search.
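To see the condition at work, here is a tiny example of mine (not taken from the paper): two nodes connected by a single arc, where A produces 2 samples per firing and B consumes 3. In LaTeX notation:

% Topology matrix: one row per arc, one column per node
% (production positive, consumption negative)
\Gamma = \begin{pmatrix} 2 & -3 \end{pmatrix}, \qquad
\operatorname{rank}(\Gamma) = 1 = \text{number of nodes} - 1

% The repetition vector q solves the balance equation \Gamma q = 0:
q = \begin{pmatrix} 3 \\ 2 \end{pmatrix}, \qquad
\Gamma q = 2 \cdot 3 - 3 \cdot 2 = 0

So an iterative schedule exists, firing A three times and B twice per iteration, for instance A A B A B. Each iteration moves six samples through the arc and leaves the buffer exactly as it found it, so the list can be repeated forever.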

To make it clear: a schedule is a list of executions that gets repeated (be it for one or many processors), and it necessarily has to be iterative since an infinite stream is assumed. Of course, an iteration is a list that may include some or all of the nodes several times.