This post is just a quick note. I’ll extend on it later.

Imagine you have a data flow graph with nodes A, B, C, D, G. Of these nodes, A and B are a source of stream and G is a sink. Now, say, the static scheduler gives you this iterative list of firings: B, A, C, B, A, G, B, D, D, G, D, A, G, C, B, G

Now, your application is real-time and, as usually, it is call-back based (the hardware awakes a process in a regular basis providing and consuming data).

Data flow scheduling is very interesting so far, but a question arises: In practice how can you use this scheduling from the call-back process?

Actually the obvious solution I can think is not good at all: you can do call-back adaptation of a blocking interface. That implies leaving the call-back process to just do buffering, and have another thread that executes the data-flow. Where source and sink nodes (A, B and G) can be blocking if data is not available. But in most cases minimize buffers on the inputs and ouputs is desired. So you want the scheduler that collects input as regularly as possible. This is an admitted (SDF scheduling paper, section V.C) limitation of the model.

I haven’t found yet any research focusing on that problem (though I admit I still have a bunch of related papers to read). So any feedback on this will be greatly appreciated. However, I have some ideas on how the solution could be, I’ll post about it later.

In CLAM related papers we use to define the synchronous flow as a flow where messages get produced and consumed at predictable —if not fixed— rate. This definition relates very well with my intuitive idea of synchronicity. However, having seen examples of multi-rate data flows and having implemented dynamic schedules, I didn’t feel very comfortable with this definition. The main concern is that node’s firing does not necessarily need to be predictable.

Recently, reading some papers from Edward Lee and others I saw the light:

A synchronous data flow is a graph of synchronous nodes. Where a node is synchronous if we can specify a priori the number of input samples consumed on each input and the number of output samples produced on each output each time the node is fired.

This is a extremely simple property of the graph itself and it is independent of its scheduling strategy. But the point is that it is not so intuitively related with the synchronicity idea. This great paper from Lee and Messerschmitt shows (very formally) how to know if an iterative scheduling exists or not. The condition is simple: the rank of the graph’s incidence matrix must be the number of nodes minus one. Also, if such scheduling exist, there is an algorithm that finds it. The algorithm itself is simple to understand but not so to implement since it involves linear equations solving and exponential search.

To make it clear, an scheduling is a list of executions that gets repeated —being for one or many processors—, and necessarily needs to be iterative since an infinite stream is assumed. Of course, an iteration is a list that may include some or all the nodes many times.