-
FriendFeed's Pseudo-ODBR.J. LorimerFri, Feb 27 2009 @ 11:29 pm
-
RubotoR.J. LorimerThu, Feb 26 2009 @ 4:33 pm
-
Constructor CompletionR.J. LorimerTue, Jan 20 2009 @ 3:44 pm
-
The Call of ChrismathuluR.J. LorimerFri, Nov 7 2008 @ 2:38 pm
-
Getting True Java Classes in JRubyR.J. LorimerThu, Sep 25 2008 @ 6:41 pm
The Fork/Join Concurrency Model
I just got done reading Doug Lea’s white-paper on his adaptation of a fork/join algorithm for Java. This algorithm is being ported for inclusion into Java 7, as part of the java.util.concurrent library.
This type of library is explicitly designed to help developers implement highly-concurrent algorithms, where a divide and conquer approach works well. A classic example of such an algorithm is calculating increasingly large Fibonacci numbers.
In such cases, some calculations are dependent upon others, but a good majority of the calculations can be performed independently (concurrently) given the right code organization and timing management.
What this library does is allow you to code an algorithm around the concept of forking sub-calculations (the recursive Fibonacci calls for example) and then joining the results together into another calculation.
By coding this way, you effectively de-couple the code problem from the number of CPUs used to do the processing. To put this another way, by breaking up your code into a lot of independent calculations, you can have multiple CPUs work on parts at the same time, and then you just need to glue those parts together.
A simple recursive Fibonacci algorithm may look like this:
static int fib(int num) { if(num <= 1) { return num; } return fib(num-1) + fib(num-2); }
If you look at the algorithm above, the calculation of fib(num-1) and fib(num-2) are completely independent of each other, and could be processed at the same time. That is what this library attempts to provide.
This type of solution is meant to explicitly address the need for alternate styles of programming in an upcoming world of highly-multi-core machines. The more CPUs you have, the more you need to think about how you can code as efficiently as possible utilizing as many of those CPUs as possible.
Here is an example directly from his paper (as referenced above) that shows a Fibonacci algorithm. The critical component of this class is the run() method, which is where the actual fork/join algorithm exists.
In the run method above, you will see a dividing line (implemented as n <= threshold) to determine at what point the algorithm should move from using a fork/join algorithm to actually performing the fibonacci calculation sequentially. This is a tuning point, and could be implemented as simply (1) in a more basic implementation. This isn’t done simply because the basic fibonacci calculation below 13 is so fast for a single CPU/Thread to process, it would be less efficient to incur the overhead of the Fork/Join algorithm and in turn the extra Fib objects that would need to be created.
In the case of the numbers above 13 that produce a lot of nodes as they recurse, the code then will call the FJTask method coinvoke which then forks both sub-tree calculations, waits for them to complete (aka joins them), and lastly computes the result.
By organizing the problem this way, the code for the problem domain (how it is solved) is totally isolated from how it is processed. Strictly speaking the above code can be driven by 1 thread (although on most modern systems that would be a mistake), but it could easily be split to run on many more threads as well, as the demands see fit.
Something else intriguing about this type of implementation is that the back-end threading model is self-balancing. As an example, the left-side calculation in the Fibonacci program (fib(n-1)) is exponentially more expensive than the right side (fib(n-2)), so if you fork those two calculations on two threads, one thread will be overburdened as compared to the over. However, as you will see if you read the paper, when one thread completes, it will then ‘assist’ (or steal from) other threads in the pool.
This is a step in the right direction for Java (and all languages). People really need to start thinking this way, and there are a whole lot of Java programmers out there, so this is a natural realm to start converting.
As a final note, I would like to point out that this is exactly what functional programming languages provide, simply by the nature and design of the functional approach to programming. Any program written in a purely functional language is theoretically capable of scaling extremely well when given more CPUs in the right runtime environment, as the individual functional calls are totally isolated from each other, and as such can often be run out-of-order, and as concurrently as the hardware will support. The above code example is really an adaptation of the functional programming concepts into a shared-memory procedural world (and it’s a limited view of it, at that).

Comments
Do other examples exist?
This Fibonacci fork-join is really nice. However, I'm getting a little bit tired of it. It seems that the framework can be used only for this algorithm, because everyone is using it as demo. Are there any realistic usage examples?
Quite old - amazing
When reading the paper I stumbled over the Java 1.2 reference. Seems the paper (from the references section) is from 1999 or 2000. Amazing that it took such a good idea (and easy to implement and use) 8 years to arrive in the JDK.
Thanks for posting, I enjoyed the read,
Peace
-stephan
--
Stephan Schmidt :: stephan@reposita.org
Reposita Open Source - Monitor your software development
http://www.reposita.org
Blog at http://stephan.reposita.org - No signal. No noise.
Already there
You can do this in Java 5 -- see Future and Executor.
Not Quite
Well, no, not really.
While an Executor abstracts the unit of work from how it is executed, it doesn’t provide the ‘coInvoke’ concept - which central to this fork/join model. The
coInvokemethod does a lot in the fork join model. It queues up the sub-unit, and then causes the parent unit of work to wait until both sub-units are done. Future allows you to wait on a child, but it doesn’t apply the recursive queue-ing in the same way.In addition, the
TaskRunnerfor all of this, which is the configurable thread manager differs from anExecutionServicequite a bit by having the individual threads be much smarter - they cross-communicate in this model so no thread is over-loaded, and no thread is starved.That being said, it is a variant concept. Doug Lea has always said that most of his tools are just smarter abstractions of things you can do with simple
synchronizedblocks orLockobjects (the base ‘primitives’ of concurrency as he calls them).