Wednesday, March 14, 2012

Some yaTS updates

Hello all,
I spent some time updating yaTS (http://code.google.com/p/yats/).
Some of the big items:

Reactive way to yield / wake up threads
Initially I wanted to do something purely distributed, like randomly waking up threads when you push a task. It turned out pretty badly: you do not really want anything random here. You want to wake up a thread that is guaranteed to be sleeping, and you really want to give it something to do. So, basically, the solution is to maintain a *global* bitfield which identifies which threads are sleeping. When you schedule a new task:
  • if the task has no affinity (it is stealable), you push it onto your own queue, pick a thread that is sleeping, wake it up, and mailbox it the ID of your queue so that the freshly woken thread knows exactly where to pick up the job.
  • if the task has an affinity, you push it onto the target thread's queue (the one that matches the affinity ID) and, if that thread is sleeping, you wake it up.
Of course, there are two or three details to avoid any kind of deadlock, and these go-to-sleep and wake-up operations are more or less serialized. So going to bed and waking up is no longer distributed, but I do not think this matters, since any application that spends its time yielding and waking up threads already has a problem.
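The global bitfield idea above can be sketched in a few lines. This is my own illustration, not yaTS's actual code (the names `SleepingSet`, `markSleeping`, and `claimOne` are mine): bit i is set when worker thread i is asleep, and a producer atomically claims one sleeping thread so that two producers can never try to wake the same one.

```cpp
#include <atomic>
#include <cstdint>

// Illustrative sketch only -- not the real yaTS data structure.
// Bit i of the word is set when worker thread i is asleep and may be
// claimed by a producer that just pushed a task.
struct SleepingSet {
  std::atomic<uint64_t> bits{0};

  void markSleeping(int tid) { bits.fetch_or(uint64_t(1) << tid); }
  void markAwake(int tid)    { bits.fetch_and(~(uint64_t(1) << tid)); }

  // Atomically claim one sleeping thread, so two producers can never pick
  // the same one. Returns its index, or -1 when everybody is already busy.
  int claimOne() {
    uint64_t cur = bits.load();
    while (cur != 0) {
      int tid = 0;
      while (((cur >> tid) & 1) == 0) ++tid;   // lowest sleeping thread
      const uint64_t mask = uint64_t(1) << tid;
      if (bits.compare_exchange_weak(cur, cur & ~mask))
        return tid;                            // we now own its wake-up
      // CAS failed: cur was reloaded with the current value, try again
    }
    return -1;
  }
};
```

After a successful `claimOne`, the producer would mailbox its queue ID to that thread and signal it, exactly as described above.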

A Task profiler
Just a C++ interface with user-provided callbacks that are triggered on a bunch of internal events (before the run function, after the run function, when the task is done, and so on...)
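Such a callback interface could look like the following. The names here (`TaskProfiler`, `onRunBegin`, etc.) are hypothetical, not the actual yaTS API: users subclass it and the default empty bodies make every event opt-in.

```cpp
#include <cstdio>

// Hypothetical interface -- the actual yaTS callback names may differ.
class TaskProfiler {
public:
  virtual ~TaskProfiler() {}
  virtual void onRunBegin(const char *taskName) {}  // right before run()
  virtual void onRunEnd(const char *taskName) {}    // right after run()
  virtual void onTaskDone(const char *taskName) {}  // task fully retired
};

// Example user profiler that just counts and logs two of the events.
class CountingProfiler : public TaskProfiler {
public:
  int begun = 0, ended = 0;
  void onRunBegin(const char *taskName) override {
    ++begun;
    std::printf("begin: %s\n", taskName);
  }
  void onRunEnd(const char *taskName) override {
    ++ended;
    std::printf("end:   %s\n", taskName);
  }
};
```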

A Bunch of C++ policy-like classes
I added many helper task classes built on top of tasking.hpp. The really cool one extends tasks to support any number of start and end dependencies. Rather convenient. The nice part is that a task can be registered as a start dependency of another task even if it is already running, or even actually done.

More restricted waitForCompletion
Waiting for a task from within a task is actually really hard. So hard that my previous implementation was just totally wrong. My idea was to do something like this:

void Task::waitForCompletion(Task &other) {
  while (other.isNotDone()) runAnyOtherTaskThatIsReady();
}


The big problem is that a task you run from "runAnyOtherTaskThatIsReady()" may itself be waiting for you. So, by recursing, you introduce a cycle into your DAG and the system deadlocks. One way to get something working and easy to use might be to have yieldable tasks and somehow replace the stack recursion with co-routines that are scheduled in a proper way. Well, I am lazy right now :-)

I therefore removed waitForCompletion. Now you can wait for a task, but only from *outside* the tasking system.
Fortunately, with the relaxed multiple-dependency tasks, you can get the same effect as waitForCompletion, with no deadlock, using a continuation-passing-style approach.
Basically, this:

Task *MyTask::run(void) {
  run some code A;
  waitForCompletion(otherTask);
  run some code B;
  return NULL;
}


becomes, with some C++ lambda horrors:
Task *MyTask::run(void) {
  run some code A;
  Task *cont = spawn([]() { run some code B; return NULL; });
  otherTask->multiStarts(cont); // new flexible end dependencies here
  cont->ends(this);
  return cont;
}


Not too bad with our favorite I-am-so-evil-that-you-will-shoot-yourself-in-the-head-with-my-pseudo-closure-with-no-garbage-collection-that-makes-you-use-reference-counted-pointers-that-introduce-memory-leaks-with-cycling-references C++ language :-)
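The jab about cycling references is easy to demonstrate: two reference-counted objects pointing at each other keep each other's count at one forever, so neither destructor ever runs. A minimal example (my own, unrelated to yaTS itself):

```cpp
#include <memory>

// Demonstration of the reference-counting cycle leak mentioned above.
struct Node {
  static int alive;                 // how many Nodes currently exist
  std::shared_ptr<Node> next;
  Node()  { ++alive; }
  ~Node() { --alive; }
};
int Node::alive = 0;

void makeCycle() {
  auto a = std::make_shared<Node>();
  auto b = std::make_shared<Node>();
  a->next = b;
  b->next = a;   // cycle: each node keeps the other's count >= 1
}  // a and b go out of scope here, but both Nodes survive: a leak
```

Breaking the cycle requires a std::weak_ptr on one side, or tearing the link down by hand before the last owner goes away.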

Ben

2 comments:

Derek Gerstmann said...

How would you change things if you adopted coroutines? I.e., nested parallelism using tasks for launching groups of work run by coroutines/fibres in dedicated threads? Would this help at all, or are groupsets the way to go?

-- dg

bouliiii said...

For the co-routines, I have no precise idea yet. The main use is really to remove this possible deadlock when you want to wait for an arbitrary task. Basically, you would keep using the main coroutine all the time to run tasks (as it is today, since there is only one coroutine per HW thread). However, when you wait for a task, you run new tasks in separate coroutines: this gives the ability to return to the waiting task without needing to unwind the stack. I am not even sure that works, but this is mostly the idea.