Currently, when one thread blocks on another, we clone the
stack from that task. This results in a lot of clones, but it also
means that we can't mark all the frames involved in a cycle atomically.
Instead, when we propagate information between threads, we also
propagate the participants of the cycle and so forth.
This branch *moves* the stack into the runtime while a thread
is blocked, and then moves it back out when the thread resumes.
This permits the runtime to mark all the cycle participants at once.
It also avoids cloning.
Instead of creating a future for each edge
in the graph, we now have all dependent queries
block in the runtime and wake up whenever results
are published to see if their results are ready.
We could certainly allocate a CondVar for each dependent
query if we found that spurious wakeups were a problem.
I consider this highly unlikely in practice.