Multicore, Async, and Lwt

I am curious, when OCaml multicore is finished and merged mainstream, will programs written with Async and Lwt (or similar) libraries become multithreaded automatically? Will it require major changes in Lwt and Async libraries? Or is it totally impossible and will require rewriting programs themselves?

It will require rewriting programs. Right now, Async programs at least rely on the fact that they can mutate state in a region between binds without fear of interruption. This would simply not be true if the Async scheduler automatically just scheduled jobs on different cores.

We’ve thought a bit about this; maybe you’d have one Async scheduler per core, and have ways of shipping jobs between them explicitly? But no auto-parallelization is likely.
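To make the "mutate state between binds" point concrete, here is a minimal sketch using only the standard library. It is a toy single-threaded scheduler, not Async's implementation: `run_queue`, `schedule`, `run_until_idle`, and `deposit` are hypothetical names invented for illustration. Two jobs perform an unlocked read-modify-write on shared state, and the result is exact only because nothing else can run between one scheduling point and the next.

```ocaml
(* Toy cooperative scheduler: a FIFO queue of jobs run one at a time on a
   single thread. Hypothetical names; this is not Async's API. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

(* Enqueue a job to run later. This is the only point at which control
   can pass to another job. *)
let schedule job = Queue.add job run_queue

let run_until_idle () =
  while not (Queue.is_empty run_queue) do
    (Queue.take run_queue) ()
  done

(* Shared mutable state, deliberately not protected by any lock. *)
let balance = ref 0

(* Each step is a non-atomic read-modify-write. It is safe only because
   no other job can run between the read and the write. *)
let rec deposit n =
  if n > 0 then begin
    let seen = !balance in                 (* read *)
    balance := seen + 1;                   (* write: nothing ran in between *)
    schedule (fun () -> deposit (n - 1))   (* stand-in for a bind: yield here *)
  end

let () =
  schedule (fun () -> deposit 1000);
  schedule (fun () -> deposit 1000);
  run_until_idle ();
  Printf.printf "balance = %d\n" !balance
```

The two jobs interleave at every `schedule` call, yet the final balance is always 2000. If the same jobs were handed to different cores, the read and the write could be separated by another core's update, which is exactly the assumption that would break.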


2 Likes

We’re currently designing an entirely new multicore IO library based on effects that should support high-performance direct-style I/O. I anticipate (subject to the emerging design) that there will be some manual migration path between this library and existing monadic concurrency libraries. But nothing automatic, as migration requires careful understanding of the underlying concurrency semantics of both the new implementation and the existing libraries.

20 Likes

I can probably write an auto parallelizing compiler for this. At least, it’d be nice to do some basic profiling to find the hot blocks and try to unroll loops, etc., like the basic passes of an auto parallelizing compiler. The hard part is memory distribution; that’s a little difficult, and I didn’t have a great solution to it. But it can probably be solved by a combinatorial optimization model, as usual.

I can probably write an auto parallelizing compiler for this.

:laughing: Famous last words.

4 Likes

Haha yeah, I was thinking I should have said I wouldn’t do it for free, though. :stuck_out_tongue: :joy:

More seriously, I think the advances here are really exciting; I would love to try an explicit parallel programming alternative to Haskell for multicore programming. It’s unfortunate that some semantics will be broken, but it sounds like that’s a necessary evil. Multicore reagents, algebraic effects, and concurrency are all state-of-the-art approaches to functional explicit multicore programming. I was worried about how shared memory would be used, but now that I can see it takes very short code to write common multi-threaded programming idioms, I’m quite pleased. It’s a pretty comprehensive solution, possibly better than typical imperative APIs, because this is just awesomely succinct.

1 Like

Pardon the necromancing of this old thread.

As per @yminsky, I understand why Async programs would require rewriting to take advantage of parallelization, to maintain its guarantee that it won’t interrupt between binds. But Lwt doesn’t provide this guarantee, so, shouldn’t it be possible to simply re-base the Lwt library on the parallel threading Domainslib (or whatever) and thusly sprinkle free parallelization pixie dust on existing Lwt programs?

I’d suspect that spawning a new fiber for every lwt bind (or equivalent), which could then potentially execute on another domain, would be too expensive, but I might be wrong.

Lwt is the same way. Since Lwt has no preemptive multithreading available, there’s no way to stop a computation and switch to another thread unless the thread happens to be calling an Lwt function of some sort (usually bind). This is a very good thing. If you had to handle preemptive multithreading, writing Lwt and Async code would be extremely error-prone and involve locks everywhere for thread safety.

One difference between Async and Lwt is that Async guarantees thread fairness (each thread gets some time to run), whereas with Lwt, if a computing thread can keep going, it always will. Only IO or explicit yields will cause a thread to yield to other threads. Lwt’s approach trades latency for throughput.

@bluddy I call your attention to @rgrinberg’s article on the differences between them, here http://rgrinberg.com/posts/abandoning-async/

Quote

async >>= f schedules the computation f using the value of async once it’s determined. However, there’s a slight twist in each library.

  • Lwt - if async is already determined when binding then f will run instantly. In effect, Lwt is always “eager” to execute as much as it can.
  • Async - attempts to help reasoning about code using the invariant “code between binds cannot be interrupted”. For example:
m >>= fun x ->
...
>>= fun y ->

You are guaranteed not to have scheduler context switch to another job in the … The supposed benefits of this is the easier reasoning about race conditions. YMMV.

This suggests, to me, that Lwt does not provide this invariant.

Briefly, no. That won’t work at all. A fundamental assumption of cooperative threading is that there is sequential execution of OCaml code between scheduler switches. It doesn’t matter if it is Async or Lwt.

We’re exploring some options to add support to Lwt and Async in multicore, but it will require adapting code. Luckily, it looks likely that this can happen gradually. I’ll post an update when we have something working; I have nothing more to say here until then.

2 Likes

To clarify, Lwt also guarantees that code between binds isn’t interrupted. Async, on the other hand, additionally guarantees an interruption at every bind; Lwt makes no such guarantee.
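The difference can be sketched with a toy model, assuming the semantics described in this thread and in the quoted article. Nothing here is the real Lwt or Async API: `determined`, `bind_eager`, `bind_deferred`, and `trace` are hypothetical, the promise is always already determined, and the binds return `unit` rather than a new promise to keep the sketch short.

```ocaml
(* Event log so the two orderings can be compared. *)
let log : string list ref = ref []
let note s = log := s :: !log

(* Toy job queue standing in for a scheduler. *)
let jobs : (unit -> unit) Queue.t = Queue.create ()
let run_jobs () =
  while not (Queue.is_empty jobs) do
    (Queue.take jobs) ()
  done

(* A toy promise that is always already determined. *)
type 'a determined = Determined of 'a

(* Lwt-style, as described above: the value is available, so the
   callback runs immediately, inside the call to bind. *)
let bind_eager (Determined v) f = f v

(* Async-style, as described above: the callback is always enqueued as a
   separate job, even though the value is available. *)
let bind_deferred (Determined v) f = Queue.add (fun () -> f v) jobs

(* Record what happens around a bind, then drain the job queue. *)
let trace bind =
  log := [];
  note "before bind";
  bind (Determined 1) (fun _ -> note "callback");
  note "after bind";
  run_jobs ();
  List.rev !log

let () =
  print_endline (String.concat ", " (trace bind_eager));
  print_endline (String.concat ", " (trace bind_deferred))
```

With the eager bind the callback is logged between "before bind" and "after bind"; with the deferred bind it is logged last, once the job queue is drained. In both cases the code between two binds runs without interruption.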

I don’t technically see why it’s a fundamental assumption that a cooperative threading library can’t pre-empt threads if your environment has pre-emptive threads available and your cooperative threading library uses them, but I’m happy to be set straight that Lwt doesn’t implicitly interrupt them. Makes migrating an Async project to Lwt (or vice versa) even easier :+1:

Technically there may not be a reason but most cooperative threading code is written such that it can only safely be executed in a non-preemptive environment.

I have my own little cooperative threading library I’ve written that will be interesting to adapt for multiple threads. It’s designed such that the user can spin up multiple schedulers and all work executed in a scheduler would be executed in the same thread of execution and then probably use some kind of thread-safe message passing library to send data across thread boundaries. But I haven’t implemented the thread portion yet.
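One way the design above might look, as a rough sketch that assumes OCaml 5 (where `Domain`, `Mutex`, and `Condition` are in the standard library). The `mailbox` type, `send`, `receive`, `scheduler`, and the `message` variants are hypothetical names, not part of any existing library. Each domain owns its state and processes messages strictly one at a time, so only the mailbox itself needs a lock.

```ocaml
(* Assumes OCaml 5. A minimal thread-safe mailbox built from a mutex,
   a condition variable, and a queue. *)
type 'a mailbox = {
  lock : Mutex.t;
  nonempty : Condition.t;
  messages : 'a Queue.t;
}

let make_mailbox () =
  { lock = Mutex.create ();
    nonempty = Condition.create ();
    messages = Queue.create () }

let send box msg =
  Mutex.lock box.lock;
  Queue.add msg box.messages;
  Condition.signal box.nonempty;
  Mutex.unlock box.lock

(* Block until a message is available, then return it. *)
let receive box =
  Mutex.lock box.lock;
  while Queue.is_empty box.messages do
    Condition.wait box.nonempty box.lock
  done;
  let msg = Queue.take box.messages in
  Mutex.unlock box.lock;
  msg

(* Messages carry plain data rather than closures over shared state. *)
type message = Add of int | Stop

(* One "scheduler" per domain: [total] is touched only by this domain,
   so the sequential-execution assumption holds inside it. *)
let scheduler box () =
  let total = ref 0 in
  let rec loop () =
    match receive box with
    | Add n -> total := !total + n; loop ()
    | Stop -> !total
  in
  loop ()

let totals =
  let boxes = Array.init 2 (fun _ -> make_mailbox ()) in
  let domains = Array.map (fun box -> Domain.spawn (scheduler box)) boxes in
  (* Explicitly ship work: even numbers to domain 0, odd to domain 1. *)
  for i = 1 to 100 do
    send boxes.(i mod 2) (Add i)
  done;
  Array.iter (fun box -> send box Stop) boxes;
  Array.map Domain.join domains

let () = Printf.printf "even: %d, odd: %d\n" totals.(0) totals.(1)
```

Because each mailbox is drained in order by a single domain, the totals are deterministic (2550 for the even numbers and 2500 for the odd ones) even though the two domains run in parallel.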

Maybe I misunderstand you, but how would Async do this? There is no preemption so Async can’t “pause” your Deferred in between binds. Or do you mean that Async will reschedule at binds?

Maybe I misunderstand you, but how would Async do this? There is no preemption so Async can’t “pause” your Deferred in between binds. Or do you mean that Async will reschedule at binds?

I think the OP is indeed saying that Async will reschedule at binds. This was also my understanding of how it works.

1 Like