Blog post: A new general computation model for Dune


A blog post about the new computation model that Dune is built on.

12 Likes

Why is the memoize call explicit? One could naively think that any 'b Fiber.t represents an incremental 'b value, that is cached/memoized (… Fiber.t would be a parallel+incremental effect type). I suppose that you get a benefit from having the programmer control where the memoization points are placed?

My understanding is that you need to give the input/output ops to specify explicitly how to hash/serialize your values, so that memoization works across calls to dune.

Very cool! Do you think such a memoization system might have uses other than inside a build system?

Also I see you mentioned Build systems à la carte; where would Dune be in the scheduling algorithm/rebuilding strategy table?

@gasche, there are several reasons. The first reason is that memoization is not free and adds a bit of overhead. In particular, you need to use more memory in order to track the dependencies and cached result and you also spend time checking whether previous results are still valid. If you memoize everything, then the overhead of the memoization system is likely to become too big and will slow down everything.

By controlling explicitly what is memoized, we have better control on the ratio between the overhead of memoization and the amount of things that are recomputed when certain things change. In particular, we just need to make sure that the amount of recomputation needed for common modificatoins is small enough to not be noticeable. It is not necessary to go any further than that.

The second reason is indeed so that we can make cached results persist between calls to dune, as @c-cube mentioned. Plus we also have operations to parse/print the inputs to provide additional debugging.

The third reason is a bit less obvious. In order to recompute as little as possible, it is important to consider as few dependencies as possible. If we allow to memoize things at any point, then we might capture computed values inside closures, which means that the memoization system has no choice but to consider all the transitive dependencies. In systems such as skip where memoization is part of the language, it is probably possible to be clever and detect what is being captured, however we can’t do that with a normal library. To avoid this problem in Dune, we enforce that only top-level functions can be memoized. i.e. once the build starts it is not possible to memoize any new function. Programmers have to manually eta expand closures they would want to memoize, explicitely specifying the inputs of the function to the system at the same time.

2 Likes

@steinuil, I’ll need to re-read the paper, but I believe Dune is currently a suspending with verifying traces build system. However, we started experimenting with a shared artifact cache for Dune, which means that it would become a suspending with constructive traces build system. The initial results on this front are very promosing BTW.

2 Likes

I suppose yes. I don’t have specific applications in mind, but it is quite a general system.

1 Like

If you make the memoization implicit in the Fiber.t monad you also break the monad laws in a meaningful way. For example,

let a = ... in
let b = map a f in
let c = map b g in
...

is not equivalent to:

let a = ... in
let c = map a (fun x -> g (f x)) in
...

because in the first the call to f x is memoized and in the second it is not.

This prevents a number of important optimisations. For example, you can no longer treat:

map (both a (both b (both c (both d e)))) f

as:

map5 a b c d e f

since they would have different memoization properties. This essentially the same reason why using let%bind/let* syntax is not optimal for async.

In practice, between the loss of optimisations and the tendency to produce much more fine-grained memoization than is efficient, this style of monad has noticeably worse performance than one where the memoization is explicit.

@lpw25: isn’t this is a common consequence of having non-tracked
(unsafe) effects in monadic programs? As soon as your monad may
duplicate, erase or even reorder computations, you can observe
law-breaking behaviors if the programs building monadic values are not
pure.

In this case this is not even a monadic law that is being broken, but
functoriality. It is well-understood that, for example,

List.map f >> List.map g
List.map (f >> g)

are not equivalent in OCaml for arbitrary f, g, but I’m not sure
this is an argument against using lists to represent non-determinism.

I’m not trying to contradict what you say, obviously you have experience with incremental-computation monads, but genuinely curious. Why is the interaction between untracked effects and memoization a “meaningful” issue that prevents optimizations?

I always struggle to articulate the issue here, and I don’t think I choose a good example of it.

I think there are a few reasons that this case is worse than the list case:

a) Memoization doesn’t just change the behaviour for the untracked effects, it also often changes the behaviour for the effects which are being tracked by the monad itself. For example, if foo and bar are two of the tracked effects:

foo >>= fun x ->
bar (x mod 5)

and

foo >>= fun x ->
return (x mod 5) >>= fun y ->
bar y

can disagree in whether the bar effect gets re-executed or not when the value of x has changed but the value of x mod 5 has remained the same.

b) Unlike the list case, here it is the monadic primitives themselves that are performing untracked side effects. Even in a case like async, where there is sharing but no cut-offs, you get that:

let x = read_file () in
x >>= fun file1 ->
x >>= fun file2 ->
file1, file2

is different from:

read_file () >>= fun file1 ->
read_file () >>= fun file2 ->
file1, file2

Essentially, you can no longer think of the monad as being a pure representation of a computation that is then executed.

c) The reason that we have sharing or memoization in these monads is that we care about those things. Since it is a side-effect that we care about and wish to reason about, the monad should be tracking it just like the other effects in the monad. When you make this explicit it becomes much clearer that we are drastically breaking the monad laws, since our structural operations (map, bind, etc.) are performing effects.

d) Monads with sharing do not seem to be naturally modeled using algebraic effects, which implies that they are in some way non-algebraic. I’ve not got a clear handle on this issue but it definitely bothers me.

Separately from that I would also say that I do consider the issues with list as an argument for not using list directly to represent non-determinism. Were I making a library for non-determinism I would probably lean towards something like unit -> 'a list or a free monad instead.

I feel like these arguments point towards some deeper issue that I don’t have a clean explanation for, but they are enough to convince me that it is not the right approach. Especially combined with the improved performance and reasoning you can get by making memoization/sharing an explicitly tracked effect within the monad.

1 Like

With 2.8.5 I still see ~1.3s until dune figures out there’s nothing to do. Is this delay something that will eventually go away, using the memoization technique described in this post?

(I don’t know if 2.8.5 is too old. And running perf stat indicates a lot of time spent in GC and hashing strings, at least at my first reading of the perf report. For comparison, counting words in all the source files with wc takes 0.1s. Finally, installing inotify is not an option for me, sadly.)

This delay sounds a bit unusual. Do you have large files lying around in your repository? They could cause dune to take longer than expected due to the extra hashing and copying (even if the files are not used in the build itself).

Otherwise I would suggest you open an issue at Issues · ocaml/dune · GitHub with the details of your setup.

Cheers,
Nicolas

What do you mean by “large”? The largest file in the directory where I run dune is an .ml file of about ~250KiB.

I made a recording on my laptop (different than the machine where I said ~1.3s):

  • dune build takes >1s
  • find -type f | xargs md5sum > /dev/null takes <0.1s

Is this within normal, or would you say the gap is big enough to warrant filing an issue?

No, I meant many dozens or hundreds of megabytes. The delay you are seeing may be expected, depending on the size of your codebase. For example, dune build takes about 2s on a fully built copy of dune's codebase on my machine. So 1.3s sounds like something plausible. I had misread the time you had posted in your original message which is why I mentioned it was suspect. Apologies!

Cheers,
Nicolas

My understanding is that the implementation of watch mode in dune 2.x is notified of a change, but then needs to throws away all it knows about the rules. Rules have to be reevaluated, which should explain the delay that you’re seeing.

The plan for dune 3 (also discussed here) includes a “fast polling mode” which uses the memoization API to recompute just what is necessary.

The foundations are here and already work, but this is still incompatible with most dune features (dune features have to be written with the memoization API in mind to work, and some features predate Memo). If all goes well you should be able to get sub-300ms feedback in watch mode in dune 3 even in large repositories.