What is the consensus on compile-time evaluation?

constexpr if you come from C++, static expressions in D, comptime in Zig, etc…
pure functions and constants marked for evaluation at compile time. My naive understanding is that those languages implement limited interpreters in their toolchain to evaluate such expressions and embed their results in the executable.

Well, we have a full-fledged interpreter, ocamlrun, that transparently understands the whole language ocamlopt undestands. It seems we’re well-prepared to have a similar feature for code generation.
What we do have now that I imagine closest to that is ppxes, but I think this is more general… If the compiler had comptime abilities it could use it transparently on -O3 builds to reduce the overhead of evaluating certain portions of the executable. For example:

let num = 0xdeadbeef

let msb =
  let rec getmsb pos num =
   if num = 0 then pos else
      getmsb (pos+1) (num lsr 1)
  in
  getmsb ~-1 num

Right now, the compiler won’t evaluate this code and return with an executable that has two constants. But one could imagine a feature where I could either annotate the definition: let[@comp] rec getmsb ... or its application site: (getmsb[@comp]) ~-1 num and have that code run.

Languages like Zig and D go even further than that. For example Zig implements some sort of dependent-typing-like features by allowing types to be comptime values that can be matched on and returned from comptime functions. This also allows it to implement ML-like modules and functors with regular structs and functions.

Thoughts on the whole concept? Has it ever been considered before for ocamlopt? I wonder if it’s something the development team had discussed before.

5 Likes

I am not sure how close this is to what you would like, but here are two things that I think are relevant :
https://oliviernicole.github.io/about_macros.html
http://okmij.org/ftp/ML/MetaOCaml.html

2 Likes

You might want to also check out GitHub - thierry-martinez/metapp: Meta-preprocessor for OCaml

3 Likes

For your particular example, if you compile with Flambda you can annotate the last call with the [@unrolled] attribute and the compiler will simplify it for you:

let num = 0xdeadbeef

let msb =
  let rec getmsb pos num =
   if num = 0 then pos else
      getmsb (pos+1) (num lsr 1)
  in
  (getmsb[@unrolled 200]) ~-1 num
$ ocamlopt -c test.ml -dflambda
End of middle end:
let_symbol (camlTest (Block (tag 0,  3735928559 31)))
End camlTest
2 Likes

that’s an interesting tidbit! [@unrolled] attribute doesn’t seem mentioned in the manual (except briefly under [@local]) so I’m guessing it’s a little new? Or only applicable to +flambda compilers? Anyway it’s a good-to-know!

My interest in comptime evaluation comes from all sorts of things like lookup table generation, and calculating otherwise-magic constants in a way that serves both as documentation and as debuggable code. It’s also helpful for times you don’t want to type things by hand or produce unnecessarily big diffs. I realize my use cases are more-or-less convenience ones, and can probably be fulfilled with PPXes, but I do hear of performance-related use-cases.

Macros and staging/templating systems like the ones mentioned above do fall under comptime evaluation yes, but maybe that’s not the whole picture?

In terms of the compiler’s pipelines, this isn’t strictly true. ocamlrun implements the abstract machine whose instructions are defined in bytecomp/instruct.mli and that language is not understood by ocamlopt. You have to go back up one level to the lambda representation:

dra@thor:~$ ocamlc -dlambda -c msb.ml
(setglobal Msb!
  (let
    (num/80 =[int] 3735928559
     msb/81 =[int]
       (letrec
         (getmsb/82
            (function pos/83[int] num/84[int] : int
              (if (== num/84 0) pos/83
                (apply getmsb/82 (+ pos/83 1) (lsr num/84 1)))))
         (apply getmsb/82 (~ 1) num/80)))
    (makeblock 0 num/80 msb/81)))

vs

dra@thor:~$ ocamlc -dinstr msb.ml
        branch L2
        restart
L1:     grab 1
        acc 1
        push
        const 0
        eqint
        branchifnot L3
        acc 0
        return 2
L3:     const 1
        push
        acc 2
        lsrint
        push
        acc 1
        offsetint 1
        push
        offsetclosure 0
        appterm 2, 4
L2:     const 3735928559
        push
        closurerec 1, 0
        acc 1
        push
        const 1
        negint
        push
        acc 2
        apply 2
        pop 1
        push
        acc 0
        push
        acc 2
        makeblock 2, 0
        pop 2
        setglobal Msb!

Ah so, for a feature like this, we actually need a lambda interpreter instead of the bytecode interpreter… Something like that? And I’m guessing it’s not easy to feed back lambda once we’re down to bc

The main issue about compile-time evaluation in optimising compilers is not how to do it (Flambda can do it just fine if you use the right attributes, but various other parts of the compiler could do it too with only a little work), it’s how to decide when to do it.
Your example shows the problem in a very nice way: replace (pos+1) by pos and you have an infinite loop; but for the compiler the two programs are almost identical. So to make sure the compiler isn’t trying to evaluate at compile-time something that wouldn’t terminate, it would need either to compute a proof of termination (that’s complicated and potentially expensive), or some safeguard that stops evaluating after some amount of work has been done.
The second way is what’s currently implemented in Flambda, and the annotations like [@unrolled] can be used to give hints to the compiler that it might be interesting to evaluate more than the default setting. (To give you an idea, the default settings correspond to [@unrolled 0], and with -O3 you get the equivalent of [@unrolled 1]. Your example needs [@unrolled 33] to get fully evaluated.)

1 Like

Yeah. All the examples I vaguely know of do use some kind of (often user-adjustable) fuel to terminate. Is there a reason why flambda had chosen to limit itself to unrolling annotations though? Or is this just a “for now” thing?

For simplifying function calls, Flambda needs to inline the called function. Whether to inline a function call or not is decided by a number of factors, and in the particular case of recursive functions the main factor is the unrolling “fuel”.
But for non-recursive cases, Flambda can be much more aggressive.
For example, if you manually unroll your example:

let num = 0xdeadbeef

let msb =
  let rec getmsb pos num =
   if num = 0 then pos else
      getmsb (pos+1) (num lsr 1)
  in
  let getmsb pos num =
   if num = 0 then pos else
      getmsb (pos+1) (num lsr 1)
  in
  (* Repeat about 32 times *)
  let getmsb pos num =
   if num = 0 then pos else
      getmsb (pos+1) (num lsr 1)
  in
  getmsb ~-1 num

Then you can get this to compile to a constant with the -O3 flag. If you annotate each getmsb function with [@inline always] then you don’t even need the -O3 flag.

The problem with recursive functions (and the reason why we don’t inline them without the [@unrolled] annotation) is that they’re very common in OCaml, and most of the times the application of a recursive function will not simplify to a constant. So to avoid spending a lot of time trying to simplify recursive function calls that can’t be simplified, the default is to only try recursive evaluation if the user specified it with an annotation.

3 Likes

understandable… I did experiment a little with this unrolled annotation.
To better frame my last question though:

Just unrolled? Or is this just a “for now” thing?

I meant, has there been any discussion within flambda on extending this notion of unroll+simplify to something that more-or-less says “trust me and compute this at compilation time”, with an implicit default fuel… instead of unrolled n where n is previously known by the user? Something that also works with any kind of function?

So that I can, for example, write something like:

let[@comp] comptime_sort = (* sorting implementation *)

let arr = comptime_sort [|1;2;7;4;5;6;0;3|]
let lst = (List.sort[@comp]) Int.compare [1;2;7;4;5;6;0;3]

Whereas if I use the unrolled annotation on this… I get an unrolling, not the result of a computation

There have been various discussions on the subject, of course.

But let’s start with the definitions: what does it mean for a value to be computed at compile time ?
Remember that OCaml is mostly a functional language. In particular in the intermediate representations that are used for optimisations, we use something that’s close to an extension of the lambda-calculus.

So what does it mean for a lambda-term to be evaluated at compile time ? Well, the most obvious answer is to beta-reduce it until we get a value (i.e. not an application). But that’s not always possible: if you get an application of the form x y, where x is a variable on which you do not have information in this scope, you can’t reduce it further. In real OCaml programs, this can happen either if you try to do compile time computations inside functions (let foo f = let y = (f[@comp]) 0 in y) or if you access compilation units for which the .cmx file is absent (in your List.sort example, if you don’t have access to list.cmx you can’t reduce it further). Values that are the result of the call to a C function (or Sys.opaque_identity) also fall broadly into this case. This is not that big a problem: if the compiler can’t reduce an application, it can emit a warning saying that it couldn’t evaluate the expression at compile time and move on.

In the Flambda world, this should be doable through inlining. If we inline all functions calls that occur under a given [@comp] annotation, we could produce something that looks (after simplification) reasonably like what you would expect. In fact, if I compile let lst = List.sort Int.compare [1;2;7;4;5;6;0;3] with ocamlopt -O3 -inline-call-cost 100 -inline-max-unroll 10 I get something equivalent to let lst = [0; 1; 2; 3; 4; 5; 6; 7]. Unfortunately there’s currently no way to set the inlining parameters for a specific part of the code only, and of course knowing which exact parameters to pass is not exactly trivial to find. But it shows that if we wanted to add support for your [@comp] attribute, we could already go a long way with minimal additional effort.

However, if we try the same approach with Array.sort then we quickly hit a wall: loops and operations on mutable structures are never simplified, so even is you inline all function calls you’ll never get a single result.

Simplifying loops is not completely impossible. As a proof of concept, we could rewrite while _test do _body done to let rec loop () = if test then (_body; loop ()) in loop () and reuse the unrolling mechanism. But loops are all about doing side-effects, and that’s where it gets complicated. Tracking the contents of values with mutable fields is very tricky: if you can’t get 100% sure that you know all the places where the value gets modified, you can’t do any meaningful simplifications. So that means that the optimiser needs a precise escape analysis (to track where each mutable value might end up), which tends to be expensive. After that, you need to track the effect of evaluating the expressions during the traversal. In practice, this means that if your symbolic evaluator keeps its environment as a map from variables to symbolic values, instead of simply passing the environment as argument during the traversal of sub-expressions, you have to also return it and be careful to always use the updated version. This also forces you to choose an evaluation order for all expressions that have multiple sub-expressions.

All of that is a lot of work, but it can be useful for other reasons too, so most of the required pieces of code have already been done in one way or the other on our experimental Flambda 2 branch. (The escape analysis part is very incomplete though; for now we only have the code from the main compiler that translates non-escaping references to mutable variables. For the purpose of compile-time evaluation of mutable structures we would need to generalise it to other data types and integrate it in the simplifier.)
But there is one last issue: the array type. It’s a built-in type, but some of the primitives manipulating arrays are actually implemented in C, in the runtime. So to actually simplify code using arrays, we need to hardcode in our simplifier (or symbolic evaluator) the semantics of all meaningful C functions of the runtime. Not impossible, but a lot of code to write and maintain.

To sum up, the feature you propose makes sense from the user’s point of view, and a lot of the code needed to make it work is already there. But to make it work in all cases is still a long way ahead, and we’re not very keen on introducing a feature that only works partially, without a good specification of which cases are handled and which cases are not (cough [@tailcall] cough). Feel free to post a feature request on the main tracker about it though, maybe someone will find the time and motivation to work on it at some point.

9 Likes

Thank you for the comprehensive reply. Really shows the extent to which the implications of this feature reach.
So is that issue with mutation and escaping is probably the reason why e.g. D requires the computations to be pure? I can see it being the case.
The issues you raise with C prims and mutation imply flambda’s evaluation is purely reducing/symbolic, and that it doesn’t use the OCaml runtime. Is my understanding correct? Would that possibly be the path for any future work on comptime evaluation? Or is it at odds with the current system in place?
@dra27 mentioned that “going back up a level [from bytecode]” would be needed for that to work, and as you mentioned, to maintain correctness the escape analysis needs to be worked on more and generalized. So my impression is that those two pieces would be fitted together to get a more general comptime evaluator.

I do recognize my position here though, I have no knowledge of the semantics of the flambda implementation details. So this intuition is merely aiming in the dark.
Instead, I’m interested in hearing what you have to say about the possible future steps towards that goal since it seems to me aligned with what’s already being worked on. And also interested in reading such discussions (are they issues on the flambda repo?)

Very few compilers run the programs they’re compiling, and the OCaml compiler is not an exception. What you’re describing is sometimes called multi-stage compilation, and is often done either at the build system level (think code generators) or the pre-processing level (in OCaml, that would be things like metapp and MetaOCaml).
During the optimisation pass, this isn’t realistic at all. You would need to compile the expression you’re trying to simplify, run the generated code (in a context where you can extract the result), and then feed this result back into the optimiser, converting it into the right representation. And of course it wouldn’t work if the expression you’re evaluating is non-deterministic, or has side-effects, or if you’re in a cross-compilation context.

There are a few public discussions on github (sometimes on the main OCaml repo, sometimes on the Flambda repo), but most of the discussions occur either in person or through private channels. That’s a bit unfortunate for public awareness, but it’s a consequence of how it is being developed (by a team of developers in close contact rather than a distributed group of people from various backgrounds).

If you really want to make progress on this subject, I strongly advise opening an issue on the OCaml tracker. Describe what you’re aiming for precisely, and with a bit of luck that will spawn the discussions that you’d like to read. Not all compiler developers look at this forum or have time to post comments.

2 Likes

Not related to OCaml, but there is this titzer/virgil language which claims to to exactly this. Run arbitrary computations at compile time in interpreter mode.

I’ve played with it when it was still a Google Code project, and it seemed rather nice. The way it solves the termination problems seems to be that is doesn’t. If your compilation does not end it is your problem.

1 Like

Presumably this is to speed up array access in the bytecode interpreter? It would probably make sense to switch it over to code generation in the compiler.