I recently came across a fancy bug in Coq when compiled with the flambda flag, and I am wondering whether people around here could help me understanding what is going on there.
So essentially in Coq we have this JIT-like mechanism called native compilation that compiles a Coq term to OCaml and then links the resulting code for faster evaluation. It uses a lot of unsafe features to do so, because the type systems of the two languages are wildly different. Obviously we deserve what is happening, namely flambda generating segfaulting code. Yet, I do not understand which kind of invariant has been broken, if any. That could also be a bug in flambda! Also, disclaimer: I am not responsible for the native compilation scheme, I just want it to work with flambda, please do not curse me for the code below.
I have narrowed down the bug to this snippet:
type t = t -> t
type accumulator
type atom = int
let accumulate_tag = 0
let accumulate_code (k:accumulator) (x:t) =
let o = Obj.repr k in
let osize = Obj.size o in
let r = Obj.new_block accumulate_tag (osize + 1) in
for i = 0 to osize - 1 do
Obj.set_field r i (Obj.field o i)
done;
Obj.set_field r osize (Obj.repr x);
(Obj.obj r:t)
let rec accumulate (x:t) =
accumulate_code (Obj.magic accumulate) x
let mk_accu_gen rcode (a:atom) =
let r = Obj.new_block 0 3 in
Obj.set_field r 0 (Obj.field (Obj.magic rcode) 0);
Obj.set_field r 1 (Obj.field (Obj.magic rcode) 1);
Obj.set_field r 2 (Obj.magic a);
(Obj.magic r:t)
let mk_cofix_accu =
mk_accu_gen accumulate 0xdead
let v = mk_cofix_accu (Obj.magic 0xbeef : t)
let () =
let obj = Obj.repr v in
assert (Obj.size obj == 4);
let f = Obj.field obj 2 in
assert (Obj.is_int f);
assert ((Obj.magic f : int) == 0xdead)
When compiling without flambda, v := (tag = 0)[code, 1, 0xdead, 0xbeef] while when it’s on I get v := (tag = 0)[code, 1, 0xbeef]. The accumulator structure is essentially a hack to encode a infinitary function in OCaml.
Any thoughts? There is a specific Coq bug report here.
My understanding is that the non-flambda version adds
an implicit “environment” parameter to accumulate
while the flambda version does not. Here is what seems
to be a minimal reproduction case:
let rec f _ _ = Obj.magic f
let () = print_int Obj.(size @@ repr (f ()))
If will print 4 with a 4.06.1 non-lambda compiler, and 2
with a flambda one.
I don’t think that @xclerc’s example is related. That case is flambda removing variables from the closure environment of the closure resulting from partial application.
Your original example is I think due to a change in how the recursive reference to accumulate is being accessed. I think that in non-flambda mode the reference to accumulate inside the definition of accumulate is replaced by the accumulate closure parameter that was implicitly passed to accumulate – which since you’ve been messing around with the closures is not actually a reference to accumulate’s closure in your code. In flambda mode the recursive reference to accumulate is instead being fetched directly from its statically know address – since accumulate is a top-level function we allocate it’s block statically and can access the closure directly.
This is clearly a sound optimisation – the closure passed into a function will always be the closure of that function in regular OCaml code.
Not sure what the best way to fix your code is – but if can avoid playing around too much with closures things should go a little smoother.
It might also be worth noting that since you are not giving these fake closures you’re making the closure tag, they may segfault in -no-naked-pointers mode – which may one day be the default, so you might want to use Obj.closure_tag instead of 0 for their tag.
Which is (of course) consistent with your analysis: in the first
case the closure is retrieved from the additional “environment”
parameter while in the second case there is no such parameter
and the closure is retrieved “directly” as you pointed out.
The same can be observed regarding accumulate in the original
example:
This optimization is indeed triggered; it can be shown by modifying the code
to have:
let rec f _ _ _ _ _ _ = Obj.magic f
let () = print_int Obj.(size @@ repr (f () () () () ()))
Then, you still get 2 with flambda but 8 without it.
I think this is the combination of this optimization and the
presence/absence of the environment that explains that
you get 2 vs 4 in my original example.
Yes, unluckily I am aware of that. I have been working on removing a similar source of naked pointers in the Coq VM, which is essentially a tailored version of the OCaml bytecode VM. Using the closure tag might not be enough here, because it can compete with proper closures and we need to be able to discriminate between both. It might require a complete rework of the native compiler thus. Do you have an idea of how far that one day is?
There is no way something like that will work with flambda or be maintainable. This code is assuming a precise compilation schema for closure representation, and we are not going to provide any way to predict it. What is the aim of that code and was it handwritten ? It feels like this is trying to tie the knot of a recursive function definition. There is certainly some way to represent that decently with an intermediate block that get patched rather than patching the function itself.
Native compilation is implementing normalization by evaluation, so that you need to properly handle neutral terms. This is the goal of the accumulator code that can be seen as some way to implement a weak form of delimited continuations. Essentially, we want a term accu which has infinite arity, i.e. it can be applied to an arbitrarily large applicative context, in a way that allows to retrieve this context when needed. That is, in terms of API:
type accumulator
val accu : accumulator
val apply : accumulator -> 'a -> accumulator
val data : accumulator -> Obj.t array
We also need to handle pattern-maching contexts, and if I understand correctly the translation scheme, this is done by reserving the first constructor of algebraic types and setting the accumulator tag to be 0 so that matching on an accumulator does the right thing w.r.t. NBE.