Understanding the soundness of an flambda optimization

I recently came across a fancy bug in Coq when compiled with the flambda flag, and I am wondering whether people around here could help me understanding what is going on there.

So essentially in Coq we have this JIT-like mechanism called native compilation that compiles a Coq term to OCaml and then links the resulting code for faster evaluation. It uses a lot of unsafe features to do so, because the type systems of the two languages are wildly different. Obviously we deserve what is happening, namely flambda generating segfaulting code. Yet, I do not understand which kind of invariant has been broken, if any. That could also be a bug in flambda! Also, disclaimer: I am not responsible for the native compilation scheme, I just want it to work with flambda, please do not curse me for the code below.

I have narrowed down the bug to this snippet:

type t = t -> t
    
type accumulator

type atom = int 

let accumulate_tag = 0

let accumulate_code (k:accumulator) (x:t) =
  let o = Obj.repr k in
  let osize = Obj.size o in
  let r = Obj.new_block accumulate_tag (osize + 1) in
  for i = 0 to osize - 1 do
    Obj.set_field r i (Obj.field o i)
  done;
  Obj.set_field r osize (Obj.repr x);
  (Obj.obj r:t)

let rec accumulate (x:t) =
  accumulate_code (Obj.magic accumulate) x

let mk_accu_gen rcode (a:atom) =
  let r = Obj.new_block 0 3 in
  Obj.set_field r 0 (Obj.field (Obj.magic rcode) 0);
  Obj.set_field r 1 (Obj.field (Obj.magic rcode) 1);
  Obj.set_field r 2 (Obj.magic a);
  (Obj.magic r:t)

let mk_cofix_accu =
  mk_accu_gen accumulate 0xdead

let v = mk_cofix_accu (Obj.magic 0xbeef : t)

let () =
  let obj = Obj.repr v in
  assert (Obj.size obj == 4);
  let f = Obj.field obj 2 in
  assert (Obj.is_int f);
  assert ((Obj.magic f : int) == 0xdead)

When compiling without flambda, v := (tag = 0)[code, 1, 0xdead, 0xbeef] while when it’s on I get v := (tag = 0)[code, 1, 0xbeef]. The accumulator structure is essentially a hack to encode a infinitary function in OCaml.

Any thoughts? There is a specific Coq bug report here.

My understanding is that the non-flambda version adds
an implicit “environment” parameter to accumulate
while the flambda version does not. Here is what seems
to be a minimal reproduction case:

let rec f _ _ = Obj.magic f
let () = print_int Obj.(size @@ repr (f ()))

If will print 4 with a 4.06.1 non-lambda compiler, and 2
with a flambda one.

1 Like

Would it help to wrap this (highly dangerous) code in Sys.opaque_identity? See here and here

I don’t think that @xclerc’s example is related. That case is flambda removing variables from the closure environment of the closure resulting from partial application.

Your original example is I think due to a change in how the recursive reference to accumulate is being accessed. I think that in non-flambda mode the reference to accumulate inside the definition of accumulate is replaced by the accumulate closure parameter that was implicitly passed to accumulate – which since you’ve been messing around with the closures is not actually a reference to accumulate’s closure in your code. In flambda mode the recursive reference to accumulate is instead being fetched directly from its statically know address – since accumulate is a top-level function we allocate it’s block statically and can access the closure directly.

This is clearly a sound optimisation – the closure passed into a function will always be the closure of that function in regular OCaml code.

Not sure what the best way to fix your code is – but if can avoid playing around too much with closures things should go a little smoother.

It might also be worth noting that since you are not giving these fake closures you’re making the closure tag, they may segfault in -no-naked-pointers mode – which may one day be the default, so you might want to use Obj.closure_tag instead of 0 for their tag.

My observation was that if you use the -dcmm command-line
switch, you get the following with a non-flambda compiler:

(function{min.ml:1,10-27} camlMin__f_1002
     (param/1004: val param/1003: val env/1263: val)
 env/1263)

(function{min.ml:2,37-43} camlMin__fun_1268 (arg/1266: val env/1272: val)
 (let env/1273 (load val (+a env/1272 24)) env/1273))

and the following with a flambda one:

(function camlMin__f_partial_fun_42 (param_45/1265: val)
 (app{min.ml:2,37-43} "camlMin__f_4" 1a param_45/1265 val))

(function{min.ml:1,10-27} camlMin__f_4 (param_7/1264: val param_6/1263: val)
 "camlMin__f_4_closure")

Which is (of course) consistent with your analysis: in the first
case the closure is retrieved from the additional “environment”
parameter while in the second case there is no such parameter
and the closure is retrieved “directly” as you pointed out.

The same can be observed regarding accumulate in the original
example:

(function{source.ml:19,19-69} camlSource__accumulate_1070
     (x/1071: val env/1093: val)
 (app{source.ml:20,2-42} "camlSource__accumulate_code_1006" env/1093 x/1071
   val))

vs

(function{source.ml:19,19-69} camlSource__accumulate_50 (x_53/1109: val)
 (app{source.ml:20,2-42} "camlSource__accumulate_code_21"
   "camlSource__accumulate_50_closure" x_53/1109 val))

Then if you amend the code to have:

let rec accumulate (x:t) =
  Printf.printf "<-- %d\n%!" Obj.(size @@ repr accumulate);
  accumulate_code (Obj.magic accumulate) x

the program will print 2 with flambda and 3 without it. This made
me think that the different shapes of the closures explained the problem.

This optimization is indeed triggered; it can be shown by modifying the code
to have:

let rec f _ _ _ _ _ _ = Obj.magic f
let () = print_int Obj.(size @@ repr (f () () () () ()))

Then, you still get 2 with flambda but 8 without it.

I think this is the combination of this optimization and the
presence/absence of the environment that explains that
you get 2 vs 4 in my original example.

Yes, unluckily I am aware of that. I have been working on removing a similar source of naked pointers in the Coq VM, which is essentially a tailored version of the OCaml bytecode VM. Using the closure tag might not be enough here, because it can compete with proper closures and we need to be able to discriminate between both. It might require a complete rework of the native compiler thus. Do you have an idea of how far that one day is?

1 Like

There is no way something like that will work with flambda or be maintainable. This code is assuming a precise compilation schema for closure representation, and we are not going to provide any way to predict it. What is the aim of that code and was it handwritten ? It feels like this is trying to tie the knot of a recursive function definition. There is certainly some way to represent that decently with an intermediate block that get patched rather than patching the function itself.

Do you have an idea of how far that one day is?

I don’t think it will be any time soon.

Native compilation is implementing normalization by evaluation, so that you need to properly handle neutral terms. This is the goal of the accumulator code that can be seen as some way to implement a weak form of delimited continuations. Essentially, we want a term accu which has infinite arity, i.e. it can be applied to an arbitrarily large applicative context, in a way that allows to retrieve this context when needed. That is, in terms of API:

type accumulator
val accu : accumulator
val apply : accumulator -> 'a -> accumulator
val data : accumulator -> Obj.t array

s.t.

data (apply ... (apply accu x_1) ... x_n) = [|x_1; ...; x_n|]

We also need to handle pattern-maching contexts, and if I understand correctly the translation scheme, this is done by reserving the first constructor of algebraic types and setting the accumulator tag to be 0 so that matching on an accumulator does the right thing w.r.t. NBE.

This might seems dumb, but given your interface this would fit:

type accumulator = Obj.t list
let accumulator : accumulator = []
let apply l v = Obj.repr v :: l
let data l = Array.of_list l

What more do you need to require accumulator to be a function ?