Mixing bytecode and native

I think this has been discussed before, but I cannot find any lasting reference to it, so here goes.

What are the technical limitations to having programs that mix bytecode and native (OCaml) code? Say, a toplevel that can load/link .cmxs files but retains the ability to evaluate new toplevel phrases in bytecode? My naive understanding is that the memory representation is the same for native and bytecode, so the problem should boil down to incompatible calling conventions?

I think such a thing would be very desirable for any repl-driven workflow, including dune utop path/ and the like. Most of the code that needs to be efficient would be natively compiled, so one wouldn’t be that badly impacted by bytecode performance compared to the current situation. I’d be very interested in hearing/reading about how difficult this would be.

I would say so. In Coq, we are mixing both, and the intricate part is indeed the calling convention. When calling native code from bytecode, the bytecode first calls a C function, which in turn calls caml_callback.

How do you handle calls to bytecode functions from the native side? In my intuition there’s no good way to make it work, so you would need to make sure that no bytecode closures are reachable from the native side. If you have found a way around that, I’m curious to know how it’s done.

Regarding the repl-driven workflow, a few people keep working on the native toplevel, so it’s possible that at some point it will become part of the official distribution.

There are no calls to bytecode functions, only calls to the bytecode interpreter, which is a C function. So, one direction or the other, calls go through C code, which requires a bit of discipline.

And to be complete, I should mention that the execution goes from native code to bytecode to native code (and back), but no further. The inner native code does not call any bytecode. In other words, the bytecode interpreter is not reentrant, as far as we are concerned.
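
To make the layering concrete, here is a toy model in plain OCaml. It is only a sketch and is not taken from Coq or from the OCaml runtime: the names instr, interp, depth, double, outer and bad_leaf are all made up for illustration, a small stack machine stands in for the bytecode, and ordinary OCaml functions stand in for native code. The only things it tries to show are that callers never jump into bytecode directly (they hand a program to the interpreter), and that the inner native code must not enter the interpreter again. The explicit depth check is a modelling choice to make that discipline visible.

```ocaml
(* Toy model only: none of these names come from Coq or the OCaml runtime. *)

(* Instructions of a tiny stack machine standing in for bytecode. *)
type instr =
  | Push of int                  (* push a constant *)
  | Add                          (* pop two values, push their sum *)
  | Call_native of (int -> int)  (* leave the interpreter to run a "native" function *)

(* Depth counter used to detect any attempt to re-enter the interpreter. *)
let depth = ref 0

(* The single entry point into "bytecode": a program is handed to the
   interpreter, and nothing ever calls a bytecode function directly. *)
let interp (prog : instr list) : int =
  if !depth > 0 then failwith "interp: re-entered";
  incr depth;
  let step stack ins =
    match ins, stack with
    | Push n, _ -> n :: stack
    | Add, a :: b :: rest -> (a + b) :: rest
    | Call_native f, a :: rest -> f a :: rest
    | _ -> failwith "interp: stack underflow"
  in
  (* Always restore the counter, even if a step raises. *)
  Fun.protect
    ~finally:(fun () -> decr depth)
    (fun () ->
      match List.fold_left step [] prog with
      | [v] -> v
      | _ -> failwith "interp: bad final stack")

(* Inner "native" code: plain OCaml that never calls interp. *)
let double x = 2 * x

(* Outer "native" code: enters the interpreter, which calls double. *)
let outer () = interp [Push 20; Push 1; Add; Call_native double]

(* A leaf that breaks the discipline by entering the interpreter again. *)
let bad_leaf x = interp [Push x; Push 1; Add]

let () = Printf.printf "%d\n" (outer ())  (* prints 42 *)
```

Running a program such as [Push 1; Call_native bad_leaf] through interp raises Failure in this model, which is the "no further" rule from the post above turned into a runtime check.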