Note: this is part of some code I’m writing for a research project, so don’t worry about such insanity somehow floating into actual production code.
I’ve recently been working on a project in which, for various reasons, I need to type-check and evaluate dynamically generated OCaml code at runtime (and possibly even let that generated code interact with modules in the host context). By reading the code of the native toplevel, and with some help from compiler-libs, I have managed to do this! (Hint: it involves using the compiler to compile the code down to a library, and then dynamically linking it into the current executable.)
For the purposes of this post, the exact implementation is probably not that important, but we can summarise the high-level interface to these dynamic execution capabilities with the following signature:
type env
val raw_parse_str: string -> Parsetree.structure
(** [raw_parse_str txt] uses the OCaml compiler to parse a given string into a structure AST. *)
val raw_parse_expr_str: string -> Parsetree.expression
(** [raw_parse_expr_str txt] uses the OCaml compiler to parse a given string into an expression AST. *)
val initial_env: unit -> env
(** [initial_env ()] returns an initial OCaml typing environment,
preloaded with the OCaml stdlib. *)
val dyn_load_definition_as_module: env -> mod_name:string -> ast:Parsetree.structure -> env
(** [dyn_load_definition_as_module env ~mod_name ~ast] compiles and
dynamically loads/evaluates the AST [ast] under the module name
[mod_name], returning an updated [env] *)
val eval_expr: env -> Parsetree.expression -> 'a
(** [eval_expr env expr] compiles and evaluates the expression
[expr] and returns the value.
NOTE: The return type of this function is 'a (any type at
all). You must explicitly annotate the return type at any call
site, otherwise be prepared for possible memory corruption. *)
If people are interested, I can also share the implementation of this module, but for now, it’s not that relevant.
For context, in my application, I have expressions that roughly have the following structure:
type expr = [
`App of string * expr list
| `Int of int
| `Var of string
]
Each such reified term represents a particular test returning a boolean value, and may contain free variables that are only in scope within the context of a larger expression. For each such expression, I map it to an OCaml AST, embed it in a larger expression that binds the appropriate variables, and evaluate the resulting AST using the library above to retrieve the boolean result:
(* initial test *)
let test = `App ("=", [`App ("Array.length", [`Var "a"]); `Var "i"])
(* surrounding context *)
let ctx =
let a = [| 1; 2; 3; 4 |] in
let i = 10 in
(??)
(* resulting OCaml expression *)
let a = [| 1; 2; 3; 4 |] in
let i = 10 in
Array.length a = i
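To make the mapping step concrete, here is a minimal sketch of one way to turn a reified term into OCaml source text that could then be handed to something like raw_parse_expr_str. This is not the implementation used in the project: the names `infix_ops`, `to_source` and `wrap_in_context` are hypothetical, the list of infix operators is an assumption, and going through a string rather than building a Parsetree directly is just the simplest thing to show.

```ocaml
type expr = [
  | `App of string * expr list
  | `Int of int
  | `Var of string
]

(* Assumption: the only infix operators that appear in tests. *)
let infix_ops = ["="; "+"; "-"; "*"; "<"; ">"]
let is_infix name = List.mem name infix_ops

(* Print an operator used in prefix position as "( op )". *)
let fn_name f = if is_infix f then "( " ^ f ^ " )" else f

(* Hypothetical printer: fully parenthesised, so precedence never matters. *)
let rec to_source : expr -> string = function
  | `Int i -> if i < 0 then Printf.sprintf "(%d)" i else string_of_int i
  | `Var v -> v
  | `App (op, [l; r]) when is_infix op ->
    Printf.sprintf "(%s %s %s)" (to_source l) op (to_source r)
  | `App (f, []) -> fn_name f
  | `App (f, args) ->
    Printf.sprintf "(%s %s)" (fn_name f)
      (String.concat " " (List.map to_source args))

(* Hypothetical helper: splice the printed test into the surrounding context. *)
let wrap_in_context body =
  Printf.sprintf "let a = [| 1; 2; 3; 4 |] in\nlet i = 10 in\n%s" body

let test = `App ("=", [`App ("Array.length", [`Var "a"]); `Var "i"])
let source = wrap_in_context (to_source test)
```

With the term above, `to_source test` yields `((Array.length a) = i)`, and `source` is the full expression with the two `let` bindings in front of it.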
This actually works quite well and reliably for the most part, save for one small problem: each evaluation involves running the compiler to build a library and, importantly, dynamically linking it into the current executable. After several thousand evaluations or more, the dynamic linking seems to fail. My guess is that the kernel simply gives up, because it wasn’t written with the expectation that a program would dynamically load so many libraries.
I have found a (hacky) solution.
Because my expression language is a small subset of OCaml, I can generate a bespoke expression interpreter for a given evaluation context as follows (note how the evaluation of variables is deferred to an environment object passed in):
let eval = fun env expr ->
  let rec eval_expr env : _ -> Wrap.wrap = function[@warning "-8"]
    | `Var "a" -> MkWrap (env#a)
    | `Var "i" -> MkWrap (env#i)
    | `App ("Array.length", [ls]) ->
      MkWrap (Array.length (Wrap.unwrap (eval_expr env ls)))
    | `App ("+", [l; r]) ->
      MkWrap ((Wrap.unwrap (eval_expr env l)) + (Wrap.unwrap (eval_expr env r)))
    | `App ("=", [l; r]) ->
      MkWrap ((Wrap.unwrap (eval_expr env l)) = (Wrap.unwrap (eval_expr env r)))
    | `Int i -> MkWrap i in
  (Wrap.unwrap @@ eval_expr env expr : bool)
Here, Wrap.wrap and Wrap.unwrap come from a module defined as follows:
module Wrap = struct
  type wrap = MkWrap : 'a -> wrap
  let unwrap (MkWrap a) = Obj.magic a
end
I can then combine this with the outer specification to get a self-contained expression that I only need to compile once, but can be used to test as many reified computations as I want:
fun expr ->
let a = [| 1; 2; 3; 4 |] in
let i = 10 in
eval (object method a = a method i = i end) expr
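For anyone who wants to poke at this without the dynamic-loading machinery, the pieces above can be assembled into one standalone file. This is only a sketch: the name `check`, the explicit `expr` annotations and the second test term are additions for illustration, and `MkWrap` is written as `Wrap.MkWrap` so that the code does not rely on type-directed constructor disambiguation.

```ocaml
module Wrap = struct
  (* Existential wrapper: forgets the type of the wrapped value. *)
  type wrap = MkWrap : 'a -> wrap
  (* Unchecked cast back to whatever type the call site expects. *)
  let unwrap (MkWrap a) = Obj.magic a
end

type expr = [
  | `App of string * expr list
  | `Int of int
  | `Var of string
]

(* The interpreter from above; variables are looked up on the [env] object.
   Terms outside the handled subset raise Match_failure at runtime. *)
let eval env (expr : expr) : bool =
  let rec eval_expr env : expr -> Wrap.wrap = function[@warning "-8"]
    | `Var "a" -> Wrap.MkWrap (env#a)
    | `Var "i" -> Wrap.MkWrap (env#i)
    | `App ("Array.length", [ls]) ->
      Wrap.MkWrap (Array.length (Wrap.unwrap (eval_expr env ls)))
    | `App ("+", [l; r]) ->
      Wrap.MkWrap (Wrap.unwrap (eval_expr env l) + Wrap.unwrap (eval_expr env r))
    | `App ("=", [l; r]) ->
      Wrap.MkWrap (Wrap.unwrap (eval_expr env l) = Wrap.unwrap (eval_expr env r))
    | `Int i -> Wrap.MkWrap i in
  (Wrap.unwrap @@ eval_expr env expr : bool)

(* Hypothetical name for the compiled-once context. *)
let check (expr : expr) : bool =
  let a = [| 1; 2; 3; 4 |] in
  let i = 10 in
  eval (object method a = a method i = i end) expr

(* Array.length a = i *)
let test = `App ("=", [`App ("Array.length", [`Var "a"]); `Var "i"])

(* Array.length a + 6 = i *)
let test' =
  `App ("=", [`App ("+", [`App ("Array.length", [`Var "a"]); `Int 6]); `Var "i"])

let () = Printf.printf "%b %b\n" (check test) (check test')
```

Both terms are well typed under the intended reading, so every `unwrap` happens to recover the type that was wrapped. Nothing in the code enforces this: a term such as `Array.length` applied to an `` `Int `` would type-check here just as happily and misbehave at runtime.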
Running this seems to work exactly as I expect (the expressions I generate are, I believe, guaranteed to be well-typed by construction).
So, my question for the OCaml community is: is this legal/defined behaviour? (assuming my expressions are well typed, and no other shenanigans are in play)
Of course, no one should actually use this in production code, but it’s a fun little trick, I suppose.