Pass OCaml values to and from OCaml code evaluated at runtime

I am working on a program where I have a reified instance of some well-typed OCaml code (i.e. in AST form). I'd like to run that code using the OCaml interpreter and collect some instrumentation information from its runtime behaviour (which I know will adhere to a certain form).

So far, my approach is to call out to ocaml with the -stdin flag and pass in the pretty-printed, annotated AST via stdin. At the end of the program, the code prints out the marshalled instrumentation data, which I then unmarshal in my outer program:

open Bos
open Containers

let source = {ocaml|
  let ls = ref []
  let add pos = ls := pos :: !ls
  let marshal () =
    print_endline @@ Marshal.to_string (List.rev !ls) Marshal.[]

  let f i =
    add (__LINE__, "i", i);
    let ftmp = (fun acc v ->
        add (__LINE__, "v", v);
        acc + v
     ) in
    let result = List.fold_left ftmp i [1;2;3;4] in
    result

  let () = ignore (f 10); marshal ()
|ocaml}

let () =
  let compile_output = OS.Cmd.run_io Cmd.(v "ocaml" % "-stdin") (OS.Cmd.in_string source) in

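  (* ~trim:false: the marshalled bytes must reach Marshal.from_string
     unmodified; trimming whitespace could cut off part of the data *)
  let ls : (int * string * int) list =
    (OS.Cmd.out_string ~trim:false compile_output
     |> Result.get_exn
     |> fst)
    |> Fun.flip Marshal.from_string 0 in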

  List.iter (fun (line, v,vl) ->
    Printf.printf "%d: %s -> %d\n" line v vl
  ) ls
(* outputs:
10: i -> 10
13: v -> 1
13: v -> 2
13: v -> 3
13: v -> 4 *)
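The decoding step at the end of this script assumes the child process printed a complete marshalled value. A slightly more defensive decoder is sketched below; the helper name decode_marshalled is hypothetical, not part of the original script. It checks the size recorded in the marshal header before unmarshalling, so a crashed or truncated child run yields None rather than an exception or garbage. Like Marshal.from_string itself, it is not type-safe: the caller's type annotation decides what the bytes are assumed to contain.

```ocaml
(* Hypothetical helper: decode one marshalled value from captured stdout,
   checking the size recorded in the header before unmarshalling. *)
let decode_marshalled (s : string) : 'a option =
  if String.length s < Marshal.header_size then None
  else
    (* total_size reads the header; it fails on a bad magic number *)
    match Marshal.total_size (Bytes.of_string s) 0 with
    | exception (Failure _ | Invalid_argument _) -> None
    | total when total > String.length s -> None (* truncated output *)
    | _ ->
        (* trailing bytes (e.g. print_endline's newline) are ignored *)
        Some (Marshal.from_string s 0)
```

In the script above it could stand in for Fun.flip Marshal.from_string 0, with the result annotated as (int * string * int) list option.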

This works, but I was wondering if there was a better way?

I already have the OCaml program in AST form, so parsing is already handled, and it seems a little wasteful to go through the whole rigmarole of converting it to and from strings. Calling out to the interpreter through the command line also gives me less freedom over how the context is set up: each time I send a piece of code, the ocaml interpreter is spun up from scratch, so I can't easily preserve information between runs.

In particular, is there some part of the compiler-libs library that could be used to pass an AST directly to the bytecode compiler/interpreter, supplying a suitable evaluation context?

Yes, this is possible: look at the Topeval.execute_phrase function in ocaml/topeval.mli (ocaml/ocaml on GitHub, commit cce52acc7c7903e92078e9fe40745e11a1b944f0). Note that the comment there seems to be stale.

But you will need to recreate the initial setup to make the function work. See Toploop.loop or Toploop.run_script in ocaml/toploop.mli (ocaml/ocaml on GitHub, commit 66e78aa4740eaa6b1516379a1607ab1e4f30765e) to see how everything is tied together.

Cheers,
Nicolas


Perfect, thanks for the link!

For any future readers, the following code was what I used:

open Containers

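(* Initial typing environment, set up the way the bytecode toplevel does it:
   initialise the symbol table, set the load path, import the interface
   CRCs, then build the initial environment. *)
let env =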
  let crc_initfs = Symtable.init_toplevel () in
  Compmisc.init_path ();
  Env.import_crcs ~source:Sys.executable_name crc_initfs;
  let env = Compmisc.initial_env () in
  Sys.interactive := true;
  env

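(* Compile a lambda term to bytecode, load it into the running program and
   execute it; on an exception, roll back the symbol table and return None. *)
let load_lambda lam =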
  let slam = Simplif.simplify_lambda lam in
  let (init_code, fun_code) = Bytegen.compile_phrase slam in
  let (code, reloc, events) = Emitcode.to_memory init_code fun_code in
  let can_free = List.is_empty fun_code in
  let initial_symtable = Symtable.current_state () in
  Symtable.patch_object code reloc;
  Symtable.check_global_initialized reloc;
  Symtable.update_global_table (); 
  let bytecode, closure = Meta.reify_bytecode code [| events |] None in
  let res = match closure () with
    | retval -> Some retval
    | exception _ -> Symtable.restore_state initial_symtable; None in
  if can_free then Meta.release_bytecode bytecode;
  res
  
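(* Type-check, compile and run a parsed toplevel phrase in [env], casting the
   resulting value with Obj.obj to whatever type the caller expects. Nothing
   checks that cast, so use wisely. [newenv] is not kept, so definitions made
   by one call are not visible to the next. *)
let execute_phrase str =
  Typecore.reset_delayed_checks ();
  (* The tuple returned here may differ between compiler releases;
     this code targets the 4-tuple version. *)
  let (str,sg,names,newenv) =
    Typemod.type_toplevel_phrase env str in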
  let sg' = Typemod.Signature_names.simplify newenv names sg in
  ignore (Includemod.signatures ~mark:Mark_positive env sg sg');
  Typecore.force_delayed_checks ();
  let lam = Translmod.transl_toplevel_definition str in
  Warnings.check_fatal ();
  load_lambda lam
  |> Option.map Obj.obj
  
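(* Parse a string ending in ";;" into a structure; toplevel directives
   (Ptop_dir) are not handled and raise Match_failure. *)
let parse_toplevel str =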
  str
  |> Lexing.from_string
  |> Parser.toplevel_phrase Lexer.token 
  |> function[@warning "-8"] Parsetree.Ptop_def s -> s

let res : (int * string * [`Value of int | `List of int list]) list option =
  {ocaml|
  let ls = ref [] in
  let add pos = ls := pos :: !ls in
  let f i =
    add (__LINE__, "i", (`Value i));
    let ftmp = (fun acc v ->
        add (__LINE__, "v", (`Value v));
        acc + v
     ) in
    let ftmp2 = (fun v ->
        add (__LINE__, "v", (`Value v));
        v + 1
     ) in
    let upd1 = List.map ftmp2 [1;2;3;4] in
    add (__LINE__, "upd1", (`List upd1));
    let result = List.fold_left ftmp i upd1 in
    result in
  (ignore (f 10);
  !ls) ;;
|ocaml}
  |> parse_toplevel
  (* the phrase evaluates directly to the trace list *)
  |> execute_phrase

let () =
  print_endline @@ (Printf.sprintf "dynamically evaluated code was %s" @@
                    (res
                     |> Option.map
                          [%derive.show: (int * string * [`Value of int | `List of int list]) list]
                     |>  Option.value ~default:"None"))
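The final print relies on ppx_deriving's [%derive.show]. If that ppx isn't available, a hand-written printer for the same trace type is short. A minimal sketch; the names value, show_value, show_event and show_trace are illustrative, not from the original code.

```ocaml
(* Plain printer for the trace type used above, as an alternative to
   [%derive.show] when ppx_deriving is not available. *)
type value = [ `Value of int | `List of int list ]

let show_value : value -> string = function
  | `Value i -> string_of_int i
  | `List l -> "[" ^ String.concat "; " (List.map string_of_int l) ^ "]"

(* One "line: name -> value" entry per recorded event *)
let show_event ((line, name, v) : int * string * value) : string =
  Printf.sprintf "%d: %s -> %s" line name (show_value v)

let show_trace (events : (int * string * value) list) : string =
  String.concat "\n" (List.map show_event events)
```

It can be applied to the list inside res, e.g. Option.iter (fun l -> print_endline (show_trace l)) res.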

I'm interested: is it technically possible (or difficult) to do the same for native code? For example, generate some code and, after some executions, reorder some if-expressions using something like profile-guided optimization.

I’m not sure if the API of the native toplevel is complete enough to allow this currently, but in theory you can compile code on the fly, dynlink it, and run it. Then assuming that you managed to collect profiling information, you could re-generate code that better fits the use case, re-compile it and re-dynlink it.
But you can't patch the code that is currently running; you have to generate new code each time. And since code can't be garbage-collected, if you do this too often you will need more and more memory for your code.

As a side note, I believe Jane Street has worked on profile-guided optimisation, so you should be able to find a bit of information about that work.

Right now I'm imagining a shallow-embedded DSL where some expressions could be reordered. (That means approaches for PGO-ing OCaml code are not exactly what I want, but thanks for the link anyway :slight_smile: ) I'm OK with postponing questions about collecting the compiled code and technical questions about dynlinking it (I believe these are either unimportant or easy). I'm not going to patch code that is already running; I only want to collect some profiling information, maybe rerun my code, and return the right "compilation scheme" as the result of my program.

It looks like collecting profiling information doesn't have an obvious solution, so I should invent an ad hoc approach. OK.
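One possible ad hoc approach for a shallow embedding, sketched under the assumption that the DSL's conditionals are built from combinators you control: each branch carries a hit counter, a profiling run fills the counters, and a new "compilation scheme" is produced by testing the most frequently taken branch first. The combinator names (branch, cond, reorder) are hypothetical. Reordering only preserves behaviour when the guards are mutually exclusive.

```ocaml
(* Hypothetical combinators for a shallow-embedded DSL: a multi-way
   conditional whose branches count how often they are taken. *)
type ('a, 'b) branch = {
  name : string;
  guard : 'a -> bool;
  body : 'a -> 'b;
  mutable hits : int; (* profiling counter *)
}

let branch name guard body = { name; guard; body; hits = 0 }

(* Try the guards in order; the first one that holds is counted and run. *)
let rec cond branches ~default x =
  match branches with
  | [] -> default x
  | b :: rest ->
      if b.guard x then begin
        b.hits <- b.hits + 1;
        b.body x
      end
      else cond rest ~default x

(* Test the most frequently taken branch first; stable_sort keeps the
   original order among branches with equal counts. Only valid when the
   guards are mutually exclusive. *)
let reorder branches =
  List.stable_sort (fun a b -> compare b.hits a.hits) branches
```

The reordered list (or just the resulting order of names) is the kind of result a profiling run could return as its "compilation scheme".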

@Gopiandcode, do you have any plans to write a tutorial about running bytecode in the interpreter and collecting profiling information?

Unfortunately, no plans for the near future. My aim here was to collect some profiling data for the purposes of validating/enhancing a static analysis I was working on, rather than for optimisations themselves, although I do agree that would be interesting. Also, that part of the OCaml API doesn't seem to have much documentation, so at the moment I'm not familiar enough with it to write any declarative guides on how to do it (not that that's stopped me before :wink: ). I mainly got it to work by copying the rough structure of the toploop and then adding more initialisation steps until my program stopped crashing.