Distributing an OCaml binary and .cma files

I am trying to distribute a REPL that embeds some libraries, so that it can run on a computer without OCaml/opam.

Typically, I want to let users execute OCaml scripts that use a driver library, e.g.:

let () =
  let b = My_lib.open_base "/foo/bar" in
  (* ... *)
  My_lib.close_base b

I created a dummy library that uses what I want to embed, and ran dune top, which gives me some directives.

Running ocaml -init <file_with_dune_top_output> works fine, so I edited the file to use relative paths, and that worked too.

Then I copied all the .cma files used in the init file from .opam/4.11.1/lib/**/*.cma into ./etc/lib/**/*.cma.
That is where things stopped working. It does not crash or anything; it just launches the REPL, but my libraries are unbound.

Before trying to track down hypothetical missing dependencies in my ./etc/lib: is there any chance that distributing the ocaml binary and a few .cma files would give the user a toplevel able to execute their scripts?

If not, is there anything I can do to obtain such a tool, i.e. an OCaml REPL with my libraries loaded that can be used on a computer without OCaml?

EDIT: I do not actually need an interactive REPL, even if it would be nice. I want to execute .ml files.

You might want to look at ocamlmktop (documentation).
If you want to run the resulting toplevel on a computer that doesn’t have an OCaml runtime installed, you’ll likely want to use the -custom flag.

To add to @vlaviron’s answer: you will also have to ship the .cmi files of the libraries you want to give the user access to.

This is not something that can be used off the shelf, but it can also be worked around by installing a custom load function in Persistent_env.Persistent_signature. What is needed is some way to serialize the .cmi files in some form inside the main executable, and a way to read them back in the load function.
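One possible form for the serialization half, sketched under assumptions: a small generator, run at build time, that embeds each file's raw bytes as an OCaml string literal. Printf's %S conversion escapes arbitrary bytes into valid literal syntax, so once the generated module is compiled and linked, the files travel inside the executable. The names read_file, embedded_source and files are hypothetical; turning the bytes back into what Persistent_signature.load expects needs compiler-libs and is not shown.

```ocaml
(* Hypothetical build-time generator: produces OCaml source that embeds
   the raw bytes of the given files. *)

(* Read a whole file as raw bytes. *)
let read_file path =
  let ic = open_in_bin path in
  Fun.protect
    ~finally:(fun () -> close_in ic)
    (fun () -> really_input_string ic (in_channel_length ic))

(* Build "let files = [| (name, contents); ... |]" as a string.
   %S escapes every byte, so each entry stays on a single line. *)
let embedded_source paths =
  let buf = Buffer.create 4096 in
  Buffer.add_string buf "let files = [|\n";
  List.iter
    (fun path ->
      Printf.bprintf buf "  (%S, %S);\n"
        (Filename.basename path) (read_file path))
    paths;
  Buffer.add_string buf "|]\n";
  Buffer.contents buf
```

A build rule could write the result of embedded_source ["driver.cmi"] to a file such as embedded.ml and link it like any other module.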

Cheers,
Nicolas


First, thank you all for your time.

Unfortunately, I can’t make it work.

I pushed a dummy project here: GitHub - sagotch/testop

First, I tried the method described in the dune documentation:

$ dune build
$ echo "open Driver ;;" | _build/default/bin/testop.bc
        OCaml version 4.11.1

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads
  # open Driver ;;
Error: Unbound module Driver

Then, I tried using ocamlmktop:

$ ocamlmktop -o foo.exe -linkall -custom -I _build/default/lib/.driver.objs/byte _build/default/lib/driver.cma
$ echo "open Driver ;;" | ./foo.exe
        OCaml version 4.11.1

Findlib has been successfully loaded. Additional directives:
  #require "package";;      to load a package
  #list;;                   to list the available packages
  #camlp4o;;                to load camlp4 (standard syntax)
  #camlp4r;;                to load camlp4 (revised syntax)
  #predicates "p,q,...";;   to set these predicates
  Topfind.reset();;         to force that packages will be reloaded
  #thread;;                 to enable threads
  # open Driver ;;
Error: Unbound module Driver
#

Did I miss something obvious? Did I misunderstand what ocamlmktop actually does?

You need to make sure the toplevel sees driver.cmi, either by invoking it with an appropriate -I flag, or by invoking #directory at the beginning of the session or in a file provided via -init.
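For example, an init file passed with -init could start with a #directory directive pointing at the directory that holds driver.cmi. The path below is the dune build directory from the ocamlmktop command above; on a machine without OCaml it would instead be wherever the shipped .cmi files end up. This is a toplevel script, not a compiled program.

```ocaml
(* init.ml, used as:  ./foo.exe -init init.ml
   Make driver.cmi visible to the typechecker (adjust the path to
   wherever the .cmi files are actually shipped). *)
#directory "_build/default/lib/.driver.objs/byte";;

(* The code of Driver is already linked into foo.exe; with its .cmi
   on the load path, user input can now refer to it. *)
open Driver;;
```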

OK, sorry about that: you mentioned it earlier, but I was expecting the module to be openable even without the .mli (I mean .cmi). Don’t ask me why, or what the behavior should be in that case; I clearly have no idea.

It now loads if I pass the -I flag. Still, having to do so seems strange to me, since I do not really understand what it means for a module to be loaded (which is what ocamlmktop is supposed to do) but not openable/usable.

I’ll try to solve my problem with importing the .cmi files and will mark this as resolved after that. Thanks.

Note, it’s the .cmi.

It means that the module is loaded in there: compiled code loaded inside the toplevel can use it, but the module is hidden from the interactive user.

The code written by the interactive user needs to be typechecked and compiled which is what the .cmi files are used for, like in the regular compilation pipeline.

I think it might work soon.

I ended up with a long ocamlmktop invocation:

ocamlmktop -o gwrepl.exe -linkall -custom \
  -I /home/jsagot/.opam/4.11.1/lib/ocaml \
  -I /home/jsagot/.opam/4.11.1/lib/bytes \
  -I /home/jsagot/.opam/4.11.1/lib/calendars \
  -I /home/jsagot/.opam/4.11.1/lib/re \
  -I /home/jsagot/.opam/4.11.1/lib/seq \
  -I /home/jsagot/.opam/4.11.1/lib/stdlib-shims \
  -I /home/jsagot/.opam/4.11.1/lib/uchar \
  -I /home/jsagot/.opam/4.11.1/lib/unidecode \
  -I /home/jsagot/.opam/4.11.1/lib/uucp \
  -I /home/jsagot/.opam/4.11.1/lib/uunf \
  -I /home/jsagot/.opam/4.11.1/lib/uutf \
  -I /home/jsagot/workspace/geneweb/_build/default/lib/def/.def.objs/byte \
  -I /home/jsagot/workspace/geneweb/_build/default/lib/gwdb-legacy/.gwdb_legacy.objs/byte \
  -I /home/jsagot/workspace/geneweb/_build/default/lib/gwdb/.geneweb_gwdb.objs/byte \
  -I /home/jsagot/workspace/geneweb/_build/default/lib/io/.io.objs/byte \
  -I /home/jsagot/workspace/geneweb/_build/default/lib/util/.util.objs/byte \
  /home/jsagot/.opam/4.11.1/lib/ocaml/unix.cma \
  /home/jsagot/workspace/geneweb/_build/default/lib/def/def.cma \
  /home/jsagot/.opam/4.11.1/lib/re/re.cma \
  /home/jsagot/workspace/geneweb/_build/default/lib/io/io.cma \
  /home/jsagot/.opam/4.11.1/lib/calendars/calendars.cma \
  /home/jsagot/.opam/4.11.1/lib/stdlib-shims/stdlib_shims.cma \
  /home/jsagot/.opam/4.11.1/lib/unidecode/unidecode.cma \
  /home/jsagot/.opam/4.11.1/lib/uucp/uucp.cma \
  /home/jsagot/.opam/4.11.1/lib/uunf/uunf.cma \
  /home/jsagot/.opam/4.11.1/lib/uutf/uutf.cma \
  /home/jsagot/workspace/geneweb/_build/default/lib/util/util.cma \
  /home/jsagot/workspace/geneweb/_build/default/lib/gwdb-legacy/gwdb_legacy.cma \
  /home/jsagot/workspace/geneweb/_build/default/lib/gwdb/geneweb_gwdb.cma \
  gwrepl_data/mktoplevel_data.cma \
  mktoplevel_gwrepl.cma

mktoplevel_data.cma contains all the needed .cmi files, each marshalled into a string, and mktoplevel.cma does this:

let () =
  (* No version banner, and do not read any init file. *)
  Clflags.noversion := true ;
  Clflags.noinit := true ;
  (* Keep the default loader so unknown units can fall back to it. *)
  let old = !Persistent_env.Persistent_signature.load in
  (* Index the embedded signatures by compilation-unit name. *)
  let ht = Hashtbl.create (Array.length Data.cmis) in
  Array.iter begin fun (src, cmi) ->
    let filename = src in
    (* Each entry holds a Cmi_format.cmi_infos value marshalled into a string. *)
    let cmi = Marshal.from_string cmi 0 in
    print_endline ("Loading " ^ cmi.Cmi_format.cmi_name ^ " -- " ^ src) ;
    let x = Persistent_env.Persistent_signature.{ filename ; cmi } in
    Hashtbl.add ht cmi.Cmi_format.cmi_name x
  end Data.cmis ;
  (* Serve embedded signatures first; otherwise defer to the default loader. *)
  Persistent_env.Persistent_signature.load := begin fun ~unit_name ->
    match Hashtbl.find_opt ht unit_name with
    | Some t as x -> print_endline ("Found " ^ unit_name ^ " -- " ^ t.filename) ; x
    | None -> print_endline ("Fallback " ^ unit_name) ; Unix.sleep 1 ; old ~unit_name
  end

If I build this on my computer and launch gwrepl.exe, I get this (which is what I expect):


Loading Stdlib -- /home/jsagot/.opam/4.11.1/lib/ocaml/stdlib.cmi
[...]
Loading Gwdb -- /home/jsagot/workspace/geneweb/_build/install/default/lib/geneweb/gwdb/gwdb.cmi
Found Stdlib -- /home/jsagot/.opam/4.11.1/lib/ocaml/stdlib.cmi

But when I compile gwrepl.exe in my Docker image and run the resulting executable on my laptop, I get a strange error:

Loading Stdlib -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib.cmi
[...]
Loading Gwdb -- /home/geneweb/geneweb/geneweb/_build/install/default/lib/geneweb/gwdb/gwdb.cmi
File "command line", line 1:
Error: Unbound module Stdlib

Any idea of what is going wrong?

It likely looks for stdlib.cmi at the wrong place or there is no such file.

Hmm… but there should be no stdlib.cmi at all, since there is no OCaml on the target machine (isn’t that the purpose of the -custom option?). Also, note that although the log shows full paths, they are meaningless here, since the interfaces are served directly from embedded values (marshalled Cmi_format.cmi_infos).

Are you sure you included stdlib.cmi in your Mktoplevel_data magic?

I do, it’s the first cmi to be unmarshalled and recorded (that is what the following line means):

Loading Stdlib -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib.cmi

It is actually hardcoded in the script that generates Mktoplevel_data. Here is the data generator: mk_data.ml · GitHub
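mk_data.ml itself is only linked, so the following is a rough, hypothetical sketch of the "marshalled into a string" technique, not its actual contents. Each value is turned into bytes with Marshal.to_string and printed as an OCaml string literal; the load hook above then recovers it with Marshal.from_string. Here the payload is any value standing in for a Cmi_format.cmi_infos (reading real .cmi files would need compiler-libs). emit_marshalled and decode_line are made-up names, and decode_line only simulates, by parsing the emitted text, what the compiled data module would hand back.

```ocaml
(* Hypothetical generator: embeds marshalled values as string literals,
   producing "let cmis = [| (src, bytes); ... |]". *)
let emit_marshalled (entries : (string * 'a) list) : string =
  let buf = Buffer.create 1024 in
  Buffer.add_string buf "let cmis = [|\n";
  List.iter
    (fun (src, v) ->
      (* Marshal.to_string yields raw bytes; %S makes them a valid literal. *)
      Printf.bprintf buf "  (%S, %S);\n" src (Marshal.to_string v []))
    entries;
  Buffer.add_string buf "|]\n";
  Buffer.contents buf

(* Parse one emitted entry and unmarshal its payload, as the load hook
   does with Data.cmis. Marshal is not type-safe: the caller must use
   the same type that was marshalled. *)
let decode_line (line : string) : string * 'a =
  Scanf.sscanf line " (%S, %S);" (fun src bytes ->
      (src, Marshal.from_string bytes 0))
```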

The next thing I tried was my initial idea, with the binary embedding the .cmi and .cma (and .so) files: https://github.com/geneweb/geneweb/pull/1217/files

When launched, the executable extracts all the files into a temporary directory and loads/configures them before entering the main toploop.
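The extraction step described here might look roughly like this, sketched with the standard library only. extract_to_temp_dir is a hypothetical name, the input is assumed to be an array of (file name, contents) pairs embedded in the binary, and Sys.mkdir needs OCaml 4.12 or later. Adding the directory to the load path and loading the .cma files is left out.

```ocaml
(* Hypothetical sketch: write embedded (file name, contents) pairs into a
   fresh temporary directory and return that directory's path. *)
let extract_to_temp_dir (files : (string * string) array) : string =
  (* Reserve a unique name with temp_file, then turn it into a directory.
     There is a small race window between remove and mkdir. *)
  let dir = Filename.temp_file "gwrepl" "" in
  Sys.remove dir;
  Sys.mkdir dir 0o700;
  Array.iter
    (fun (name, contents) ->
      (* Keep only the base name so entries cannot escape [dir]. *)
      let path = Filename.concat dir (Filename.basename name) in
      let oc = open_out_bin path in
      Fun.protect
        ~finally:(fun () -> close_out oc)
        (fun () -> output_string oc contents))
    files;
  dir
```

On OCaml 5.1 and later, Filename.temp_dir creates the directory directly and avoids the remove/mkdir shortcut.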

It works, but I feel like it is not the way it should be done.

And I would like to understand why my other attempt fails with this Unbound module Stdlib error.

I’m interested too, but I didn’t even know about @nojb’s trick. If you have time, you could try to make a simple self-contained example with repro instructions, so that we can look into it in more detail.

I’m not sure exactly what went wrong, but:

  • You could try adding Clflags.no_std_include := true to your code, which should make both versions (on your computer and through docker) behave similarly. It will likely mean that both will fail, but it might be easier to debug.
  • You could also try adding Clflags.nopervasives := true. I suspect that the compiler may treat initially opened modules differently (requiring a .cmi file to be present on disk for them, even though it might not be read if Persistent_env.Persistent_signature.load is modified), and the nopervasives flag tells the compiler not to open the Stdlib module automatically. If that works, you will have to prefix your inputs with open Stdlib, though (if you plan to provide an init file, you can put that line at the top of the file instead).

If the nopervasives trick works, then it likely means that the Unbound module Stdlib error you get is a bug in the compiler, and it would be nice to file an issue about it.
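Applied to the initialization code posted earlier in the thread, the two suggestions amount to setting two more Clflags references before the load hook is installed. This is a fragment rather than a standalone program: it needs compiler-libs, and the hook installation is elided.

```ocaml
let () =
  Clflags.noversion := true ;
  Clflags.noinit := true ;
  (* Do not put the standard library directory on the load path, so no
     .cmi file can be picked up from there by accident. *)
  Clflags.no_std_include := true ;
  (* Do not open Stdlib implicitly; user input (or an init file) must
     then start with "open Stdlib". *)
  Clflags.nopervasives := true ;
  (* ... build the table and install Persistent_signature.load as before ... *)
  ()
```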

I’m coming late to the discussion, but indeed the compiler has some special logic around opening the “initial module” (Stdlib, unless nopervasives is set to true) which looks like it may bypass Persistent_signature.load, so I suspect that doing what @vlaviron suggests will be enough to fix the issue at hand.

Filing an issue to investigate whether this is an expected behaviour or a bug (which may be related to some later changes done to support the “prefixed” Stdlib) would also be a good idea.

Cheers,
Nicolas

PS: the relevant bit of the compiler is the function Typemod.initial_env, which is called with initially_opened_module equal to Some "Stdlib" when nopervasives is false.

With this, I get the following unexpected behavior:

# open Stdlib ;;
Error: Unbound module Stdlib
# open Stdlib.List ;;
Found Stdlib -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib.cmi
Found Stdlib__list -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib__list.cmi

It looks like the code that wrongly relies on the filesystem is not restricted to the initial opens.
Out of curiosity, if you try to open Stdlib twice, does it work the second time?
I can’t find any reason why Stdlib.List would work and not Stdlib, but I would be less surprised if it turns out that trying to load a module twice could first fail then work.

It loads when Stdlib.List is requested as the very first input, but opening Stdlib alone still fails afterwards:

# open Stdlib.List ;;
Found Stdlib -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib.cmi
Found Stdlib__list -- /home/geneweb/.opam/4.09.1+flambda+no-flat-float-array/lib/ocaml/stdlib__list.cmi
# open Stdlib ;;
Error: Unbound module Stdlib
# open Stdlib ;;
Error: Unbound module Stdlib

I’ll try to make a minimal example in order to reproduce this when I have time.