[BLOG] The Growth of the OCaml (Binary) Distribution

Here is a blog post I wrote recently about how the size of the OCaml distribution has grown in recent years:

I hope you find it interesting!

Fabrice, OCamlPro


Making the compiler distribution smaller also helps with the size of local opam switches. Right now, a local switch takes up hundreds of megabytes, which can get painful fast if you have a lot of separate projects.


I hadn’t really thought about it but there’s a sizeable chunk of executables in ~/.opam/default/bin:

$ opam switch
#  switch   compiler                   description
→  default  ocaml.4.14.0               default
$ cd ~/.opam/default/bin
$ ls
cppo			ocamlc.opt		ocamllex.byte		ocamlopt.byte		odoc
dune			ocamlcmt		ocamllex.opt		ocamlopt.opt		omd
lambda-term-actions	ocamlcp			ocamllsp		ocamloptp		rescript_syntax
mel			ocamlcp.byte		ocamlmklib		ocamloptp.byte		safe_camlp4
melc			ocamlcp.opt		ocamlmklib.byte		ocamloptp.opt		usegtrip
meldep			ocamldebug		ocamlmklib.opt		ocamlprof		utftrip
menhir			ocamldep		ocamlmktop		ocamlprof.byte		utop
ocaml			ocamldep.byte		ocamlmktop.byte		ocamlprof.opt		utop-full
ocamlbuild		ocamldep.opt		ocamlmktop.opt		ocamlrun		ydump
ocamlbuild.byte		ocamldoc		ocamlobjinfo		ocamlrund
ocamlbuild.native	ocamldoc.opt		ocamlobjinfo.byte	ocamlruni
ocamlc			ocamlfind		ocamlobjinfo.opt	ocamlyacc
ocamlc.byte		ocamllex		ocamlopt		octavius
$ du -sh .
427M	.

Are all of these really necessary? E.g. do we really need to distribute ocamlopt.byte and friends in a binary distribution?

EDIT: side note, what is octavius? It doesn’t appear in the source code. And it’s pretty mysterious:

$ octavius -help
File "-help" does not exist
$ octavius --help
File "--help" does not exist
$ octavius
Usage: octavius FILE

Yeah, I wonder who’s using the bytecode executables nowadays. They’re pretty huge too.

I think octavius is part of odoc?

The Docker images published under ocaml/opam don’t include them, and as far as I know that has never caused a CI issue!

As part of reviewing one of the PRs on the compiler’s build system recently, I’d been reminded of the -linkall on ocamlcommon. It’d be really nice to get rid of that, but that involves either splitting ocamlcommon or doing some very fiddly work on the type checker’s global state.


opam - octavius ?


I don’t really understand the need to have -linkall on a library.
The only two cases where I have seen it used:

  • You have modules that perform side effects without being called by other modules, and you are afraid that, without -linkall, these modules wouldn’t be linked and the side effects would not be performed. You can usually fix this problem by having an init() function in the module that performs the side effects only once, and calling it from all the other modules that need them.

  • You have modules that you want to link even when you don’t need them, because you use Dynlink and want them to be available for plugins. I think that, in such cases, you should put -linkall when linking the executable, not on the library. If this does not give you the granularity to choose which libraries should be linked-all, then another flag should be added to the compiler to fix this.
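To make the first case concrete, here is a minimal single-file sketch of the init() idea. The names Registry, Plugin and Client are hypothetical and only stand in for separate compilation units; with real files, the call to Plugin.init is what gives the linker a reason to keep Plugin without -linkall.

```ocaml
(* Hypothetical table that the side effect writes into. *)
module Registry = struct
  let handlers : (string, int -> int) Hashtbl.t = Hashtbl.create 8
  let register name f = Hashtbl.replace handlers name f
  let find name = Hashtbl.find_opt handlers name
end

(* Hypothetical module whose only job is a registration side effect. *)
module Plugin = struct
  let initialized = ref false

  (* Idempotent: the registration happens on the first call only. *)
  let init () =
    if not !initialized then begin
      initialized := true;
      Registry.register "double" (fun x -> 2 * x)
    end
end

(* A module that needs the side effect calls init explicitly, which
   creates a real dependency on Plugin. *)
module Client = struct
  let run name x =
    Plugin.init ();
    match Registry.find name with
    | Some f -> Some (f x)
    | None -> None
end
```

Every module that relies on the registration would call Plugin.init () the same way; the boolean guard keeps repeated calls cheap and harmless.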


The original discussion for ocamlcommon is in ocaml/ocaml issue #6509, “[github patch] add -linkall flag to ocamlcommon archives”. The problem is forward references, which can’t be solved trivially by init functions. The pattern here is something like foo.ml:

let forward_fn = ref Fun.id

let api_call x =
  let y = do_something_here x in
  !forward_fn y

and bar.ml:

let _ =
  Foo.forward_fn := (fun x -> (* ... *))

let api_call x =
  let y = Foo.api_call x in
  (* ... *)

and the fear is a program that uses Foo.api_call but makes no reference to Bar itself. All fixable, but it’s not trivially solved by having an init function (Foo cannot call Bar.init since Bar references Foo), which is why I say “fiddly” :)
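A single-file sketch of what goes wrong, using nested modules in place of separate files. The string functions are placeholders I picked for illustration, and evaluating Foo.api_call before Bar's initializer has run stands in for "Bar was never linked":

```ocaml
module Foo = struct
  (* Forward reference: defaults to the identity until someone fills it in. *)
  let forward_fn : (string -> string) ref = ref Fun.id

  let api_call x =
    let y = String.trim x in (* placeholder for do_something_here *)
    !forward_fn y
end

(* Nothing has assigned the hook yet, as if Bar were missing from the
   executable: the call silently falls back to the identity. *)
let before = Foo.api_call "  hello "

module Bar = struct
  (* Top-level initializer: it only runs if Bar ends up in the program. *)
  let () = Foo.forward_fn := String.uppercase_ascii
end

(* Same call, now with the hook installed. *)
let after = Foo.api_call "  hello "
```

Nothing fails at link time or at run time in the first call; the result is just different, which is what makes this kind of missing module hard to notice.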


I suppose that in the real example, you would split bar.ml into a foo_init.ml:

let _ =
  Foo.forward_fn := (fun x -> (* ... *))

and bar.ml:

let () =
  (* ... *)
  let y = Foo.api_call x in
  (* ... *)

So why not rename foo.ml to foo_needs_init.ml and create a new foo.ml:

let () = Foo_init.init ()
let api_call = Foo_needs_init.api_call

i.e. hide the part that needs initialization inside an internal module, which is then re-exported by an external module that performs the initialization?
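Put together in one file, with nested modules standing in for the three compilation units, the idea could look roughly like this. It is only a sketch under my own assumptions: Foo_init exposes an explicit init function and refers to Foo_needs_init rather than Foo, and the integer arithmetic is a placeholder for the real work.

```ocaml
(* Formerly foo.ml: the part that relies on the forward reference. *)
module Foo_needs_init = struct
  let forward_fn : (int -> int) ref = ref Fun.id

  let api_call x =
    let y = x + 1 in (* placeholder for do_something_here *)
    !forward_fn y
end

(* foo_init.ml: installs the real implementation, at most once. *)
module Foo_init = struct
  let initialized = ref false

  let init () =
    if not !initialized then begin
      initialized := true;
      Foo_needs_init.forward_fn := (fun x -> x * 10)
    end
end

(* New foo.ml: the public facade. Referencing Foo_init here means any
   program that uses Foo also pulls in the initialization. *)
module Foo = struct
  let () = Foo_init.init ()
  let api_call = Foo_needs_init.api_call
end
```

Callers only ever mention Foo, so the dependency on Foo_init comes for free and no -linkall is needed for this particular hook.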