ocamlclean is only for bytecode binaries, so for native code (which is the main backend for MirageOS unikernels these days), the LTO patch is the way to go. It would be good to do some investigations into why the hello world is so big; for example, a fix merged today to the ppx libraries stopped compiler-libs from being accidentally linked into the runtime unikernel.
Cool. ppx_tools_versioned 5.2.1 gets hello.ukvm to 1.8 M
Out of curiosity: the 2013 paper "Unikernels: Library Operating Systems for the Cloud" reports much smaller MirageOS images: the web server was 0.673 MB (and 0.172 MB after ocamlclean). I wonder why we cannot get a similar size for native-code MirageOS?
@Keiko: you can use upx --best and/or strip to reduce the binary size even further. The hello-world for native apps should be between 500k and 850k (depending on the target).
Triggered by the discussion here, I looked into the modules linked into our binaries and I see a lot of modules that I suspect belong to PPX processors but should not be in the binary. @hannes discovered such a problem in the pull request "avoid compiler-libs runtime dependency" in ppx_cstruct. Examples of such modules are:
Ast_407
Migrate_parsetree
Ppxlib_ast
Typedtree
I expect that these modules are used by a PPX implementation at compilation time but are not required at run time.
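One way to check a binary for such modules is to group its symbols by OCaml module. The Python sketch below is only an illustration under assumptions: it expects the text output of `nm -S` on a native-code binary, it assumes OCaml symbols look like `camlModule__name_123`, and it takes everything up to the first `__` as the module name (for wrapped libraries that is the top-level library module). The function names and the default marker list are hypothetical, chosen from the modules listed above.

```python
import re
from collections import defaultdict

# One line of `nm -S` output: address, size (both hex), symbol type, name.
NM_LINE = re.compile(r"^([0-9a-fA-F]+)\s+([0-9a-fA-F]+)\s+(\S)\s+(\S+)$")

def module_sizes(nm_output):
    """Sum symbol sizes (in bytes) per OCaml module found in `nm -S` output."""
    sizes = defaultdict(int)
    for line in nm_output.splitlines():
        m = NM_LINE.match(line.strip())
        if not m:
            continue  # e.g. undefined symbols, which carry no address or size
        size, name = int(m.group(2), 16), m.group(4)
        # Assumption: native-code OCaml symbols are named camlModule__ident_NNN.
        if not name.startswith("caml") or "__" not in name:
            continue
        module = name[len("caml"):].split("__", 1)[0]
        if module and module[0].isupper():
            sizes[module] += size
    return dict(sizes)

def suspicious(sizes, markers=("Ast_", "Migrate_parsetree", "Ppxlib", "Typedtree")):
    """Keep only modules whose name starts with one of the given prefixes."""
    return {mod: size for mod, size in sizes.items() if mod.startswith(markers)}
```

Feeding it the output of `nm -S` for a unikernel binary and passing the result to `suspicious` gives a quick list of compile-time-looking modules and the bytes they occupy.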
Does this point to a common misunderstanding of how PPX processors are packaged or used with jbuilder/dune? I am sure that we don’t request these libraries explicitly and I suspect they are introduced by PPX processors.
ocamlfind query -r package shows all dependencies of package. These should include compile-time dependencies. I looked at the binary directly and also inspected the linking step. I don’t know of a good way to find the packages that bring in certain dependencies. Opam 2 shows, during installation of a package, what other packages are installed and why. Maybe there is a way to query it.
You can use opam list --recursive --required-by some-package to list all recursive dependencies for some-package. Unfortunately I’ve found this includes the union of dependencies for all versions of some-package. To avoid this you should specify a specific version of some-package: opam list --recursive --required-by some-package.3.2.0
I fixed a similar problem very recently, where one of the libraries we used was pulling in ppx_deriving. After the fix, the resulting binary was more than 10 MB smaller.
Would it be possible to have dune warn if a library is listed in both libraries and pps statements in the dune configuration file? This would have caught the errors I found (and fixed).
Maybe the dune documentation could be more explicit in mentioning that ppx rewriter libraries should not also be listed in the libraries list, as they are already considered build-time dependencies.
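Until dune offers such a warning, the check can be approximated outside of dune. The following Python sketch is an assumption-laden illustration, not a dune feature: it parses a dune file with a very small s-expression reader and reports names that appear both in a stanza's `libraries` field and in its `(preprocess (pps ...))` field. It assumes the newer dune syntax (not the older jbuild layout), does not handle `;` inside quoted strings or unbalanced parentheses, and all function names are hypothetical.

```python
import re

def parse_sexp(text):
    """Parse s-expressions into nested lists of atom strings (sketch only)."""
    text = re.sub(r";[^\n]*", "", text)  # drop line comments
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()]+', text)
    stack = [[]]
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok.strip('"'))
    return stack[0]

def field(stanza, name):
    """Arguments of the first (name ...) field directly inside a stanza."""
    for item in stanza:
        if isinstance(item, list) and item and item[0] == name:
            return item[1:]
    return []

def listed_in_both(dune_text):
    """Names present in both `libraries` and `pps` of the same stanza."""
    hits = []
    for stanza in parse_sexp(dune_text):
        if not isinstance(stanza, list):
            continue
        libs = {x for x in field(stanza, "libraries") if isinstance(x, str)}
        pps = set()
        for spec in field(stanza, "preprocess"):
            if isinstance(spec, list) and spec and spec[0] == "pps":
                # Skip flags such as -no-check that may follow the rewriters.
                pps |= {x for x in spec[1:]
                        if isinstance(x, str) and not x.startswith("-")}
        hits.extend(sorted(libs & pps))
    return hits
```

Run over each dune file in a project, a non-empty result points at a rewriter that is probably being linked into the final binary by mistake.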
Here is a Ruby script that I have used to inspect the size of modules inside an OCaml binary on Linux by analyzing its nm(1) output: https://gist.github.com/lindig/7ab6c663f7bb763322f65b7cda60f29c. It can also take two binaries. Output looks like this:
Thank you. Just to be explicit: when a PPX rewriter introduces a runtime dependency for the code it generates, does this dependency have to be declared at all where the PPX rewriter is used? Or is this automatically taken care of by the PPX rewriter’s META data?
In the latter case, any mention of PPX libraries in jbuild files is likely to be wrong. From your commit I get the impression that runtime libraries need to be declared by the user of a PPX.
A small experiment with ppx_deriving suggests to me that it is enough to declare (preprocess (pps (ppx_deriving.std))) and that the runtime code is linked into the final result without having to mention it explicitly in the jbuild file. https://github.com/lindig/hello/tree/ppx
This is tricky to answer. PPX rewriters themselves don’t contain any magic to add runtime dependencies to META files (that’s why we have nocrypto, tls (thx to diml), and a similar patch in x509 to include the runtime dependency of the ppx).
OTOH dune, as diml mentioned here, handles runtime dependencies of PPX rewriters automatically.
AFAICT, ocamlfind query uses the META files, which only list runtime dependencies, and thus no compile-time dependencies are involved. I first look into ocamlfind output, and then into the META files directly to find these dependencies.