Reducing MirageOS image size using ocamlclean?

I am looking into how to reduce the size of MirageOS images. Currently I am experimenting with the ukvm target.

Using 4.06.1+lto, the size of hello.ukvm is 6M (stripped). (--gc-sections gets me to 5.5M.)

I have read somewhere that I can use ocamlclean to reduce the image size. Is this still the case? I could not figure out how to do so.

I would be very grateful if someone could point me in the right direction for reducing MirageOS image size.

ocamlclean is only for bytecode binaries, so for native code (which is the main backend for MirageOS unikernels these days), the LTO patch is the way to go. It would be good to do some investigations into why the hello world is so big; for example, a fix merged today to the ppx libraries stopped compiler-libs from being accidentally linked into the runtime unikernel.

Cool. ppx_tools_versioned 5.2.1 gets hello.ukvm to 1.8 M :-)

Out of curiosity: the 2013 paper "Unikernels: Library Operating Systems for the Cloud" reports much smaller MirageOS images, for example a web server at 0.673 MB (and 0.172 MB after ocamlclean). I wonder why we cannot get a similar size for native-code MirageOS?

@Keiko: you can use upx --best and/or strip to reduce the binary size even further. The hello-world for native apps should be between 500k and 850k (depending on the target).

@samoht: the 1.8 M figure is already stripped and linked with --gc-sections. The target is ukvm.

upx --best hello.ukvm fails with UnknownExecutableFormatException :’-(

upx will only work on a unix target. In any case, it only reduces the on-disk executable size, not the memory requirement.

I asked about rising binary sizes before, in "Large binaries - break down the size by library?". Is there a chance that the PPX mechanism is responsible for this (at least in part)?

Triggered by the discussion here, I looked into the modules linked into our binaries, and I see a lot of modules that I suspect belong to PPX processors and should not be in the binary. @hannes discovered such a problem in the pull request "avoid compiler-libs runtime dependency in ppx_cstruct". Examples of such modules are:

  • Ast_407
  • Migrate_parsetree
  • Ppxlib_ast
  • Typedtree

I expect that these modules are used by a PPX implementation at compilation time but are not required at run time.

Does this point to a common misunderstanding of how PPX processors are packaged or used with jbuilder/dune? I am sure that we don’t request these libraries explicitly and I suspect they are introduced by PPX processors.

Is there a way to (semi-)automatically infer which library brings them in?

ocamlfind query -r package shows all dependencies of package. These should include compile-time dependencies. I looked at the binary directly and also inspected the linking step. I don’t know of a good way to find the packages that bring in certain dependencies. During installation of a package, opam 2 shows which other packages are installed and why. Maybe there is a way to query it.
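
One way to semi-automate this, sketched in Python: ask ocamlfind query -r for the recursive dependencies of each direct dependency and report those whose closure contains the suspect package. The function names are made up for illustration, and because ocamlfind reads META files, the answer only reflects what those files declare.

```python
import subprocess


def ocamlfind_closure(pkg):
    """Recursive dependencies of pkg as reported by `ocamlfind query -r`."""
    out = subprocess.run(
        ["ocamlfind", "query", "-r", "-format", "%p", pkg],
        capture_output=True, text=True, check=True,
    ).stdout
    return set(out.split())


def who_pulls_in(direct_deps, suspect, closure=ocamlfind_closure):
    """Return the direct dependencies whose closure contains `suspect`.

    `closure` maps a package name to the set of packages it depends on;
    it defaults to querying ocamlfind but can be swapped for any lookup.
    """
    return [dep for dep in direct_deps if suspect in closure(dep)]
```

Calling who_pulls_in with the unikernel's direct dependencies and a suspect such as a PPX support library narrows the search to the dependencies worth inspecting by hand.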

You can use opam list --recursive --required-by some-package to list all recursive dependencies for some-package. Unfortunately I’ve found this includes the union of dependencies for all versions of some-package. To avoid this you should specify a specific version of some-package: opam list --recursive --required-by some-package.3.2.0

I fixed a similar problem very recently, where one of the libraries we used was pulling in ppx_deriving. After the fix, the resulting binary was more than 10 MB smaller.

Would it be possible to have dune warn if a library is listed in both libraries and pps statements in the dune configuration file? This would have caught the errors I found (and fixed).

Maybe the dune documentation could be more explicit in mentioning that ppx rewriter libraries should not also be listed in the libraries list, as they are already considered build-time dependencies.
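
Until dune has such a warning, the check can be approximated outside of dune. The following Python sketch is not part of dune: it parses a dune or jbuild file with a deliberately small s-expression reader and reports names that appear in both the libraries and pps fields of the same stanza. It assumes well-formed input and ignores semicolons inside quoted strings; all function names are illustrative.

```python
import re

TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_sexps(text):
    """Parse a dune-style file into nested Python lists of atoms."""
    text = re.sub(r";[^\n]*", "", text)  # drop line comments
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        else:
            stack[-1].append(tok.strip('"'))
    return stack[0]


def atoms(sexp):
    """Yield every atom in a nested list, depth first."""
    for item in sexp:
        if isinstance(item, list):
            yield from atoms(item)
        else:
            yield item


def fields(sexp, name):
    """Yield every sub-list whose first element is `name`, at any depth."""
    for item in sexp:
        if isinstance(item, list):
            if item and item[0] == name:
                yield item
            yield from fields(item, name)


def ppx_in_libraries(dune_text):
    """Return names listed in both `libraries` and `pps` of one stanza."""
    clashes = set()
    for stanza in parse_sexps(dune_text):
        if not isinstance(stanza, list):
            continue
        libs = {a for f in fields(stanza, "libraries") for a in atoms(f[1:])}
        pps = {a for f in fields(stanza, "pps") for a in atoms(f[1:])}
        clashes |= libs & pps
    return sorted(clashes)
```

This only catches a rewriter named in both fields. A rewriter library that appears in libraries alone would need a list of known PPX packages to be detected.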

/Anders

Here is a Ruby script that I have used to inspect the size of modules inside an OCaml binary on Linux by analyzing its nm(1) output: https://gist.github.com/lindig/7ab6c663f7bb763322f65b7cda60f29c. It can also take two binaries. Output looks like this:

```
$ ~/src/tmp/size.rb _build/default/xc/xenops_xc_main.exe
# modules in _build/default/xc/xenops_xc_main.exe (size in Kb)
Arg                                                  10.7
Arg_helper                                            3.1
Array                                                11.5
ArrayLabels                                           0.3
Ast_402                                              54.5
Ast_403                                              56.0
Ast_404                                              56.8
Ast_405                                              59.3
Ast_406                                              60.1
Ast_407                                              60.1
Ast_convenience                                       4.9
Ast_helper                                           30.3
Ast_invariants                                        2.5
Ast_iterator                                         12.5
Ast_mapper                                           36.3
Astring                                               3.1
Astring_base                                          2.2
Astring_char                                          1.5
Astring_escape                                        4.4
Astring_string                                       23.7
Astring_sub                                          21.2
Astring_unsafe                                        0.3
Attr_helper                                           0.8
B64                                                   2.7
Backtrace                                            10.3
Base                                                 10.0
Base__Applicative                                     5.1
Base__Applicative_intf                                0.5
Base__Array                                          20.6
Base__Array0                                          0.9
Base__Array_permute                                   0.4
Base__Avltree                                         8.7
Base__Backtrace                                       1.0
Base__Binary_search                                   2.4
Base__Binary_searchable                               0.6
Base__Binary_searchable_intf                          0.0
Base__Blit                                            2.3
Base__Blit_intf                                       0.0
Base__Bool                                            1.3
Base__Buffer                                          0.7
Base__Buffer_intf                                     0.0
Base__Bytes                                           3.8
Base__Bytes0                                          0.3
Base__Bytes_set_primitives                            0.0

...omitted...

Xenbus_utils                                          0.8
Xenctrl                                               1.3
Xenctrlext                                            0.0
XenguestHelper                                        6.4
Xenops_client                                         1.4
Xenops_helpers                                        0.9
Xenops_hooks                                          2.6
Xenops_interface                                    294.1
Xenops_migrate                                        4.9
Xenops_server                                       139.4
Xenops_server_plugin                                  1.8
Xenops_server_skeleton                                3.3
Xenops_server_xen                                   133.1
Xenops_task                                           3.6
Xenops_types                                        116.0
Xenops_utils                                         28.8
Xenops_xc_main                                        2.0
Xenopsd                                               8.0
Xenstore                                              3.6
Xenstore_watch                                        0.1
Xmlm                                                 41.5
Xmlrpc                                               14.0
Xs_client_unix                                       12.1
Xs_handle                                             0.9
Xs_protocol                                          18.2
Xs_transport                                          0.6
Xs_transport_unix_client                              0.9
Yojson                                              155.5
_startup                                             39.2
_system        
```
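
For readers who prefer Python, the same idea can be sketched as follows. This is not the linked script. It assumes GNU nm with --print-size (sizes in hexadecimal) and that native-code symbols look like caml<Module>__<name>_<id>, taking the module to be everything between caml and the last double underscore; the function names are illustrative.

```python
import re
import subprocess
from collections import defaultdict

# Assumed symbol shape: "caml<Module>__<name>_<id>". The greedy group makes
# the module run up to the last "__", so "Base__Array" stays in one piece.
SYMBOL = re.compile(r"^caml([A-Z]\w*)__\w+$")


def module_sizes(nm_lines):
    """Sum symbol sizes in bytes per OCaml module from `nm --print-size` lines."""
    sizes = defaultdict(int)
    for line in nm_lines:
        parts = line.split()
        if len(parts) != 4:  # lines without a size column are skipped
            continue
        _addr, size, _kind, name = parts
        match = SYMBOL.match(name)
        if match:
            sizes[match.group(1)] += int(size, 16)
    return dict(sizes)


def report(path):
    """Print one line per module, with its size in KiB, for the binary at `path`."""
    out = subprocess.run(
        ["nm", "--print-size", path],
        capture_output=True, text=True, check=True,
    ).stdout
    for module, size in sorted(module_sizes(out.splitlines()).items()):
        print(f"{module:<50} {size / 1024:6.1f}")
```

Calling report on an unstripped binary prints a table similar in shape to the one above; a stripped binary has no symbols left to count.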

Could you describe the fix in more detail? We are also using ppx_deriving.

@lindig just don’t use ppx_tools_versioned.5.2 which is broken.

Just never include ppx_driver or any ppx rewriter library in the list of library dependencies. One of the commits I made to fix this can be seen here.

/Anders

Thank you. Just to be explicit: when a PPX rewriter introduces a runtime dependency for the code it generates, does this dependency have to be declared wherever the PPX rewriter is used? Or is this automatically taken care of by the PPX rewriter’s META data?

In the latter case, any mention of PPX libraries in jbuild files is likely to be wrong. From your commit I get the impression that runtime libraries need to be declared by the user of a PPX.

A small experiment with ppx_deriving suggests to me that it is enough to declare (preprocess (pps (ppx_deriving.std))) and that the runtime code is linked into the final result without having to mention it explicitly in the jbuild file. https://github.com/lindig/hello/tree/ppx

I think this discussion may be also relevant in this regard: https://github.com/ocaml/opam-repository/issues/11852

This is tricky to answer. PPX rewriters themselves don’t contain any magic to add runtime dependencies to META files (that’s why we have patches in nocrypto and tls (thanks to diml), and a similar patch in x509, to include the runtime dependencies of the PPX).

OTOH dune, as diml mentioned here, handles runtime dependencies of PPX rewriters automatically.

AFAICT, ocamlfind query uses the META files, which only list runtime dependencies, and thus no compile-time dependencies are involved. I first look into ocamlfind output, and then into the META files directly to find these dependencies.

This looks great!