Reducing size of an opam installation and executable/library size

There seems to be considerable interest in reducing the size of an OCaml installation (judging by the various discussions/PRs on the compiler).
Here are a few observations that might help with that:

  • .opam-switch/sources is very large and contains redundant data (multiple copies of source code, unit test input data, etc.); see "space savings in opam switches", issue #5448 in ocaml/opam on GitHub. Credit for spotting this initially goes to @lindig

  • .a files are quite large
    .a files are by definition uncompressed. Looking at one of the largest directories in my OCaml installation (compiler-libs), here is one .o member picked from an .a file there:

ls -lh parser.o
-rw-r--r--. 1 edwin edwin 1.4M Feb 23 21:00 parser.o

Let’s try compressing the ELF file (the debug sections):

eu-elfcompress parser.o -o parser.compressed.o
-rw-r--r--. 1 edwin edwin 1.3M Feb 23 21:25 parser.compressed.o

That didn’t help too much.

zstd parser.o
parser.o             : 18.03%   (  1.32 MiB =>    244 KiB, parser.o.zst) 

This helps quite a lot.

Since these .a files are only used when linking executables through ‘ocamlopt’ itself, they could be stored in compressed form. Even if the linker wants an uncompressed .a file, one could be produced on demand as needed. This could lead to some space savings.
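A minimal sketch of the "produce it on demand" idea, assuming a wrapper around the link step could call it. The helper names pack_archive and unpack_archive and the .z suffix are made up for illustration; nothing like this exists in opam or ocamlopt today. COMPRESSOR can be any tool that accepts -c and -dc (zstd and gzip both do).

```shell
#!/bin/sh
# Sketch only: pack_archive / unpack_archive are hypothetical helper names,
# not part of opam or ocamlopt. COMPRESSOR must accept -c (write to stdout)
# and -dc (decompress to stdout); zstd and gzip both do.
: "${COMPRESSOR:=zstd}"

# Replace ARCHIVE with a compressed ARCHIVE.z to save disk space.
pack_archive() {
    "$COMPRESSOR" -c < "$1" > "$1.z" && rm -f "$1"
}

# Recreate ARCHIVE from ARCHIVE.z if the uncompressed file is missing,
# e.g. right before the linker needs it. Goes through a temporary file so
# a failed decompression never leaves a truncated archive behind.
unpack_archive() {
    [ -f "$1" ] || {
        "$COMPRESSOR" -dc < "$1.z" > "$1.tmp" && mv "$1.tmp" "$1"
    }
}
```

For example, pack_archive some/lib.a after installation, and unpack_archive some/lib.a before linking. Whether the decompression cost at link time is acceptable would need measuring.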

  • using -gz

This will compress ELF debug sections with zlib (and, from GCC 13 on, optionally with zstd). I haven’t experimented with it yet.

  • dead code elimination of Unix

The utftrip executable is quite large, even though, besides linking cmdliner and Unix, it doesn’t have that much actual code inside it:
3.0M /var/home/edwin/.opam/4.13.1/bin/utftrip

Compressing the debuginfo helps, but the result is still big, and even with all debuginfo removed it stays big:

eu-elfcompress utftrip -o utftrip.compressed
-rwxr-xr-x. 1 edwin edwin 2.3M Feb 23 21:29 utftrip.compressed
strip utftrip.compressed
-rwxr-xr-x. 1 edwin edwin 1.4M Feb 23 21:30 utftrip.compressed

Looking at the executable with ‘nm -D’, there is a lot of Unix and OCaml runtime code in there that is very unlikely to ever be called. Some cooperation between GCC and ocamlopt might help reduce that size (I experimented with ‘-ffunction-sections’ in the past and might have to try that again).
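To illustrate what ‘-ffunction-sections’ buys on the C side, here is a small sketch using plain C rather than OCaml. It assumes a GCC-compatible cc and a linker that understands --gc-sections (GNU ld or lld); gc_sections_demo is a made-up helper name. It says nothing about how much of the OCaml runtime or Unix stubs would actually be dropped.

```shell
#!/bin/sh
# Sketch with plain C, not OCaml: shows -ffunction-sections combined with the
# linker's --gc-sections (GNU ld / lld syntax assumed).
# gc_sections_demo is a hypothetical helper; it writes its files into DIR.
gc_sections_demo() {
    dir=$1
    cat > "$dir/demo.c" <<'EOF'
int used(void)   { return 1; }
int unused(void) { return 2; }   /* never called */
int main(void)   { return used() - 1; }
EOF
    # One section per function, so the linker can discard them individually.
    cc -ffunction-sections -c "$dir/demo.c" -o "$dir/demo.o" &&
    # Baseline link: every function in demo.o is kept.
    cc "$dir/demo.o" -o "$dir/demo.plain" &&
    # Same object, but unreferenced sections are garbage-collected.
    cc -Wl,--gc-sections "$dir/demo.o" -o "$dir/demo.gc"
}
```

After running gc_sections_demo on a scratch directory, comparing ‘nm demo.plain’ with ‘nm demo.gc’ should show the symbol unused only in the first one.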

  • Finally, I know that static linking is quite core to how OCaml functions, but if reducing the size is the goal then using shared libraries is a common solution. What about the soname, ABI compatibility, and all the link-time checking that ocamlopt performs? I think we could have a semi-static unix.so that is still safe to use if we make its soname the full interface+implementation hash that ocamlopt would use during linking. Then at least all executables produced with the same version of OCaml could reuse this code.
    It’d have to be opt-in because I imagine it’d break quite a few workflows if OCaml suddenly starts requiring a runtime library for executables, but in cases where lots of OCaml binaries are produced (‘opam’ switches, or distro packages containing lots of OCaml binaries) this may be beneficial.
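A rough sketch of the hash-as-soname idea. This is only a stand-in: it hashes whole files with md5sum, whereas ocamlopt checks digests recorded inside the compiled units, and soname_for is a made-up helper name. The point is just that any change to interface or implementation yields a different library name, so a stale library is never picked up silently.

```shell
#!/bin/sh
# Sketch only: soname_for is a hypothetical helper. It derives a library name
# from the contents of the given files (stand-ins for the interface and
# implementation that ocamlopt would check at link time).
# Usage: soname_for NAME FILE...
soname_for() {
    name=$1
    shift
    # md5sum prints "<hash>  -" for stdin; keep only the hash.
    hash=$(cat "$@" | md5sum | cut -d' ' -f1)
    printf 'lib%s-%s.so\n' "$name" "$hash"
}

# Hypothetical use when building the shared library with GNU ld:
#   cc -shared -Wl,-soname,"$(soname_for unix unix.cmi unix.cmx)" ...
```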

What would be the benefits? Disk space is quite cheap nowadays, but network bandwidth is not infinite, and memory is not infinite either, especially with lots of statically linked binaries all running on the same system (if you have a lot of processes written in OCaml).


It seems to me that .opam-switch/sources/ is redundant. The source code for a package comes from the cache of compressed archives (or a Git link). Obviously this archive needs to be unpacked during compilation, but why is it kept around beyond that? Once compiled, the relevant files go into the hierarchy of the switch, which should be enough to compile any other package depending on this package. These are the compiled files (cmx, cmi and o files); mli files may also be useful for documentation. What depends on .opam-switch/sources/, and should that ever happen, by definition, when the install process is correct?
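To get a feel for how much space is at stake in a given switch, something like the following can list the biggest entries. largest_entries is a made-up helper name, and the path in the usage comment is only an example with a placeholder switch name.

```shell
#!/bin/sh
# Sketch only: largest_entries is a hypothetical helper.
# Usage: largest_entries DIR [COUNT]
# Prints the COUNT (default 10) largest entries directly under DIR,
# biggest first, with sizes in KiB.
largest_entries() {
    du -sk "$1"/* | sort -rn | head -n "${2:-10}"
}

# Example (switch name is a placeholder):
#   largest_entries ~/.opam/<switch>/.opam-switch/sources 20
```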
