Source-to-image OCaml builder

Hello everybody,

I’ve made a first attempt at building an OCaml application into a Docker image using Openshift’s source-to-image.

The result is kinda messy because of my poor understanding of the OCaml building and deploying process, but I think it could become promising if I keep digging.

The builder image is available here, and, to try it, here’s a very simple echo server.

In particular, here is the main problem I’m facing (there’s a list in my README): the resulting image is far too big. As of now, my echo server on CentOS 7 weighs no less than 1.4GB! :fearful:

This is because the whole build environment is shipped with the image. This paragraph gives pointers on handling compiled languages with source-to-image, but I have no idea how to isolate the executable (and its possible runtime dependencies).

Any pointers to resources about this particular topic would be greatly appreciated. Besides, any feedback on this draft or this project is most welcome. :slight_smile:

Perhaps I am doing something wrong, but I follow (what I think is) a simple process, using the Alpine Linux images. There are no “external” tools.

  1. Make a base Alpine Docker build image. Install C compiler, OPAM, and some basic OPAM packages that you think everything will need (ocamlfind, core, etc). The goal here is to reuse this image as much as possible for building your various applications, and not have to pay building it all again. My current base/starter build image weighs in at 654MB.
  2. Make the build image for your application FROM the base image; alternatively, start the base build image and run an interactive container. Install any needed OPAM packages, -devel system packages, etc. in it. If not building interactively, steps 2-4 correspond to one Dockerfile and one docker run command.
  3. Build the application in the container, generating native code.
  4. Mount a volume, and copy out the generated executable. If your application is an echo server, that should be all you need to extract here. Many OCaml applications will be just a single executable file.
  5. Create the deployment image: start with plain Alpine (not your base build image), and COPY the executable into it. Try to guess the runtime system package dependencies. These are often the -devel packages you had to install in #2, without the -devel suffixes. Then, test. If you get the dependencies wrong, adjust the deployment Dockerfile until they are right.

This makes images on the order of 10MB in size, usually. You’re welcome to ask about the details of the parts of this process :slight_smile:
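
For the non-interactive case, steps 2 to 5 above might look roughly like the following shell sketch. The image names (myapp-build, myapp), the two Dockerfile names, and the /src/main.native path are placeholders made up for illustration, not something from an actual project. The DOCKER variable is only there so the commands can be dry-run with DOCKER=echo.

```shell
#!/bin/sh
# Sketch of steps 2-5, run non-interactively. All names below are hypothetical.
DOCKER="${DOCKER:-docker}"

build_and_package() {
    # Steps 2-3: build the application image FROM the base build image.
    # Dockerfile.build is assumed to install dependencies and compile to native code.
    "$DOCKER" build -t myapp-build -f Dockerfile.build . || return 1

    # Step 4: mount a host directory and copy the executable out of the container.
    mkdir -p out
    "$DOCKER" run --rm -v "$PWD/out:/out" myapp-build \
        cp /src/main.native /out/myapp || return 1

    # Step 5: build the small deployment image.
    # Dockerfile.deploy is assumed to start FROM plain Alpine and COPY out/myapp.
    "$DOCKER" build -t myapp -f Dockerfile.deploy . || return 1
}
```

Calling build_and_package from the project directory runs the three commands in order and stops at the first failure.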

Yes, this is more or less the workflow given by Openshift in the documentation, and this is what I intend to do.

When taking a closer look at the docs, I found this, which would allow me to do exactly that.

I’m going to give it a shot, but I still wonder how I can non-interactively locate the resulting executable.
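
One possible way to find it from a script is sketched below. It assumes an ocamlbuild-style build, where native targets end in .native and land under _build; other build systems lay things out differently, so treat the pattern and the default directory as assumptions.

```shell
#!/bin/sh
# Print every native executable found under a build directory.
# Assumption: ocamlbuild-style layout, where native targets end in ".native".
find_native() {
    # $1 is the build directory; defaults to ./_build.
    find "${1:-_build}" -type f -name '*.native'
}
```

If the project builds a single executable, the function prints exactly one path, which can then be copied wherever the runtime image expects it.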

Recent versions of Docker come with “multi-stage” builds, so you don’t need @antron’s points 4 and 5. See for instance how we do it with DataKit: https://github.com/moby/datakit/blob/master/Dockerfile

At the end of that file, there is a new FROM keyword, which generates a fresh “stage” into which we copy the binary that we want to deploy.
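
As a minimal sketch of the idea (not a copy of the DataKit Dockerfile), the script below writes a two-stage Dockerfile. The file name Dockerfile.multistage, the echo-server name, the source layout, and the ocamlbuild command are assumptions for illustration; it also assumes the ocaml/opam image runs as an opam user with passwordless sudo.

```shell
#!/bin/sh
# Write a two-stage Dockerfile. Paths, names and the build command are
# placeholders; adjust them to the actual project.
cat > Dockerfile.multistage <<'EOF'
# Stage 0: full build environment (large).
FROM ocaml/opam:alpine-3.6_ocaml-4.05.0
COPY . /home/opam/src
# Assumes the image's opam user can use sudo to take ownership of the sources.
RUN sudo chown -R opam /home/opam/src
WORKDIR /home/opam/src
RUN opam install -y ocamlfind ocamlbuild
RUN opam config exec -- ocamlbuild -use-ocamlfind main.native

# Stage 1: fresh, small image that receives only the executable.
FROM alpine
COPY --from=0 /home/opam/src/_build/main.native /usr/local/bin/echo-server
CMD ["/usr/local/bin/echo-server"]
EOF

# With a multi-stage-capable Docker, the image would then be built with:
# docker build -t echo-server -f Dockerfile.multistage .
```

Only the last stage ends up in the tagged image, so the compiler and OPAM switch from stage 0 are left behind. If the executable links against system libraries, the second stage would also need the matching runtime packages.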

Oh, this is very interesting, thanks a lot!

In the meantime, I’ve managed to use the S2I feature I mentioned earlier (which does pretty much the same thing as multi-stage builds), and my final image now weighs 15MB. :slight_smile:

In case you want to take a closer look, my builder image is here and the runner image is here.

samoht (https://discuss.ocaml.org/u/samoht), June 6:

Recent versions of Docker come with “multi-stage” builds, so you don’t need @antron’s points 4 and 5. See for instance how we do it with DataKit: https://github.com/moby/datakit/blob/master/Dockerfile

Interesting!
I can’t find documentation for that --from=0 option, how new is it?

17.05, see https://docs.docker.com/engine/userguide/eng-image/multistage-build/

According to the docs, it’s from 17.05. :slight_smile:

In case you’re interested, I’ve made a writeup about OCaml and Docker multi-stage builds :slightly_smiling_face: You can see it here:

You might be interested in Habitat, since it can export to Docker and offers an easier way to build packages: https://opensolitude.com/2017/03/08/build-docker-containers-from-scratch-with-habitat.html#why-build-docker-images-with-habitat

Make a base Alpine Docker build image. Install C compiler, OPAM, and some basic OPAM packages that you think everything will need (ocamlfind, core, etc). The goal here is to reuse this image as much as possible for building your various applications, and not have to pay building it all again. My current base/starter build image weighs in at 654MB.

I’d just like to point out that there are a whole load of official ocaml/opam images here:

https://hub.docker.com/r/ocaml/opam/

For example, the image you suggest is available with

docker pull ocaml/opam:alpine-3.6_ocaml-4.05.0

You should be able to find most common combinations of distribution and OCaml versions there (thanks to @avsm for setting all this up!).

(and as others have said, you can use multi-stage builds to avoid the copying steps)

These images are what we’re using. Very convenient.

A possible downside to keep in mind is that the opam images use a static clone of opam-repository, so opam update doesn’t really do anything: it does not fetch new packages. You have to git pull the repository yourself. The images also don’t use the nice OPAM CDN; instead, they use the URLs from the opam files directly.

I don’t know if this is by design or something that could be changed.

I think the idea is that a particular Docker base image will always give you the same build, and you should update the base image instead (you can specify a base with e.g. FROM ocaml/opam@sha256:5e8...). Sometimes you need a package that’s newer than the latest build, and for that some of my Dockerfiles use e.g.

RUN cd opam-repository && git fetch && git reset --hard 3cad9b6baa95451f294008d0b791c2b0d54b0968 && opam update

BTW, what is the “OPAM CDN”? Google didn’t turn up anything obvious. A mirror for package archives would be very useful though, especially with the recent GitHub instability (it’s not too bad, but you notice it more with CI jobs that are continually reinstalling things).

I understand that, but then the image could also just ship an opam that was init’ed and update’d against opam-repository at the time the image was built. If you want a reproducible build, you would simply not run opam update in your Dockerfile. If you did run it, though, it would fetch real updates instead of updating from the stale git clone inside the Docker image (and thus not actually updating anything).

When I install things via OPAM, it doesn’t download the packages from GitHub or their respective source locations but rather from https://opam.ocaml.org/2.0/cache/ (the URL is a bit different for OPAM 1.2.2, but I don’t have that one at hand), so I was assuming that this was handled by some kind of CDN since it is pretty fast and mostly reliable. This is one of the big advantages I see when not using a local git checkout of opam-repository.

I don’t know if there’s any CDN involved. opam.ocaml.org does have a mirror of all or most of the source archives for the packages. You can also create a mirror for yourself using the opam admin command if you’d like. I don’t recall the exact steps, but it’s effectively: (1) clone the repo; (2) run opam admin make from the checkout (will take a while as you’re downloading all of the source files now!); (3) enjoy your local opam mirror. Or you can create a partial mirror - see opam admin make --help for more information.