[ANN] Decompress 1.4.0

Greetings everyone,

I am happy to announce the new release of decompress 1.4.0, available for installation via OPAM. Decompress is a library containing a pure OCaml implementation of several compression algorithms:

  • RFC1951
  • Zlib
  • Gzip
  • LZO

It’s goal is to provide several algorithms for both the inflation and the deflation of objects, in the form of a stream API allowing to call the chosen algorithm one bit at a time. Such behavior allows for an easy use of decompress in situations where we would not be able to have the input in one go, or where we would like to output the result in a non blocking way. This new release comes with several improvements to the documentation and bug fixes, but even more, with a whole new implementation for the rfc 1951 and zlib algorithms.

Non-stream implementation for rfc 1951 and zlib

Up to this day, decompress was used in several projects like ocaml-git. However, as time passed by, it appeared that in some cases, the current implementation of decompress was not the optimal solution:
As useful as a stream implementation is, it requires to save a lot of information about the state of the compression, in order to resume it once we have enough input.

This is why, in some cases where we would be sure that we have our whole input in one go, we might want to avoid all of these side-costs, and directly go to the point.

State of the art: libdeflate

This new problematic in mind, we have started thinking about the existing implementations of these algorithms which were also bypassing the stream behavior. One implementation that proved to be a suitable example for our problem, was the library libdeflate, an implementation in C. It’s main advantages being: a better compression ratio than zlib and with faster runtime.

It was used as the solid base for the OCaml implementation provided by this new release.

OCaml version of libdeflate, performances and use cases

Inheriting the logic of libdeflate, the new implementation now has a better compression ratio, while being slightly faster at it. On the other side, the decompression is way faster, with ~33% of speed increase in most tested cases: On the book2 (from the Calgary corpus) file:

  • decompress (stream): 15 Mb/s (deflation), 76 Mb/s (inflation), ratio: 42.46 %
  • decompress (non-stream): 17 Mb/s (deflation), 105 Mb/s (inflation), ratio: 34.66 %

Now that this is in place, the users of decompress will be able to choose between the two versions, according to their needs. In the case of ocaml-git, the vast majority of the git objects are small and will be compressed in one go. This is why we updated with the new implementation when possible.

Writing optimized code and profiling it

One of the biggest concerns of this release was to be able to produce optimized code. The base code being coded in C, a lot of sub-optimal behavior where ported in the OCaml version: for and while loops, references everywhere, mixes of struct and union., it needed a lot of clean up.

This is why once the main iteration was done, we have spent several weeks profiling the code base, using the OCaml library landmarks, flamegraph or simply the linux binary perf. This work, sometimes tedious, proved to be helpful and healthy for both the harmonization of the code and it’s performances.

Decompress & MirageOS

Compression algorithms are a really important piece in many projects, and operating systems do not avoid this. decompress was coded from the start with the idea of being used in the much larger project MirageOS.

This release is another opportunity to broaden MirageOS’s reach, by providing one more algorithm to it’s stack, allowing us to specialise even more the unikernels that would have a need for inflation/deflation algorithms. This more restrictive implementation, as we need to have the whole input in one go, will allow us to take advantage of the situation and give more flexibility for the user.

The positive aspects of this release will most likely show up soon enough, as we make use of decompress to its full potential