[ANN] Decompress 0.7

Hi all,

I’m happy to announce a new release, decompress.0.7, available for installation via OPAM.

Decompress is a library which provides an implementation of zlib in pure OCaml. It offers the same functions as camlzip (which is a binding to the official zlib implementation), plus a lower-level API that lets you keep control over memory allocation.

A small example, with an explanation, is provided here to show how to use decompress.

The goal of decompress is to provide a good zlib implementation which can be compiled (with the OCaml compiler) to some exotic back-ends while keeping the same behaviour. In this way, decompress is a key piece of the MirageOS project. Naturally, decompress can be compiled with js_of_ocaml (this was tested at the last ICFP to simulate a Git repository in your web browser).

This release is mostly about bug fixes, and it is a good example of how to use AFL. We fixed 2 bugs:

  • eaeb1de: a bug in the first deflate algorithm (which uses the CPS style but appears only when the input is less than 12 bytes)
  • a6f6b2b: a variable name overlap in the dictionary inflate algorithm

On performance, we already did some work with @yallop and @samoht (defunctorization, immutable state, etc.) in the 0.5 release. Profiling with landmarks now suggests that the ADLER-32 checksum is the biggest bottleneck.
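For readers unfamiliar with the checksum, here is a minimal sketch of ADLER-32 as defined in RFC 1950. It is not the code shipped in decompress; the function name `adler32` and the choice of an `int32` result (so that it also behaves on 32-bit integer back-ends such as js_of_ocaml) are assumptions made for illustration.

```ocaml
(* Naive ADLER-32 (RFC 1950): two running sums modulo 65521.
   Illustrative only; this is not decompress's implementation. *)
let adler32 (s : string) : int32 =
  let base = 65521 in
  let a = ref 1 and b = ref 0 in
  String.iter
    (fun c ->
      (* [a] is 1 plus the sum of all bytes seen so far *)
      a := (!a + Char.code c) mod base;
      (* [b] is the sum of all intermediate values of [a] *)
      b := (!b + !a) mod base)
    s;
  (* The checksum is b in the high 16 bits and a in the low 16 bits. *)
  Int32.logor (Int32.shift_left (Int32.of_int !b) 16) (Int32.of_int !a)
```

This version reduces both sums modulo 65521 on every byte, which is the simplest formulation and an easy one to see in a profile. A common optimization is to defer the two `mod` operations over a block of bytes.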

Finally, decompress is used by projects like datakit and ocaml-git, and we can consider the API stable since the 0.4 release: the only addition in this new version is that it exports the ADLER-32 implementation. An npm package based on this version is in the pipeline.

The next plan is to focus on the performance of decompress. After that, implementing gzip could be interesting.

When you mention “performance … immutable state”, did you have to introduce immutable state or remove it to increase performance?

Based on some benchmarks (available in several PRs: #20, #18, #17, #16, #15, #13 and #11), we decided to switch the state Decompress.{Inflate,Deflate}.t to an immutable state (one which does not contain any mutable field).

This change comes from the implementation approach of ocaml-tls, which uses an immutable state too, and from a great explanation by @yallop about mutable values, caml_modify and benchmarking. However, I could not find the link to that explanation, my bad.

So, decompress uses the CPS style with an immutable state to inflate or deflate a stream. However, for a large input (and this is advice from @samoht), we switch to an imperative algorithm (a big loop which is optimized nicely by the OCaml compiler) and, at the end, come back to the CPS style (cf. inffast).
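To make the idea of CPS with an immutable state concrete, here is a toy sketch. It is not decompress's code: the types `state` and `result`, the functions `read_byte`, `run` and `refill`, and the tiny run-length format it decodes (pairs of count and byte, ended by a zero count) are all hypothetical. The point is only that the state is a record without mutable fields, and that running out of input saves "what to do next" as a continuation inside a fresh state.

```ocaml
(* Toy streaming decoder in CPS with an immutable state.
   All names and the input format are made up for illustration. *)
type state = {
  input : string;       (* current input chunk *)
  pos : int;            (* next byte to read in [input] *)
  out : string list;    (* decoded runs, most recent first *)
  k : state -> result;  (* continuation to call when resumed *)
}

and result =
  | Await of state  (* input exhausted: refill and resume *)
  | Done of string  (* end marker reached: decoded output *)

(* Read one byte and pass it to [k], or suspend and remember [k]. *)
let rec read_byte (k : char -> state -> result) (s : state) : result =
  if s.pos < String.length s.input
  then k s.input.[s.pos] { s with pos = s.pos + 1 }
  else Await { s with k = read_byte k }

(* Decode (count, byte) pairs until a zero count is read. *)
let rec run (s : state) : result =
  read_byte
    (fun n s ->
      if n = '\000' then Done (String.concat "" (List.rev s.out))
      else
        read_byte
          (fun c s -> run { s with out = String.make (Char.code n) c :: s.out })
          s)
    s

let start : state = { input = ""; pos = 0; out = []; k = run }

(* Resume a suspended decoder with a new input chunk. *)
let refill (s : state) (chunk : string) : result =
  s.k { s with input = chunk; pos = 0 }
```

For example, `refill start "\003a\002"` stops in the middle of a pair and returns `Await s`; calling `refill s "b\000"` then finishes decoding. No field is ever updated in place, so no write barrier (caml_modify) is involved in advancing the state. The imperative fast path for large inputs is not shown here.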

So, it’s a hybrid implementation (just as OCaml is a hybrid language between the functional and imperative styles). I think we can do something better, but we first need to define a good benchmark precisely (to compare with zlib). Indeed, comparing zlib and decompress in a naive way could be meaningless because of the weight of the OCaml runtime.

Also, decompress used to be a functor over the buffer type, but in the real world we have only 2 kinds of buffer (bigstring and bytes), so we prefer to use a (G)ADT to allow the compiler to specialize the output code. Again, it’s an empirical choice and, when benchmarking, we need to look closely at the generated assembly code.

Obviously, all of these plans (and optimization in general) are not easy when you want to keep a common implementation: a deep specialization of the code could break the behaviour for some specific back-end.
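As an illustration of the (G)ADT approach, here is a minimal sketch. The constructor names `Of_bytes` and `Of_bigstring` and the helper functions are hypothetical and do not reflect decompress's actual definitions. It assumes an OCaml version where `Bigarray` is part of the standard library.

```ocaml
(* A GADT witness for the two buffer kinds, instead of a functor.
   Names are illustrative, not decompress's API. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type 'a buffer =
  | Of_bytes : Bytes.t buffer
  | Of_bigstring : bigstring buffer

(* Each match arm refines ['a] to a concrete type, so it calls the
   concrete accessor for that buffer kind directly. *)
let length : type a. a buffer -> a -> int =
 fun witness buf ->
  match witness with
  | Of_bytes -> Bytes.length buf
  | Of_bigstring -> Bigarray.Array1.dim buf

let get : type a. a buffer -> a -> int -> char =
 fun witness buf i ->
  match witness with
  | Of_bytes -> Bytes.get buf i
  | Of_bigstring -> Bigarray.Array1.get buf i

(* Generic code is written once against the witness. *)
let sum : type a. a buffer -> a -> int =
 fun witness buf ->
  let acc = ref 0 in
  for i = 0 to length witness buf - 1 do
    acc := !acc + Char.code (get witness buf i)
  done;
  !acc
```

A caller picks the witness that matches its buffer, for instance `sum Of_bytes (Bytes.of_string "abc")`. Whether this actually produces better code than the functor version is something to check in the generated assembly, as noted above.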

Sorry if this is the wrong place, but as the announcement mentions js_of_ocaml, I’ll ask my question here.

I have an application running fully in a web browser thanks to js_of_ocaml. It generates several things that users may want to “save on disk” as files. They can do so file by file but, of course, they always ask for a “download all” option. I cannot think of any better way to do so than generating an archive and offering it for download. Is there a better way?

While looking for an OCaml library that generates archives (I don’t really care about compression, the generated files are small) and compiles to JavaScript, decompress indeed looked like my best hope but, if I’m not mistaken, it only provides the equivalent of the Zlib module of camlzip, whereas I would need the Zip module. Have I understood correctly, and do you know where I could find what I’m looking for?

You are right. decompress provides only the zlib implementation (which deflates/inflates a stream). gzip could be a solution, but it’s not my plan for the next release (as I said, that is benchmarking). However, someone implemented the gzip layer (because it’s just a layer on top of zlib), but on a previous (not recommended) version of decompress.

So, my best advice is to wait for my implementation of gzip in decompress :blush: if you want a full implementation of gzip in OCaml, available in the JavaScript world through js_of_ocaml. On the other hand, the JavaScript world should provide a gzip implementation that you could use through bindings, but I’m not an expert in JavaScript.

:slight_smile: Thank you for your answer. I did start to create a binding to jszip, but I could never understand how to write JS bindings correctly, so I gave up for now…

I’m not an expert on the subject but, as far as I understand, gzip won’t save me (at least the Gzip module of camlzip doesn’t), as gzip still deals with a single file (that’s why we do tar.gz when we need an archive)… By the way, I looked at ocaml-tar during my search, but it still relies on an IO.input_channel that I don’t know how to provide in JavaScript!