[ANN] data-encoding.0.3: performance and streaming

Guess what?! Version 0.3 of data-encoding is now available!

Data-encoding is a library for defining encodings, which can then be used to
serialise and deserialise values to and from a binary or JSON representation.

(* assumes [open Data_encoding]; the "ints" field name is illustrative *)
let e : (string * int list) t =
  obj2 (req "name" string) (req "ints" (list uint8))
let v : string * int list =
  ("low", [0; 1; 2; 3; 4])
let s : string = Binary.to_string_exn e v
let j : Json.t = Json.construct e v
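
For completeness, deserialisation goes the other way; a sketch, assuming the symmetric decoding entry points (Binary.of_string_exn, Json.destruct):

(* decoding, the inverse of the above *)
let v' : string * int list = Binary.of_string_exn e s
let v'' : string * int list = Json.destruct e j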

Version 0.3 of the library is now available on opam.
The code is distributed under the MIT license and hosted on GitLab: https://gitlab.com/nomadic-labs/data-encoding
The documentation for the Data_encoding module is available online.


In addition to numerous miscellaneous improvements, this version brings two
major changes.

  1. Support for streamed JSON serialisation.

    JSON serialisation of large values can be expensive, and in some contexts
    prohibitively so. One such context is cooperative concurrency, where
    serialising a very large JSON value can block other ongoing tasks.

    Data-encoding provides the necessary functions to serialise values as
    sequences of strings, which can be consumed with appropriate yielding
    between chunks.

    (* assumes [open Data_encoding], [open Lwt.Infix], and an output
       channel [oc : Lwt_io.output_channel] *)
    let rec write seq = match seq () with
      | Seq.Nil ->
          Lwt.return_unit
      | Seq.Cons (chunk, seq) ->
          Lwt_io.write oc chunk >>= fun () ->
          Lwt.pause () >>= fun () -> (* yield to other tasks *)
          write seq
    in
    (* produce the JSON as a sequence of lexemes, then chunk it into
       strings of roughly [chunk_size_hint] bytes *)
    let j = Json.construct_seq e v in
    let s = Json.string_seq_of_jsonm_lexeme_seq ~chunk_size_hint:512 j in
    write s
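
    Because the serialiser produces a plain Seq.t of strings, the yielding
    policy (here, an Lwt.pause after every chunk) is entirely under the
    caller’s control; a different concurrency library could be slotted in
    instead of Lwt.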
    
  2. Performance improvements.

    The serialisation and deserialisation of some encodings have been
    optimised significantly. For some encodings, performance is now
    competitive with Marshal.

    More details below.


According to the micro-benchmark below, data-encoding.0.3 gets close to Marshal performance when serialising and deserialising Micheline values (S-expression-like values used to represent smart contracts on the Tezos blockchain).
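
For reference, a hypothetical harness for this kind of micro-benchmark, built with core_bench (whose output format matches the tables below); the encoding and value here are simple placeholders, not the Micheline ones actually measured:

let encoding = Data_encoding.(list uint8)
let value = List.init 1_000 (fun i -> i mod 256)

let () =
  Core.Command.run
    (Core_bench.Bench.make_command
       [ Core_bench.Bench.Test.create ~name:"marshal_encode" (fun () ->
             ignore (Marshal.to_string value []));
         Core_bench.Bench.Test.create ~name:"binary_encode" (fun () ->
             ignore (Data_encoding.Binary.to_string_exn encoding value)) ])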

The results are printed below. Note the speed-up:

  • for serialising, a 13.40× slow-down relative to Marshal became a 1.01× slow-down,
  • for deserialising, an 18.72× slow-down relative to Marshal became a 1.02× slow-down.
* data-encoding 0.2
Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
┌────────────────┬──────────┬────────────┬─────────────────┬──────────┬───────────┬───────────┬────────────┬─────────┐
│ Name           │ Time R^2 │   Time/Run │            95ci │  mWd/Run │  mjWd/Run │  Prom/Run │ Percentage │ Speedup │
├────────────────┼──────────┼────────────┼─────────────────┼──────────┼───────────┼───────────┼────────────┼─────────┤
│ marshal_encode │     1.00 │   102.89us │ -0.05us +0.06us │   5.74kw │     0.11w │     0.11w │      7.46% │    1.00 │
│ binary_encode  │     1.00 │ 1_378.70us │ -7.47us +8.73us │ 953.58kw │ 1_489.94w │ 1_489.94w │    100.00% │   13.40 │
└────────────────┴──────────┴────────────┴─────────────────┴──────────┴───────────┴───────────┴────────────┴─────────┘
Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
┌────────────────┬──────────┬────────────┬─────────────────┬──────────┬───────────┬───────────┬────────────┬─────────┐
│ Name           │ Time R^2 │   Time/Run │            95ci │  mWd/Run │  mjWd/Run │  Prom/Run │ Percentage │ Speedup │
├────────────────┼──────────┼────────────┼─────────────────┼──────────┼───────────┼───────────┼────────────┼─────────┤
│ marshal_decode │     1.00 │    73.58us │ -0.62us +0.59us │   7.59kw │     0.16w │     0.16w │      5.34% │    1.00 │
│ binary_decode  │     1.00 │ 1_377.68us │ -4.26us +4.35us │ 951.99kw │ 1_464.25w │ 1_464.25w │    100.00% │   18.72 │
└────────────────┴──────────┴────────────┴─────────────────┴──────────┴───────────┴───────────┴────────────┴─────────┘
* data-encoding 0.3
┌────────────────┬──────────┬──────────┬─────────────────┬─────────┬──────────┬──────────┬────────────┬─────────┐
│ Name           │ Time R^2 │ Time/Run │            95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │ Speedup │
├────────────────┼──────────┼──────────┼─────────────────┼─────────┼──────────┼──────────┼────────────┼─────────┤
│ marshal_encode │     1.00 │ 102.05us │ -0.15us +0.18us │  5.74kw │    0.11w │    0.11w │     99.40% │    1.00 │
│ binary_encode  │     1.00 │ 102.67us │ -0.39us +0.44us │ 37.55kw │    2.55w │    2.55w │    100.00% │    1.01 │
└────────────────┴──────────┴──────────┴─────────────────┴─────────┴──────────┴──────────┴────────────┴─────────┘
Estimated testing time 20s (2 benchmarks x 10s). Change using '-quota'.
┌────────────────┬──────────┬──────────┬─────────────────┬─────────┬──────────┬──────────┬────────────┬─────────┐
│ Name           │ Time R^2 │ Time/Run │            95ci │ mWd/Run │ mjWd/Run │ Prom/Run │ Percentage │ Speedup │
├────────────────┼──────────┼──────────┼─────────────────┼─────────┼──────────┼──────────┼────────────┼─────────┤
│ marshal_decode │     1.00 │  72.21us │ -0.29us +0.33us │  7.59kw │    0.16w │    0.16w │     97.57% │    1.00 │
│ binary_decode  │     1.00 │  74.00us │ -0.29us +0.31us │ 36.48kw │    2.89w │    2.89w │    100.00% │    1.02 │
└────────────────┴──────────┴──────────┴─────────────────┴─────────┴──────────┴──────────┴────────────┴─────────┘

Do not hesitate to open an issue on the project’s issue tracker to let us know about encodings that are still too slow.


Maybe not so related, but I cannot resist mentioning it.
There was a paper about adding type-safety to Marshal.
It looked pretty cool:
“Typing Unmarshalling without Marshalling Types”
Henry, G., Mauny, M., Chailloux, E., & Manoury, P. (2012). Typing unmarshalling without marshalling types. ACM SIGPLAN Notices, 47(9), 287–298.
http://michel.mauny.net/data/papers/henry-mauny-chailloux-manoury-2012.pdf


I would be curious to know how you got these performance improvements. Are there some big ideas that delivered large improvements? Do you have references/pointers?


I’ll let @yurug answer more specific follow-up questions because he made that happen, but the basic perf-only changes, transparent to the end user, are:

  • Replace linear-access lists with random-access arrays where possible (this required a bit more than just changing types).
  • Memoise the application of mu to avoid recomputing the encoding during construction/destruction.
  • Use an unboxed uint option, with none represented as -1, for some internal counting (sketched below).
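
To illustrate the last bullet, here is a minimal sketch of the unboxed-option representation (illustrative only, not the library’s actual code):

(* A [uint option] used for internal counting can be stored as a plain
   [int] with [-1] standing for [None], avoiding the allocation of a
   [Some] block on every update. *)
module Uint_option : sig
  type t = private int
  val none : t
  val some : int -> t    (* requires a non-negative argument *)
  val is_none : t -> bool
  val get : t -> int     (* requires [is_none t = false] *)
end = struct
  type t = int
  let none = -1
  let some n = assert (n >= 0); n
  let is_none n = n < 0
  let get n = assert (n >= 0); n
end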

We also introduced a new combinator for sums, called matching, to which you pass a function that performs the matching rather than relying on a list of cases. The previous combinator (union) is still available, but the new one should be preferred for large case lists where encoding performance is a concern.
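
Here is a sketch of how matching reads, assuming open Data_encoding; the shape type and the tag numbers are invented for illustration:

type shape = Circle of int | Square of int

let shape_encoding =
  matching
    (* serialisation side: a direct function, so no linear scan through
       the case list to find the right case *)
    (function
      | Circle r -> matched 0 uint8 r
      | Square s -> matched 1 uint8 s)
    (* deserialisation side: still described by the list of cases *)
    [ case ~title:"Circle" (Tag 0) uint8
        (function Circle r -> Some r | _ -> None)
        (fun r -> Circle r);
      case ~title:"Square" (Tag 1) uint8
        (function Square s -> Some s | _ -> None)
        (fun s -> Square s) ]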

You can see all the changes in https://gitlab.com/nomadic-labs/data-encoding/-/compare/0.2...0.3 and I’d recommend checking out the ones from @yurug specifically.

Note that, as mentioned above, the micro-benchmark was run specifically on Micheline values. We have not benchmarked other scenarios; we were primarily interested in optimising the construction and destruction of these values. If you have specific examples of encodings that are slow, especially encodings that take more than 10× the time of Marshal, we’d be very happy to hear about them!