Marshal determinism and stability

It would be impossible for the user to manually override and go ahead with the exchange–the Marshal module simply doesn’t understand serialized data across different versions of OCaml, as mentioned earlier. Anyway, this is a moot point; as I said earlier, there is already development work done to solve this issue in a general way going forward. See unison wire protocol depends on ocaml version · Issue #375 · bcpierce00/unison · GitHub

Oh, I meant the other way around: suppose that we just hash os-version+arch+ocaml-version+unison-version. Then Linux Unison 4.3 <-> Windows Unison 4.3 would reject, as would Unison built from OCaml 4.10.0 <-> OCaml 4.10.1. I’m suggesting that these cases are what the “override” is for.

But sure, it’s better to make the protocol itself aware of and supporting of version-compatibility and -incompatibility.

I feel like I don’t understand your suggestion :slight_smile: the exact cases you’re suggesting an ‘override’ for are the ones that are impossible to override and proceed with an exchange, using the Marshal module as it is today.

Heh, no worries, not exactly like I’m being clear grin. Suppose that instead of hash(os version + arch + ocaml version + Unison version), we just used hash(ocaml major version). Then you’d solve the problem today, right? All I’m saying is, exchange a version-string in a manner that doesn’t need any versioning/compatibility itself. So “hex encoded with a newline terminator”. And then, maybe add a human-readable string for use as information for the user when compatibility fails. Then you can feel free to use Marshal, knowing that the two ends are compatible.

Now, if there are cases where the automated compatibility check rejects, but a human might believe that Unison would still work, sure sure, let them override the automated check, and at that point, we’re back to today’s behaviour.

And this shouldn’t require much work, which is the key point. I mean, designing a wire protocol to support version-compatibility is not trivial.
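A minimal sketch of the handshake described above, assuming each side sends two newline-terminated lines before any Marshal data: a hex digest of the OCaml major version, then a human-readable description. The names `compat_token`, `hello` and `check_peer` are hypothetical and not part of Unison, and hashing only the major version is the granularity floated in the post rather than a recommendation.

```ocaml
(* Sketch only: [compat_token], [hello] and [check_peer] are hypothetical names. *)

(* "4.14.1" -> "4": the part before the first dot. *)
let major_of version =
  match String.index_opt version '.' with
  | Some i -> String.sub version 0 i
  | None -> version

(* Hex digest of the OCaml major version: plain ASCII, no framing needed. *)
let compat_token ocaml_version =
  Digest.to_hex (Digest.string (major_of ocaml_version))

(* The two newline-terminated lines each side sends first. *)
let hello ocaml_version =
  Printf.sprintf "%s\nOCaml %s\n" (compat_token ocaml_version) ocaml_version

(* Compare the peer's token with ours. [override] lets a human proceed
   anyway, which falls back to today's behaviour. *)
let check_peer ?(override = false) ~local_version peer_hello =
  match String.split_on_char '\n' peer_hello with
  | token :: human :: _ ->
    if override || String.equal token (compat_token local_version) then Ok ()
    else
      Error
        (Printf.sprintf "incompatible peer: %s (we are OCaml %s)"
           human local_version)
  | _ -> Error "malformed hello"
```

Because the token is hex text terminated by a newline, reading it never depends on Marshal, so the check itself cannot be broken by the incompatibility it is meant to detect.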

Maybe you’re referring to biniou, which is a binary format that I created and is supported by atd in addition to json.

Quick notes about this approach:

  • It is used extensively in the Tezos codebase. For data exchange (in the p2p layer), for data at rest (configuration files), and for a mix of the two (serialisation of economic protocol data which is both exchanged by peers and stored on disk).
  • It is flexible in that you have great control over the representation of data and the serialisation/deserialisation procedure. There is a medium-term plan to allow even more control. For now you can decide, say, if 8 booleans are represented as one byte, 8 bytes, or 8 words (or something else altogether) (see code below).
  • Some of the responsibility for correctness rests upon your shoulders as a user. E.g., when you encode a tuple, the left element must have either a fixed length (e.g., be an int8, int32, etc., be a fixed-length string, or be a tuple of fixed-length values) or be prefixed by a length marker (which the library provides a combinator for). Most of the errors for this are raised when you declare the encoding and a few are raised when you use the encoding. I recommend writing some tests to check that your encodings accept the range of values that you are going to throw at them.
  • The library is well tested: there are tests using crowbar to check that encoding and decoding are actual inverses of each other.
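To illustrate why the left element of a tuple needs a fixed length or a length marker, here is a sketch using only the standard library. It is not the data-encoding API; `encode_pair` and `decode_pair` are hypothetical names, and the 4-byte big-endian prefix is an assumption chosen for the example.

```ocaml
(* Sketch with the standard library only; this is not the data-encoding API. *)

(* Write [left] with a 4-byte big-endian length prefix, then [right] as-is. *)
let encode_pair (left, right) =
  let buf = Buffer.create (4 + String.length left + String.length right) in
  Buffer.add_int32_be buf (Int32.of_int (String.length left));
  Buffer.add_string buf left;
  Buffer.add_string buf right;
  Buffer.contents buf

(* The prefix tells the decoder where [left] stops; [right] is the remainder.
   Malformed input raises [Invalid_argument]. *)
let decode_pair s =
  let len = Int32.to_int (String.get_int32_be s 0) in
  let left = String.sub s 4 len in
  let right = String.sub s (4 + len) (String.length s - 4 - len) in
  (left, right)
```

Without the prefix, the bytes of `("ab", "cde")` and `("abc", "de")` would be identical and the decoder could not tell them apart.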

Let me know if you have more questions. And in the meantime, here’s two different encodings for a tuple of 8 booleans:

(* easy-encoding, produces 8 bytes *)
let boolsas8bytes =
   tup8 bool bool bool bool bool bool bool bool

(* very-compact encoding, produces 1 byte *)
let boolsas1byte =
   conv
      (fun (b1, b2, b3, b4, b5, b6, b7, b8) ->
         let acc = 0 in
         let acc = if b1 then acc lor 0b10000000 else acc in
         let acc = if b2 then acc lor 0b01000000 else acc in
         let acc = if b3 then acc lor 0b00100000 else acc in
         …
         acc)
      (fun i ->
         let b1 = i land 0b10000000 <> 0 in
         let b2 = i land 0b01000000 <> 0 in
         let b3 = i land 0b00100000 <> 0 in
         …
         (b1, b2, b3, b4, b5, b6, b7, b8))
      uint8
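For readers who want to try the bit packing without installing the library, the two conversion functions can be written out in full in plain OCaml. This is a sketch of the same idea, not data-encoding code; `pack` and `unpack` are hypothetical names for the two functions that would be passed to `conv`.

```ocaml
(* Plain-OCaml sketch of the two conversion functions; no data-encoding calls. *)

(* Pack eight booleans into one byte, first boolean in the most significant bit. *)
let pack (b1, b2, b3, b4, b5, b6, b7, b8) =
  List.fold_left
    (fun acc b -> (acc lsl 1) lor (if b then 1 else 0))
    0
    [b1; b2; b3; b4; b5; b6; b7; b8]

(* Inverse of [pack]: test each bit from most to least significant. *)
let unpack i =
  let bit n = i land (1 lsl n) <> 0 in
  (bit 7, bit 6, bit 5, bit 4, bit 3, bit 2, bit 1, bit 0)
```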

In general, data-encoding is probably slower than marshal, but its strong points are:

  • it offers some type guarantees,
  • it gives you some control over the representation of the data,
  • it allows you to define representations that are easy to parse in other languages or in other versions of the same language,
  • it generates documentation about the data-representation.

Isn’t a version, be it structured or hashed, still just a promise that may or may not be true? What about just being optimistic and providing meaningful error messages in case of failure?

That message should, as usual, contain what’s needed to resolve the issue. So a human readable version name still would be beneficial, but not mandatory.

For sure, sending a version for comparison guarantees nothing. But then, nothing can give a guarantee, right? For a “simplest solution” that improves on “nothing at all”, I’d suggest you want to send -both- a hash, and a human-readable string. But let’s start with just the human-readable string:

(1) a human-readable string might be an s-expression, perhaps?

(2) we’re going to want to canonicalize it somehow, so that meaningless differences are erased. E.g. sorting lines, removing whitespace, etc. But this might break human-readability.

(3) So maybe send hash(canon(version-string)), but also send version-string as-is.

Then when the hashes don’t match, you can print the version-string, which has not been canonicalized, so hopefully will be nice and readable.

That’s all I’m saying. And since you can escape all special chars in the string, that’s two lines, each CRLF-terminated, that each end sends to the other end.

It’s not a great solution. But at least, it’s better than nothing, and involves very little actual work. If I were to do this, I would also add a magic number at the very beginning – a 32-bit network-order integer to version the entire protocol.
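The scheme in points (1) to (3) plus the magic number could look roughly like this. It is a sketch under stated assumptions: the magic value is an arbitrary placeholder, the canonical form (trim, drop empty lines, sort) is just one possible choice, and every function name is hypothetical.

```ocaml
(* Sketch only: the magic value and all function names are hypothetical. *)
let magic = 0x554E4953l (* arbitrary placeholder *)

(* Canonical form: trim each line, drop empty ones, sort. *)
let canon s =
  String.split_on_char '\n' s
  |> List.map String.trim
  |> List.filter (fun l -> l <> "")
  |> List.sort String.compare
  |> String.concat "\n"

let version_hash s = Digest.to_hex (Digest.string (canon s))

(* Magic number in network order, then the hash line, then the escaped,
   uncanonicalized version string; both lines are CRLF-terminated. *)
let hello version_string =
  let buf = Buffer.create 128 in
  Buffer.add_int32_be buf magic;
  Buffer.add_string buf (version_hash version_string);
  Buffer.add_string buf "\r\n";
  Buffer.add_string buf (String.escaped version_string);
  Buffer.add_string buf "\r\n";
  Buffer.contents buf

(* Returns the peer's hash and its readable version string. *)
let parse_hello msg =
  if String.length msg < 4 || String.get_int32_be msg 0 <> magic then
    Error "bad magic"
  else
    match String.split_on_char '\n' (String.sub msg 4 (String.length msg - 4)) with
    | hash :: human :: _ ->
      (* [String.trim] removes the trailing CR left by the split. *)
      Ok (String.trim hash, Scanf.unescaped (String.trim human))
    | _ -> Error "truncated hello"

let compatible ~local ~peer =
  String.equal (version_hash local) (version_hash peer)
```

When the hashes differ, the second field returned by `parse_hello` is the peer's original string, which can be printed in the error message as-is.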

A couple of notes on Marshal, which I don’t think have been covered

  • Although the guarantee is only between identical versions of OCaml, the implementation actually goes to considerable lengths to maintain backwards compatibility (so a value written by older OCaml remains readable in newer OCaml). Our own testsuite, for example, indirectly includes a test which unmarshals a 3.12.1 value. I don’t know exactly how far back the support goes.
  • As it happens, the change which affected Unison in 4.08 was the first breaking change to Marshal since either 4.00 or 4.01. The fact that it doesn’t break often (and that the two code paths - at least at present - are small) meant that I suggested a few months back that we could in future add an additional flag in the style of Compat_32 to allow values to be written in a way which should be readable on older versions of OCaml. Indeed, it’s small enough that flags could be added for the changes in 4.08 (PR#1683) and in 4.11 (PR#8791).
  • Neither point undermines using alternative formats either for network serialisation or persistent storage, for the many reasons discussed above!

The question is less about displaying a nice error message to the user than the fact that you do not control which OCaml version ships in the Linux distribution used by e.g. your institution (or, you do not want to update all your computers at the same time). Distributions would take care to package several versions of Unison for compatibility, but this would end up being all for nothing because different distributions use different compilers. Sure, in the end you find some way to get out of the dependency hell by hand (thank you Linux binary compatibility or patient sysadmins).

Could one get away with collecting the marshalling OCaml/C source at the 4.08 breaking change and switch accordingly, or does it go deeper than that?

It would be nice to document the backwards compatibility if the maintainers are willing to commit to it. Currently, conscientious professionals will read the documentation and might conclude that the tool is unsuitable based on the lack of compatibility promises, despite all the other advantages and whatever unofficial backwards compatibility record.

I was also looking at the diffs and thinking that surely it would not be hard to write old formats too. I think what you proposed is a great idea. Rather than a flag you might want a protocol version number, and a way to negotiate a protocol version between two programs (so a range of supported versions). Old protocol versions can become unsupported after a while if that makes maintenance easier, what is important is only that the new ocaml version can speak with other recent ocaml versions.

You wouldn’t want to do that exactly - in both cases bugs are fixed. In 4.08, the change is a forwards-compatibility alteration for multicore and in 4.11 it prevents an overflow for particularly large bigarrays. So if you chose to snapshot your application with the 4.07 marshaller then you’d never work with multicore OCaml. If we added, say, Compat_version of int * int (or, more likely, just Compat_407 and Compat_410), then two Unison processes would want to negotiate the use of those flags (by exchanging OCaml version).

Indeed, my initial suggestion on our dev list, given the history, was to commit to future versions always being able to unmarshal older versions (i.e. a docs change only). It’s relatively straightforward to write good tests for it, as well.

My intention here would be to make it possible for new OCaml to communicate with older OCaml, not easy! I’d prefer to veer away from actually exposing the marshal format version, especially for the 4.11 one, which fixes a bug rather than adds a feature. Negotiating a protocol version of the marshal format makes it sound like you’re doing something official, as opposed to saying “Oh well, if you’re that old then I’ll send values you can cope with then!” :slightly_smiling_face:
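A rough sketch of what "if you're that old then I'll send values you can cope with" might look like on the application side. Flags such as Compat_407 or Compat_410 are only a proposal and do not exist in Marshal, so the sketch models the decision with a hypothetical `dialect` type; only `Marshal.Compat_32` in the last function is a real flag.

```ocaml
(* Sketch only: [dialect] and [dialect_for_peer] are hypothetical; flags such
   as Compat_407 or Compat_410 are a proposal and do not exist in Marshal. *)
type dialect = Pre_408 | Pre_411 | Current

(* "4.10.1" -> (4, 10); anything after the minor number is ignored. *)
let parse_version s =
  Scanf.sscanf s "%d.%d" (fun major minor -> (major, minor))

(* Pick the oldest format the peer can read, based on its OCaml version. *)
let dialect_for_peer peer_version =
  let v = parse_version peer_version in
  if v < (4, 8) then Pre_408
  else if v < (4, 11) then Pre_411
  else Current

(* Compat_32 is the existing flag the proposal is modelled on: it writes
   values in a form that 32-bit readers can also load. *)
let to_32bit_readable v = Marshal.to_string v [Marshal.Compat_32]
```

The version string itself would have to be exchanged outside of Marshal, for example as a plain text line, before either side writes any marshalled value.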

All that said, I’m not actively working on this at the moment… I just got slightly bitten as a Cygwin package maintainer in September (and may get more bitten if I end up maintaining Cygwin’s unison package)

Sure, and I’m not suggesting a way to solve the problem so that operators don’t need to intervene. Though, there is a way to minimize the interventions:

[I’m not suggesting you do this; rather, that it’s a lot less invasive than redoing the entire comms protocol; furthermore, ensuring that a comms protocol is actually compatible with future modifications is … not trivial.]

(1) add a bit of a trip-test to Unison, that can be invoked to exercise all functionality

(2) I assume Unison has a config-file? Add stanza for hashes of “conforming peer versions” – that is, a list of hashes that, if received from the other end, will be accepted and allow Unison to proceed.

(3) and finally, a way to invoke Unison with the trip-test, so that if it passes, then Unison will write the hash of the peer into the config-file. Maybe also write the human-readable version-string, for documentation.

This means that an operator who wants to run Unison between Windows 4.11.0 and Linux 4.12.0 versions would have to pay attention and run the trip-test, but afterwards could just use Unison without worrying about compatibility.
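Steps (2) and (3) amount to a small allowlist of peer hashes. The sketch below assumes a plain text file with one `hash<TAB>description` entry per line; that format and the names `peer_accepted` and `record_peer` are hypothetical and have nothing to do with Unison's real configuration files.

```ocaml
(* Sketch only: the file format and function names are hypothetical, not
   Unison's actual configuration. One "hash<TAB>description" entry per line. *)

let read_lines path =
  if not (Sys.file_exists path) then []
  else begin
    let ic = open_in path in
    let rec loop acc =
      match input_line ic with
      | line -> loop (line :: acc)
      | exception End_of_file -> close_in ic; List.rev acc
    in
    loop []
  end

(* A peer is accepted if its hash is the first field of some line. *)
let peer_accepted path peer_hash =
  List.exists
    (fun line ->
       match String.split_on_char '\t' line with
       | hash :: _ -> String.equal hash peer_hash
       | [] -> false)
    (read_lines path)

(* Called after a successful trip-test: remember the peer for next time. *)
let record_peer path ~peer_hash ~description =
  if not (peer_accepted path peer_hash) then begin
    let oc =
      open_out_gen [Open_wronly; Open_append; Open_creat; Open_text] 0o644 path
    in
    Printf.fprintf oc "%s\t%s\n" peer_hash description;
    close_out oc
  end
```

Storing the human-readable description next to the hash costs nothing and documents why each entry was accepted.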

An important aspect of this discussion is that the performance achieved by marshal is actually desirable.

Contributing to the OCaml compiler is also an option, especially if what @dra27 is saying is the full story. Which would be amazing to all kinds of applications.

I’m just looking into Marshal as an option for a use-case, where I wonder if you would maybe agree, that it’s a good use-case for Marshal:

I want to remotely call pure functions between several instances of a unikernel (similar to what you can do in erlang). So basically it’s very similar to forks of the same executable. It’s not only the same ocaml version, but by definition the same executable. I could transmit the marshaled arguments of the function, or even better: I could create a closure with these arguments and transfer a marshaled null-ary function with its closure (seems to be supported if the binaries are identical).

Isn’t Marshal the perfect fit for this?

It is if you are sure about a couple of things:

  • You are absolutely sure that this RPC system will always be between different instances of the same executable and therefore same OCaml version
  • You don’t need to be able to inspect requests and responses over the wire e.g. for troubleshooting, you are fine with all RPC calls being totally opaque from the outside world
  • You are OK with the RPC calls being defined in the source code itself and there not being any schema of types, requests, responses, and possible errors

If you are good with all of the above then sure, go for Marshal. Otherwise I’d say Thrift is a pretty good option, it has reasonable OCaml support.

At LexiFi we do something similar to this to implement an IPC protocol, except that we avoid marshalling closures by “registering” each functional entry point and only sending an identifier for the function to call, in addition to the marshalled arguments, and then return the result in marshalled form. I think the reason to avoid marshalling closures was because it imposes a fairly strong and low-level constraint (both processes need to have the exact same code layout) which you may not be able to easily guarantee, especially with security features such as ASLR, shared libraries on different architectures, etc.
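The registration approach can be sketched in a few lines. This is only an illustration of the idea, not LexiFi's code: `register`, `call` and `serve` are hypothetical names, and the request format (a marshalled pair of identifier and marshalled argument) is an assumption.

```ocaml
(* Sketch only: [register], [call] and [serve] are hypothetical names. *)

(* Entry points by identifier. Arguments and results travel as Marshal
   strings, so no closure ever crosses the process boundary. *)
let registry : (string, string -> string) Hashtbl.t = Hashtbl.create 16

let register (name : string) (f : 'a -> 'b) : unit =
  Hashtbl.replace registry name (fun arg ->
    Marshal.to_string (f (Marshal.from_string arg 0)) [])

(* Caller side: the request is just (identifier, marshalled argument). *)
let call (name : string) arg : string =
  Marshal.to_string (name, Marshal.to_string arg []) []

(* Callee side: look up the identifier and apply it to the argument. *)
let serve (request : string) : string =
  let (name, arg) : string * string = Marshal.from_string request 0 in
  match Hashtbl.find_opt registry name with
  | Some f -> f arg
  | None -> failwith ("unknown entry point: " ^ name)
```

For example, after `register "succ" (fun (n : int) -> n + 1)`, the reply to `serve (call "succ" 41)` unmarshals to `42`. Both sides still have to agree on the argument and result types, since Marshal does not check them.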

Note also that you will need to avoid extensible types, since these lose their physical identity when marshalled.
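A small sketch of that pitfall, using an exception since `exn` is an extensible type. The exception `Timeout` and the helper names are made up for the example.

```ocaml
(* [Timeout] is a made-up exception for the example. *)
exception Timeout

let is_timeout = function Timeout -> true | _ -> false

(* Round-trip a value through Marshal inside one process. *)
let roundtrip (e : exn) : exn =
  Marshal.from_string (Marshal.to_string e []) 0

let () =
  assert (is_timeout Timeout);
  (* The unmarshalled copy has a different constructor identity,
     so the pattern no longer matches. *)
  assert (not (is_timeout (roundtrip Timeout)))
```

One way around this, if errors have to cross the channel, is to convert them to an ordinary (closed) variant before marshalling and convert back on the other side.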

Cheers,
Nicolas

Yes, it is indeed the exact same unikernel executable, that would run in separate subjects of a separation kernel (Muen).

  • You don’t need to be able to inspect requests and responses over the wire e.g. for troubleshooting, you are fine with all RPC calls being totally opaque from the outside world

There is actually no wire that I could inspect, the unikernels would be connected over direct channels between the subjects. There is no possibility to wiretap anyway (by security design).

  • You are OK with the RPC calls being defined in the source code itself and there not being any schema of types, requests, responses, and possible errors

I love that I don’t have to set up an intermediate API and can just call the functions as is. My only goal is to get some kind of multicore support in a single-core world. No API needed.

If you are good with all of the above then sure, go for Marshal. Otherwise I’d say Thrift is a pretty good option, it has reasonable OCaml support.

Thanks for the confirmation. Actually, after looking a bit more into this I am amazed at what is possible. I always envied Erlang for the simple way to execute arbitrary functions on other nodes, and now I realize it was always right in front of my nose. And it also checks the checksum of the executables, so it will immediately fail if there is some incompatibility by mistake. It even works with a generic wrapper that uses type variables in order to get some kind of type safety. In fact I don’t understand why this works, but it does.

I wrote some quick PoC code, which runs this loop in the “worker mode”:

    while true do
      let f : unit -> 'a = Marshal.from_channel in_channel in
      Marshal.to_channel out_channel (f ()) [Marshal.No_sharing];
      flush out_channel;
    done

In the “main loop mode” I could easily call functions on the worker like this:

let remote in_channel out_channel f =
  Marshal.to_channel out_channel (f : unit -> 'res) [Marshal.Closures; Marshal.No_sharing];
  flush out_channel;
  (Marshal.from_channel in_channel : 'res)

let main_loop () =
  ...
  let spawn f = remote in_channel out_channel f in
  let count = spawn (fun () -> succ count) in
  Printf.printf "Received int: %d\n%!" count;
  let foo = spawn (fun () -> foo ^ "bar") in
  Printf.printf "Received string: %s\n%!" foo;
  ...

I really wonder how the worker loop can marshal the 'a type. Is there some type information stored in the marshaled function closure?

Anyway, I think this is really a cool feature. :slight_smile:
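On the question about the 'a type: marshalled data carries no type information. Marshal walks the runtime representation of the value, and the type given to the result of `Marshal.from_channel` or `Marshal.from_string` is simply trusted, so a wrong annotation is not detected. The following single-process sketch shows the closure round trip without any channels; `roundtrip_closure` is a hypothetical helper name.

```ocaml
(* Serialise a closure and read it back in the same process.
   The result type is whatever the annotation says; Marshal does not check it. *)
let roundtrip_closure (f : unit -> 'a) : unit -> 'a =
  Marshal.from_string (Marshal.to_string f [Marshal.Closures]) 0

let () =
  let count = 41 in
  (* The captured [count] travels inside the closure's environment. *)
  let g = roundtrip_closure (fun () -> succ count) in
  assert (g () = 42)
```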