Marshal determinism and stability

The problem isn’t at that level. More details are in the Marshal module documentation: programs compiled with different OCaml versions can’t understand each other’s Marshal-serialized data.

Right, so what I’m suggesting is: during session setup, use messages that are not serialized with Marshal. At a minimum, one might send the hash of a canonicalized string containing the OS version, architecture, OCaml version, etc. Both sides check that the hash matches their own (or, better, send the string itself), and if it doesn’t, reject and stop.

Then and only then, start the actual protocol, and there, sure, use Marshal.
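A minimal sketch of that pre-Marshal handshake line, assuming the fingerprint is built only from what the OCaml runtime itself reports via `Sys` (it exposes the OS type, word size and endianness, not the exact OS version or CPU model, so those would need extra inputs). The Unison version has to come from the application; `unison_version` is a hypothetical parameter. The fields are sorted before hashing so that their order cannot cause spurious mismatches.

```ocaml
(* Collect "key=value" fields from the runtime.
   [unison_version] is a hypothetical parameter supplied by the application. *)
let version_fields ~unison_version =
  [ "os_type=" ^ Sys.os_type;
    "word_size=" ^ string_of_int Sys.word_size;
    "big_endian=" ^ string_of_bool Sys.big_endian;
    "ocaml_version=" ^ Sys.ocaml_version;
    "unison_version=" ^ unison_version ]

(* Canonicalize: sort the fields so their order never matters, then join. *)
let canonical_string fields =
  String.concat "\n" (List.sort compare fields)

(* Fingerprint: hex MD5 digest (the stdlib Digest) of the canonical string. *)
let fingerprint ~unison_version =
  Digest.to_hex (Digest.string (canonical_string (version_fields ~unison_version)))

(* Sent before any Marshal traffic: ASCII hex, newline-terminated,
   so this line needs no versioning of its own. *)
let send_fingerprint oc ~unison_version =
  output_string oc (fingerprint ~unison_version ^ "\n");
  flush oc

(* Read the peer's line and compare it with our own fingerprint. *)
let peer_matches ic ~unison_version =
  String.equal (input_line ic) (fingerprint ~unison_version)
```

Only if `peer_matches` returns true would the two ends go on to exchange Marshal data.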

An analogous thing is done in some RPC systems: session negotiation involves sending “callstream version” information to ensure that the RPC runtimes on both ends are compatible.

That still leaves the issue I mentioned. Think about it: you have Unison on a Windows laptop and another Unison on a Linux desktop. Unless they’re the same version and were compiled with the same OCaml version, they can’t exchange files. That is still a problem even with a nice hash system doing a version check. And it doesn’t matter if one version is “better”: it won’t work unless it’s exactly the same.

OK, yes, this “version” of the algorithm requires a perfect version match. We can do better, and doing better involves designing a version-string specification and a contract for up/down-compatibility.[1] For example, the OCaml development team makes guarantees about up/down-compatibility of AST versions: minor releases will not modify the AST type. Similar assumptions or guarantees could be made and used here. Of course, this involves work, whereas what I suggested didn’t. For instance, if each end sends a hash plus a human-readable version string, the ends can compare hashes and, when they don’t match, print out the human-readable version strings (which don’t need to be canonicalized, and hence can be more readable) so the invoker can decide whether to override.
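As a hedged illustration of what such a contract could look like (the policy is hypothetical, not something OCaml or Unison promises for Marshal): parse the OCaml version string into numbers and treat two ends as compatible when major and minor agree, ignoring the patch level. Under this rule 4.10.0 could talk to 4.10.1, while 4.07 against 4.08, where the Marshal format did change, would still be rejected.

```ocaml
(* Parse "major.minor.patch", dropping any "+suffix" or "~suffix"
   (e.g. "4.14.1+flambda"). Returns None for anything else. *)
let parse_version (s : string) : (int * int * int) option =
  let cut c str =
    match String.index_opt str c with
    | Some i -> String.sub str 0 i
    | None -> str
  in
  let core = cut '~' (cut '+' s) in
  match String.split_on_char '.' core with
  | [ a; b; c ] -> (
      match (int_of_string_opt a, int_of_string_opt b, int_of_string_opt c) with
      | Some a, Some b, Some c -> Some (a, b, c)
      | _ -> None)
  | _ -> None

(* Hypothetical contract: same major.minor => compatible; patch may differ.
   Unparseable versions are treated as incompatible. *)
let compatible (local : string) (peer : string) : bool =
  match (parse_version local, parse_version peer) with
  | Some (ma, mi, _), Some (ma', mi', _) -> ma = ma' && mi = mi'
  | _ -> false
```

A real contract would have to be justified by what each release actually changes; the point is only that a structured version string lets the check be coarser than an exact hash match.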

I was making a cheap suggestion for how to improve Unison: one that didn’t involve a lot of design, but would catch many of the obvious error cases. Specifically, when the check rejects a connection, the human invoker gets a chance to take a look and decide whether it’s OK to proceed. That automatically means the human is paying attention when (or if) things go awry.

[1] Every step we take towards a structured version string increases complexity and pushes us towards just using some RPC system or IDL compiler that already has this sort of thing built in, or that is independent of the OCaml version. But that’s a much bigger cost than just exchanging and comparing version hashes.

If one were going to do this, one would (of course) want to prepend a “protocol version” to the beginning of the stream, so that one could later decide to switch to a different scheme. IIRC Thrift has something like this: you can version types, but you can also version the protocol framing. It might be worth looking at how they did it.
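A minimal sketch of such a prefix, using the 32-bit network-order (big-endian) integer suggested elsewhere in this thread. `Bytes.set_int32_be` and `Bytes.get_int32_be` need OCaml 4.08 or later; the protocol numbers and what each one selects are hypothetical.

```ocaml
(* Hypothetical protocol numbers: version 1 means "Marshal payload follows". *)
type protocol = V1_marshal | Unknown of int32

let v1 = 1l

(* Encode the protocol version as 4 bytes in network (big-endian) order. *)
let encode_protocol_version (v : int32) : string =
  let b = Bytes.create 4 in
  Bytes.set_int32_be b 0 v;
  Bytes.to_string b

(* Read the first 4 bytes of the stream and dispatch on them, so that a
   later release can switch schemes without confusing older peers. *)
let decode_protocol_version (s : string) : protocol option =
  if String.length s < 4 then None
  else
    let v = Bytes.get_int32_be (Bytes.of_string s) 0 in
    if Int32.equal v v1 then Some V1_marshal else Some (Unknown v)
```

An `Unknown` version gives the receiver a clean place to report “peer speaks protocol N, I only know 1” instead of failing somewhere inside Marshal.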

It would be impossible for the user to manually override and go ahead with the exchange: the Marshal module simply doesn’t understand data serialized by a different version of OCaml, as mentioned earlier. Anyway, this is a moot point; as I said earlier, development work has already been done to solve this issue in a general way going forward. See “unison wire protocol depends on ocaml version” (Issue #375 in bcpierce00/unison on GitHub).

Oh, I meant the other way around: suppose that we just hash os-version+arch+ocaml-version+unison-version. Then Linux Unison 4.3 <-> Windows Unison 4.3 would be rejected, as would Unison built with OCaml 4.10.0 <-> OCaml 4.10.1. These are the cases I’m suggesting the “override” is for.

But sure, it’s better to make the protocol itself aware of version compatibility and incompatibility, and able to handle both.

I feel like I don’t understand your suggestion :) The exact cases you’re suggesting an “override” for are the ones where it’s impossible to override and proceed with an exchange, using the Marshal module as it is today.

Heh, no worries, it’s not exactly like I’m being clear (grin). Suppose that instead of hash(OS version + arch + OCaml version + Unison version), we just used hash(OCaml major version). Then you’d solve the problem today, right? All I’m saying is: exchange a version string in a manner that doesn’t itself need any versioning or compatibility, e.g. “hex-encoded with a newline terminator”. Then maybe add a human-readable string to inform the user when the compatibility check fails. After that, you can freely use Marshal, knowing that the two ends are compatible.

Now, if there are cases where the automated compatibility check rejects, but a human believes Unison would still work, sure, let them override the automated check; at that point, we’re back to today’s behaviour.
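The check-then-override flow could look roughly like this. It is a sketch: the `decision` type, the message wording and the `override` flag (for instance a command-line switch) are hypothetical.

```ocaml
(* Outcome of comparing our version hash with the peer's. *)
type decision =
  | Proceed              (* hashes match: safe to start using Marshal *)
  | Proceed_overridden   (* mismatch, but the user explicitly overrode *)
  | Reject of string     (* mismatch: explain to the user and stop *)

let decide ~local_hash ~local_human ~peer_hash ~peer_human ~override =
  if String.equal local_hash peer_hash then Proceed
  else if override then Proceed_overridden
  else
    Reject
      (Printf.sprintf
         "Version check failed.\n  local: %s\n  peer:  %s\n\
          Rerun with the override option if you believe they are compatible."
         local_human peer_human)
```

`Proceed_overridden` is kept distinct from `Proceed` so that the program could, say, log that the user took responsibility for the mismatch.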

And this shouldn’t require much work, which is the key point. By contrast, designing a wire protocol that supports version compatibility is not trivial.

Maybe you’re referring to biniou, which is a binary format that I created and which atd supports in addition to JSON.


Quick notes about this approach:

  • It is used extensively in the Tezos codebase: for data exchange (in the p2p layer), for data at rest (configuration files), and for a mix of the two (serialisation of economic-protocol data, which is both exchanged by peers and stored on disk).
  • It is flexible in that you have great control over the representation of data and over the serialisation/deserialisation procedure. There is a medium-term plan to allow even more control. For now you can decide, say, whether 8 booleans are represented as one byte, 8 bytes, or 8 words (or something else altogether); see the code below.
  • Some of the responsibility for correctness rests on your shoulders as a user. E.g., when you encode a tuple, the left element must either have a fixed length (e.g., be an int8, int32, etc., a fixed-length string, or a tuple of fixed-length values) or be prefixed by a length marker (the library provides a combinator for this). Most of these errors are raised when you declare the encoding, and a few are raised when you use it. I recommend writing some tests to check that your encodings accept the range of values you are going to throw at them.
  • The library is well tested: there are tests using crowbar to check that encoding and decoding are actually inverses of each other.
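To see why the left element of a tuple needs a fixed size or a length marker, here is a stdlib-only sketch (not data-encoding’s API) that encodes a pair of strings. Without the 4-byte big-endian length in front of the first string, a decoder could not tell where it ends and the second begins; the last element needs no prefix because it simply runs to the end of the buffer.

```ocaml
(* Encode (left, right): left is prefixed with its length as a 32-bit
   big-endian integer; right runs to the end of the buffer. *)
let encode_pair (left : string) (right : string) : string =
  let len = Bytes.create 4 in
  Bytes.set_int32_be len 0 (Int32.of_int (String.length left));
  Bytes.to_string len ^ left ^ right

(* Decode: read the length marker, slice off the left element, and treat
   everything after it as the right element. *)
let decode_pair (s : string) : (string * string) option =
  if String.length s < 4 then None
  else
    let n = Int32.to_int (Bytes.get_int32_be (Bytes.of_string s) 0) in
    if n < 0 || 4 + n > String.length s then None
    else Some (String.sub s 4 n, String.sub s (4 + n) (String.length s - 4 - n))
```

The round-trip property `decode_pair (encode_pair a b) = Some (a, b)` is exactly the kind of inverse property that crowbar tests check.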

Let me know if you have more questions. In the meantime, here are two different encodings for a tuple of 8 booleans:

open Data_encoding

(* easy encoding, produces 8 bytes: each bool is encoded as one byte *)
let boolsas8bytes =
   tup8 bool bool bool bool bool bool bool bool

(* very compact encoding, produces 1 byte: each bool becomes one bit of a uint8 *)
let boolsas1byte =
   conv
      (* to the wire: b1 sets bit 7, b2 sets bit 6, and so on *)
      (fun (b1, b2, b3, b4, b5, b6, b7, b8) ->
         let acc = 0 in
         let acc = if b1 then acc lor 0b10000000 else acc in
         let acc = if b2 then acc lor 0b01000000 else acc in
         let acc = if b3 then acc lor 0b00100000 else acc in
         …
         acc)
      (* from the wire: test each bit to recover the corresponding bool *)
      (fun i ->
         let b1 = i land 0b10000000 <> 0 in
         let b2 = i land 0b01000000 <> 0 in
         let b3 = i land 0b00100000 <> 0 in
         …
         (b1, b2, b3, b4, b5, b6, b7, b8))
      uint8
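The bit-packing inside the compact `conv` can be checked on its own, without data-encoding. This plain-OCaml sketch implements the same two conversions (b1 in the most significant bit) with a fold and a bit test instead of spelled-out masks; `pack` and `unpack` are names chosen here for illustration.

```ocaml
(* Pack 8 bools into an int in [0, 255]; b1 ends up in bit 7. *)
let pack (b1, b2, b3, b4, b5, b6, b7, b8) =
  List.fold_left
    (fun acc b -> (acc lsl 1) lor (if b then 1 else 0))
    0 [ b1; b2; b3; b4; b5; b6; b7; b8 ]

(* Unpack: bit 7 (0b10000000) is b1, ..., bit 0 is b8. *)
let unpack i =
  let bit k = i land (1 lsl k) <> 0 in
  (bit 7, bit 6, bit 5, bit 4, bit 3, bit 2, bit 1, bit 0)
```

With only 256 possible bytes, `pack (unpack i) = i` can be checked exhaustively.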

In general, data-encoding is probably slower than marshal, but its strong points are:

  • it offers some type guarantees,
  • it gives you some control over the representation of the data,
  • it allows you to define representations that are easy to parse in other languages or in other versions of the same language,
  • it generates documentation about the data-representation.

Isn’t a version, be it structured or hashed, still just a promise that may or may not be true? What about just being optimistic and providing meaningful error messages in case of failure?

That message should, as usual, contain what’s needed to resolve the issue. So a human-readable version name would still be beneficial, but not mandatory.
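A rough sketch of the optimistic approach: just try to unmarshal, and if the runtime rejects the data, turn the exception into a message that includes whatever version string the peer sent (`peer_version` is a hypothetical parameter). One caveat: Marshal only catches malformed headers and sizes. Data that passes those checks but has an incompatible layout is not detected and can crash the program, which is why a version check before Marshal still has value.

```ocaml
(* Try to unmarshal; malformed input is reported by Marshal as Failure or
   Invalid_argument, which we turn into a user-facing message. As with
   Marshal itself, the caller chooses the result type, so this is only
   safe if both ends really agree on it. *)
let try_unmarshal ~peer_version (s : string) : ('a, string) result =
  match Marshal.from_string s 0 with
  | v -> Ok v
  | exception (Failure msg | Invalid_argument msg) ->
      Error
        (Printf.sprintf
           "Could not decode data from peer (%s). The peer reports version %S; \
            check that both ends were built with compatible OCaml versions."
           msg peer_version)
```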

For sure, sending a version for comparison guarantees nothing. But then, nothing can give a guarantee, right? For a “simplest solution” that improves on “nothing at all”, I’d suggest sending both a hash and a human-readable string. But let’s start with just the human-readable string:

(1) A human-readable string might be an s-expression, perhaps?

(2) We’re going to want to canonicalize it somehow, so that meaningless differences are erased, e.g. by sorting lines, removing whitespace, etc. But this might break human readability.

(3) So maybe send hash(canon(version-string)), but also send version-string as-is.

Then when the hashes don’t match, you can print the version-string, which has not been canonicalized, so hopefully will be nice and readable.

That’s all I’m saying. And since you can escape all special characters in the string, that’s two lines, each CRLF-terminated, that each end sends to the other.
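Steps (1) to (3) can be sketched as follows, under some stated assumptions: canonicalization here means trimming each line, dropping blank lines and sorting (one choice among many); the hash is the stdlib MD5 `Digest`; and the raw string is escaped with `String.escaped` so it fits on one line, which the receiver undoes with `Scanf.unescaped`.

```ocaml
(* (2) Canonicalize: trim each line, drop blank ones, sort, rejoin. *)
let canon (version_string : string) : string =
  String.split_on_char '\n' version_string
  |> List.map String.trim
  |> List.filter (fun l -> l <> "")
  |> List.sort compare
  |> String.concat "\n"

(* (3) Two CRLF-terminated lines: hash of the canonical form, then the
   original string with special characters (including newlines) escaped. *)
let handshake_lines (version_string : string) : string =
  Digest.to_hex (Digest.string (canon version_string))
  ^ "\r\n"
  ^ String.escaped version_string
  ^ "\r\n"

(* Receiving side: split the two lines apart and unescape the second. *)
let parse_handshake (s : string) : (string * string) option =
  match String.split_on_char '\n' s with
  | [ h; v; "" ]
    when String.length h > 0 && h.[String.length h - 1] = '\r'
         && String.length v > 0 && v.[String.length v - 1] = '\r' ->
      let strip x = String.sub x 0 (String.length x - 1) in
      Some (strip h, Scanf.unescaped (strip v))
  | _ -> None
```

Comparison uses only the first line; the second is there purely to be shown to the user when the hashes differ.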

It’s not a great solution, but at least it’s better than nothing, and it involves very little actual work. If I were to do this, I would also add a magic number at the very beginning: a 32-bit network-order integer to version the entire protocol.

A couple of notes on Marshal, which I don’t think have been covered:

  • Although the guarantee only holds between identical versions of OCaml, the implementation actually goes to considerable lengths to maintain backwards compatibility (so a value written by an older OCaml remains readable by a newer OCaml). Our own testsuite, for example, indirectly includes a test that unmarshals a value written by 3.12.1. I don’t know exactly how far back the support goes.
  • As it happens, the change that affected Unison in 4.08 was the first breaking change to Marshal since either 4.00 or 4.01. Because it doesn’t break often (and because the two code paths are, at least at present, small), I suggested a few months back that in future we could add an extra flag in the style of Compat_32, allowing values to be written in a way that older versions of OCaml should be able to read. Indeed, the changes are small enough that flags could be added for both the 4.08 change (PR#1683) and the 4.11 change (PR#8791).
  • Neither point undermines using alternative formats either for network serialisation or persistent storage, for the many reasons discussed above!
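For reference, this is how an existing extern flag is passed today. `Compat_32` is a real `Marshal.extern_flags` constructor: with it, marshaling fails with `Failure` on integers that a 32-bit OCaml could not read back. The `Compat_407`/`Compat_410`-style flags discussed in this thread do not exist; they are only proposals.

```ocaml
(* Marshal a value so a 32-bit OCaml can read it back; None if the value
   contains an int outside the 32-bit platform's range. *)
let to_string_32bit_safe (v : 'a) : string option =
  match Marshal.to_string v [ Marshal.Compat_32 ] with
  | s -> Some s
  | exception Failure _ -> None

(* Reading back needs no flag. *)
let of_string (s : string) : 'a = Marshal.from_string s 0
```

A hypothetical older-format flag would presumably be used the same way, chosen per peer after exchanging OCaml versions.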

The question is less about displaying a nice error message to the user than about the fact that you do not control which OCaml version ships in the Linux distribution used by, e.g., your institution (or you do not want to update all your computers at the same time). Distributions would take care to package several versions of Unison for compatibility, but this would end up being all for nothing because different distributions used different compilers. Sure, in the end you find some way out of the dependency hell by hand (thank you, Linux binary compatibility, or patient sysadmins).

Could one get away with collecting the marshalling OCaml/C source from before the 4.08 breaking change and switching between the two accordingly, or does it go deeper than that?

It would be nice to document the backwards compatibility, if the maintainers are willing to commit to it. Currently, conscientious professionals will read the documentation and might conclude that the tool is unsuitable because of the lack of compatibility promises, despite all its other advantages and whatever unofficial backwards-compatibility record it has.

I was also looking at the diffs and thinking that surely it would not be hard to write the old formats too. I think what you proposed is a great idea. Rather than a flag, you might want a protocol version number, and a way for two programs to negotiate a protocol version (so each advertises a range of supported versions). Old protocol versions can become unsupported after a while if that makes maintenance easier; what matters is only that a new OCaml version can talk to other recent OCaml versions.
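The range negotiation suggested here could be as simple as: each side advertises the lowest and highest protocol version it supports, and both pick the highest version in the overlap. A sketch, with versions as plain integers (a hypothetical numbering):

```ocaml
(* Each side supports a closed range [lo, hi] of protocol versions. *)
type range = { lo : int; hi : int }

(* Pick the highest version both sides support, or None if the ranges do
   not overlap (e.g. one side has dropped support for old versions). *)
let negotiate (mine : range) (theirs : range) : int option =
  let lo = max mine.lo theirs.lo and hi = min mine.hi theirs.hi in
  if lo <= hi then Some hi else None
```

Because both ends run the same deterministic rule on the same two ranges, they agree on the result without a further round trip.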

You wouldn’t want to do exactly that: in both cases bugs were fixed. In 4.08, the change is a forwards-compatibility alteration for multicore, and in 4.11 it prevents an overflow for particularly large bigarrays. So if you chose to snapshot your application with the 4.07 marshaller, you’d never work with multicore OCaml. If we added, say, Compat_version of int * int (or, more likely, just Compat_407 and Compat_410), then two Unison processes would negotiate the use of those flags (by exchanging OCaml versions).
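To make the proposal concrete: once the two processes have exchanged OCaml versions, the sender would pick the newest format the peer can read. The constructors below mirror the hypothetical `Compat_407`/`Compat_410` flags from this post; nothing like them exists in `Marshal` today, and the thresholds simply follow the 4.08 and 4.11 changes described above.

```ocaml
(* Hypothetical writer formats; none of these exist in Marshal. *)
type writer_format =
  | Compat_407   (* format from before the 4.08 change *)
  | Compat_410   (* format from before the 4.11 change *)
  | Current      (* 4.11 and later *)

(* Given the peer's (major, minor) OCaml version, choose what to send.
   Tuples compare lexicographically, so (5, 0) counts as newer than (4, 11). *)
let format_for_peer (peer : int * int) : writer_format =
  if compare peer (4, 8) < 0 then Compat_407
  else if compare peer (4, 11) < 0 then Compat_410
  else Current
```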

Indeed, my initial suggestion on our dev list, given the history, was to commit to future versions always being able to unmarshal older versions (i.e. a docs change only). It’s relatively straightforward to write good tests for it, as well.

My intention here would be to make it possible for new OCaml to communicate with older OCaml, not to make it easy! I’d prefer to steer away from actually exposing the marshal format version, especially for the 4.11 change, which fixes a bug rather than adding a feature. Negotiating a protocol version of the marshal format makes it sound like you’re doing something official, as opposed to saying “Oh well, if you’re that old, then I’ll send values you can cope with!” :)

All that said, I’m not actively working on this at the moment… I just got slightly bitten as a Cygwin package maintainer in September (and may get bitten more if I end up maintaining Cygwin’s unison package).

Sure, and I’m not suggesting a way to solve the problem so that operators never need to intervene. There is, however, a way to minimize the interventions:

[I’m not suggesting you do this; rather, that it’s a lot less invasive than redoing the entire comms protocol; furthermore, ensuring that a comms protocol is actually compatible with future modifications is … not trivial.]

(1) Add a trip test to Unison that can be invoked to exercise all functionality.

(2) I assume Unison has a config file? Add a stanza for hashes of “conforming peer versions”, that is, a list of hashes that, if received from the other end, will be accepted and allow Unison to proceed.

(3) Finally, add a way to invoke Unison with the trip test so that, if it passes, Unison writes the hash of the peer into the config file. Maybe also write the human-readable version string, for documentation.

This means that an operator who wants to run Unison between Windows 4.11.0 and Linux 4.12.0 versions would have to pay attention and run the trip test, but afterwards could just use Unison without worrying about compatibility.
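Steps (2) and (3) amount to a small allow-list keyed by peer hash. A sketch, assuming a hypothetical config line format `accept-peer = <hash> <description>` and a caller-supplied `trip_test` function; this does not model Unison’s actual profile syntax.

```ocaml
(* Hypothetical config line: "accept-peer = <hex hash> <human-readable version>". *)
let parse_accept_line (line : string) : string option =
  match String.index_opt line '=' with
  | Some i when String.trim (String.sub line 0 i) = "accept-peer" -> (
      let value =
        String.trim (String.sub line (i + 1) (String.length line - i - 1))
      in
      match String.split_on_char ' ' value with
      | hash :: _ when hash <> "" -> Some hash
      | _ -> None)
  | _ -> None

let accepted_hashes (config_lines : string list) : string list =
  List.filter_map parse_accept_line config_lines

type outcome =
  | Accepted                       (* peer hash already listed *)
  | Accepted_and_record of string  (* trip test passed: append this line *)
  | Rejected                       (* unknown peer and trip test failed *)

(* [trip_test] is a hypothetical function that exercises all functionality
   against the peer and reports whether everything worked. *)
let check_peer ~config_lines ~peer_hash ~peer_human ~trip_test =
  if List.mem peer_hash (accepted_hashes config_lines) then Accepted
  else if trip_test () then
    Accepted_and_record
      (Printf.sprintf "accept-peer = %s %s" peer_hash peer_human)
  else Rejected
```

Returning the line to append, rather than writing the file directly, keeps the check itself free of side effects.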

An important aspect of this discussion is that the performance achieved by marshal is actually desirable.

Contributing to the OCaml compiler is also an option, especially if what @dra27 is saying is the full story. That would be amazing for all kinds of applications.
