I am really hesitant about the post title. Maybe it should be:
How is the CRC of an OCaml interface (as listed under “Interfaces imported” by ocamlobjinfo) computed?
The background is that we have a project in which we ship some pre-compiled .cmo files that can be loaded by utop. This used to work well: the files are built on macOS, but we also deliver them on Ubuntu and Windows WSL (usually with the exact same version of OCaml, but sometimes with variants).
But now we have found that this no longer holds for OCaml 5.1.1.
I ran ocamlobjinfo ~/.opam/5.1.1/lib/ocaml/stdlib__Uchar.cmi on each platform, and also on vanilla OCaml and on the flambda variant. They all give me different output.
Shot in the dark, but could it be due to the presence of libzstd on one system and its absence on another?
Are the two .cmo files significantly different in size? (That could be an indicator.)
IIRC there was a change in 5.1 to use zstd in Marshal to compress marshalled values that is now used by default to compress ocaml build artefacts. The way the absence of libzstd is handled changed between 5.1 and 5.1.1 so maybe it’s relevant.
There is an ocaml-option-no-compression package in opam to create a compiler without zstd support.
This is indeed zstd - a8733118f9a4891e68bd3430f8176bb5 is the uncompressed digest for Stdlib__Uchar. I think this should at least be discussed on the OCaml issue tracker, if you’d be happy to open an issue?
I think it’s fundamentally a bug and, further, one we shouldn’t live with. It’s entirely feasible that a user on a single system compiles OCaml with zstd support (which is loaded from a shared library), subsequently updates zstd, which then starts producing slightly different output, and so gets an unexpected change in digests on the same system. That would affect dynamic loading of code (both in bytecode and native code). The fix I expect is simple, if a little unfortunate: I expect we have either to disable compression for .cmi files completely, to marshal .cmi files twice when compression is enabled, or to generalise the marshaller further to allow simultaneous compression and computation of a checksum.
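To make the failure mode and the “marshal twice” option concrete, here is a minimal sketch. It is not the compiler’s actual code path: `fake_zstd_a` and `fake_zstd_b` are hypothetical stand-ins for two zstd versions that emit different bytes for the same input, and the `artifact` record is invented for illustration.

```ocaml
(* Hypothetical stand-ins for two zstd versions whose compressed output
   differs for the same input. They do not compress anything. *)
let fake_zstd_a (s : string) = "A:" ^ s
let fake_zstd_b (s : string) = "B:" ^ s

(* Invented record: what ends up on disk, plus the digest others depend on. *)
type artifact = { payload : string; digest : Digest.t }

(* Fragile: the digest is taken over the encoded (compressed) bytes. *)
let write_fragile encode v =
  let payload = encode (Marshal.to_string v []) in
  { payload; digest = Digest.string payload }

(* Stable: the digest is taken over the uncompressed marshalled bytes,
   so the choice of encoder no longer influences it. *)
let write_stable encode v =
  let raw = Marshal.to_string v [] in
  { payload = encode raw; digest = Digest.string raw }

let sample = ("Stdlib__Uchar", [ 1; 2; 3 ])

let () =
  let fa = write_fragile fake_zstd_a sample
  and fb = write_fragile fake_zstd_b sample in
  let sa = write_stable fake_zstd_a sample
  and sb = write_stable fake_zstd_b sample in
  Printf.printf "fragile digests equal: %b\n" (fa.digest = fb.digest);
  Printf.printf "stable digests equal:  %b\n" (sa.digest = sb.digest)
```

In this sketch the stable variant keeps the uncompressed string in memory alongside the encoded one, which is the cost alluded to by “marshal twice”.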
You shouldn’t be seeing a difference in .cmi digests between flambda and non-flambda - flambda digests are different, but only for .cmx and .cmxa.
That is unfortunately not the case. You can see differences in digests generated by ocamlc.byte and ocamlc.opt, because the sharing can be different, and whether flambda is enabled can also change the digests generated by ocamlc.opt. (Although I’m surprised this happens with the stdlib, which I would expect to always be compiled with ocamlc.byte.)
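The effect of sharing on a digest can be reproduced outside the compiler. The sketch below is only an illustration with toy values (the names `shared`, `unshared` and `digest_of` are made up here): two structurally equal values marshal to different bytes when one of them reuses the same physical string, so their digests differ.

```ocaml
(* One physical string referenced twice. *)
let shared = let s = String.make 8 'x' in (s, s)

(* Two distinct physical strings with the same contents. *)
let unshared = (String.make 8 'x', String.make 8 'x')

(* Digest of the marshalled representation, as a hex string. *)
let digest_of ?(flags = []) v =
  Digest.to_hex (Digest.string (Marshal.to_string v flags))

let () =
  (* The two values are structurally equal... *)
  assert (shared = unshared);
  (* ...but Marshal records the sharing, so the bytes and digests differ. *)
  assert (digest_of shared <> digest_of unshared);
  (* With sharing detection turned off, the outputs coincide again. *)
  assert (digest_of ~flags:[ Marshal.No_sharing ] shared
          = digest_of ~flags:[ Marshal.No_sharing ] unshared);
  print_endline (digest_of shared);
  print_endline (digest_of unshared)
```

Note that `Marshal.No_sharing` is not a general fix: it duplicates shared substructures and does not terminate on cyclic values.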
We could fix that by hashing the structure instead of the marshalled string, but it’s hard to find a good hash function for that (there may be cycles in the structure).
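As a rough illustration of the trade-off, the standard `Hashtbl.hash` is a structural hash: it ignores sharing and terminates on cyclic values, but only because it inspects a bounded number of nodes, which makes it far too weak to serve as a digest. This is a toy sketch, not a proposal for what the compiler should use.

```ocaml
let shared = let s = String.make 8 'x' in (s, s)
let unshared = (String.make 8 'x', String.make 8 'x')

(* A cyclic list: 1 :: 2 :: 1 :: 2 :: ... *)
let rec cyclic = 1 :: 2 :: cyclic

(* Two long lists that differ only in their last element. *)
let long_a = List.init 100 (fun i -> i)
let long_b = List.init 100 (fun i -> if i = 99 then -1 else i)

let () =
  (* Structural hashing is insensitive to physical sharing. *)
  assert (Hashtbl.hash shared = Hashtbl.hash unshared);
  (* It terminates on cycles because the traversal is bounded. *)
  ignore (Hashtbl.hash cyclic);
  (* The same bound means that late differences are never seen. *)
  assert (long_a <> long_b);
  assert (Hashtbl.hash long_a = Hashtbl.hash long_b)
```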
Thanks for your confirmation and explanation. I made a corresponding GitHub issue #12983 for this problem.
I didn’t open one in the first place because I was unsure whether I had done something wrong. Our script is a bit outdated: we don’t use the custom toplevel feature in dune, but just grab the compiled .cmo files from _build.
We worked around the problem by providing several sets of .cmo files.
I am a bit unclear about your solution. If a .cmo is built in a switch with compression disabled, will it load on other machines regardless of their own compression settings?
Switches configured with compression support can handle non-compressed artifacts without issues, so the non-compressed cmo and cmi should work everywhere. Nevertheless, this doesn’t solve the issue of mixing cmis and cmos compiled with different configurations, which means that this doesn’t really work.
Reading again, I think I might also have missed the scope of your remark: for standard library files, flambda/non-flambda should only impact .cmx and .cmxa files, and not .cmi files. It’s only for files that end up compiled with ocamlc.opt that we can see a difference (so external libraries, and internal libraries such as Unix and Str if they were compiled using ocamlc.opt).