If you never get `input_value: integer too large` Marshal errors at runtime, you can skip this topic. Unless you distribute bytecode across 32-bit and 64-bit machines, or compile bytecode for js_of_ocaml (and wasm_of_ocaml?) with a 64-bit compiler, it is doubtful you would see that error.
I created a patch for the OCaml compiler that makes these errors (theoretically) impossible to trigger. My “solution” involves a new ABI called bx32: 32-bit OCaml values with access to the 64-bit C heap:
(I’m not trying to upstream this. It is too esoteric. But if you are the rare person who deals with bytecode and constantly encounters the error, hopefully a Google search points you here.)
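For readers who have not hit it: the error comes from marshaled data containing an integer wider than the reading runtime's `int`. A minimal sketch (assuming a stock 64-bit toolchain) of data that a 32-bit or js_of_ocaml runtime cannot read back:

```ocaml
(* A 64-bit-only integer marshaled on a 64-bit runtime. Reading this
   back with input_value on a 32-bit runtime (or under js_of_ocaml)
   raises Failure "input_value: integer too large". *)
let big = 1 lsl 40
let data = Marshal.to_string big []

(* On the same 64-bit runtime the round trip is of course fine: *)
let () = Printf.printf "roundtrip: %d\n" (Marshal.from_string data 0)
```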
-compat-32 … once an incompatibility is found … will fail without telling you where the problem was.
For example:
(* file: x.ml *)
let () = Printf.printf "%d\n%!" 446744073709551615
gives:
# With 64-bit ocamlc this will compile fine
$ ocamlc -o x.bc x.ml
# With 64-bit ocamlc and -compat-32, it errors but is useless for finding the problem
$ ocamlc -compat-32 -o x.bc x.ml
File "x.ml", line 1:
Error: Generated bytecode unit "x.cmo" cannot be used on a 32-bit platform
Also, relying on -compat-32 means all third-party libraries have to be compiled with -compat-32 as well. I’m not even sure how to do that without patching the compiler.
Sigh. I thought I had tested 63-bit integer literals with bx32: the behavior was to fail at compile time with a syntax error (just like the standard 32-bit ocamlc compiler) and report the position of the 63-bit integer literal. But now that behavior is gone. Edit: Found the regression, a bug in the threshold in parse_intnat; I’ll update the patch. For now it does & 0xFFFFFFFF on the literals.
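To see what that mask does, here is a small sketch, runnable with an ordinary 64-bit compiler (the value is the same literal as in the earlier example; the masking itself is just `land`):

```ocaml
(* The interim bx32 behavior: instead of rejecting a too-wide literal,
   keep only its low 32 bits. *)
let literal = 446744073709551615
let truncated = literal land 0xFFFFFFFF
let () = Printf.printf "%d -> %d\n" literal truncated
```

Silent truncation like this is exactly why a compile-time error with a source position is the behavior worth restoring.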
In passing, that should be doable with the just-about-documented ocaml_compiler_internal_params file, although that still wouldn’t report where the constant came from, of course, beyond the file. I don’t think large integers arise in any other way?
Do I read correctly from the spec that the 16 MiB string limit gets lifted with this, or is that only on 64-bit? (Or am I misreading/misunderstanding it?)
Large integers don’t have to come from literal constants. I spent a couple of days hunting for where a non-literal one was occurring for me (I quickly ruled out large literals, since the project compiled fine with a 32-bit compiler), but I gave up the hunt. It was painful working backwards from a corrupted global DATA unmarshalling during caml startup to the precise value allocated in the ocamlc compiler (a different process). And it is not something I want my users to do themselves.
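To illustrate the point, here is a sketch (assuming a 64-bit compiler) of values that are not large literals in the source, yet may still end up as too-wide integer constants in the compiled unit:

```ocaml
(* None of these are large *literals*, yet each exceeds 32 bits.
   Expressions on constants like these can be folded at compile time,
   so they can land in the unit's global data just like a literal. *)
let shifted = 1 lsl 40              (* computed from small constants *)
let biggest = max_int               (* 2^62 - 1 on a 64-bit system *)
let folded  = 2_000_000_000 * 3     (* product of 32-bit-safe factors *)
let () = Printf.printf "%d %d %d\n" shifted biggest folded
```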
No, the 16 MiB string limit should still apply. I vaguely recall someone at LexiFi saying they extended the string limit, but I didn’t spend time on that issue.
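For reference, the limit in question is exposed as `Sys.max_string_length`, and it depends on the word size of the running system. A quick check:

```ocaml
(* On a 32-bit runtime this prints 16777211 (just under 16 MiB);
   on a 64-bit runtime it is on the order of 2^57 bytes. *)
let () = Printf.printf "%d\n" Sys.max_string_length
```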
This sounds like a more principled way of achieving something strange that we have in our Goblint ecosystem: marshaling data from a native executable and unmarshaling it in a JSOO user interface. Of course that’s never intended to work.
However, our version of it is a massive hack. As far as I remember and understand:
We patched Zarith to use only 53-bit (the size of JS/JSOO integers) instead of 63-bit values for small integers, switching to the general representation for anything larger. I think this is described in this student report: https://github.com/goblint/Zarith/raw/goblint/goblint/main.pdf. We might’ve gotten away with it just because we use Z.t a lot anyway and there are hopefully few raw ints around.
And we have a custom marshal stub for JSOO that, I think, can take the native 64-bit layout and read it structurally correctly. I guess it just truncates ints, but we seem to have gotten away with it.
Of course it’s all super fragile and probably easy to break if places other than Zarith actually hold large int values.
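The 53-bit cutoff mentioned above comes from JavaScript numbers being IEEE-754 doubles, which represent integers exactly only up to 2^53. A quick demonstration using OCaml floats (which are the same doubles):

```ocaml
(* 2^53 is the point past which doubles can no longer represent every
   integer: adding 1.0 rounds straight back to 2^53 itself. *)
let limit = 2. ** 53.
let () = Printf.printf "%.0f %b\n" limit (limit = limit +. 1.)
```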
I guess a full bx32 build might avoid all our troubles and get the marshaling compatibility for free.
Although to closely match what we did, some amalgamation like bx53 might be possible too, given that we don’t need compatibility between b64 and b32, but b64 and JSOO.