If you never get `input_value: integer too large` Marshal errors at runtime, you can skip this topic. Unless you distribute bytecode across 32-bit and 64-bit machines, or compile bytecode for js_of_ocaml (and wasm_of_ocaml?) with a 64-bit compiler, it is doubtful you would see that error.
I created a patch for the OCaml compiler that makes these errors (theoretically) impossible to trigger. My “solution” involves a new ABI called bx32: 32-bit OCaml values with access to the 64-bit C heap:
(I’m not trying to upstream this. It is too esoteric. But if you are the rare person who deals with bytecode and constantly encounters the error, hopefully a Google search points you here.)
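For readers who have not hit it: the error comes from marshaled data containing an integer wider than the reading runtime's `int`. A minimal sketch (assuming a stock 64-bit toolchain) of data that a 32-bit or js_of_ocaml runtime cannot read back:

```ocaml
(* A 64-bit-only integer marshaled on a 64-bit runtime. Reading this
   back with input_value on a 32-bit runtime (or under js_of_ocaml)
   raises Failure "input_value: integer too large". *)
let big = 1 lsl 40
let data = Marshal.to_string big []

(* On the same 64-bit runtime the round trip is of course fine: *)
let () = Printf.printf "roundtrip: %d\n" (Marshal.from_string data 0)
```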
-compat-32 … once an incompatibility is found … will fail without telling you where the problem was.
For example:
(* file: x.ml *)
let () = Printf.printf "%d\n%!" 446744073709551615
gives:
# With 64-bit ocamlc this will compile fine
$ ocamlc -o x.bc x.ml
# With 64-bit ocamlc and -compat-32, it errors but is useless for finding the problem
$ ocamlc -compat-32 -o x.bc x.ml
File "x.ml", line 1:
Error: Generated bytecode unit "x.cmo" cannot be used on a 32-bit platform
Also, relying on -compat-32 means all third-party libraries have to be compiled with -compat-32 as well. I’m not even sure how to do that without patching the compiler.
Sigh. I thought I had tested 63-bit integer literals with bx32: the behavior was to fail at compile time with a syntax error (just like the standard 32-bit ocamlc compiler) and report the position of the 63-bit integer literal. But now that behavior is gone. Edit: Found the regression, a bug in the threshold in parse_intnat; I’ll update the patch. For now it does & 0xFFFFFFFF on the literals.
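To see what that mask does, here is a small sketch, runnable with an ordinary 64-bit compiler (the value is the same literal as in the earlier example; the masking itself is just `land`):

```ocaml
(* The interim bx32 behavior: instead of rejecting a too-wide literal,
   keep only its low 32 bits. *)
let literal = 446744073709551615
let truncated = literal land 0xFFFFFFFF
let () = Printf.printf "%d -> %d\n" literal truncated
```

Silent truncation like this is exactly why a compile-time error with a source position is the behavior worth restoring.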
In passing, that should be doable with the just-about-documented ocaml_compiler_internal_params file, although that still wouldn’t report where the constant came from, of course, beyond the file. I don’t think large integers arise in any other way?
Do I read correctly from the spec that the 16 MiB string limit gets lifted with this, or is that only on 64-bit? (Or am I misreading/misunderstanding it?)
Large integers don’t have to come from literal constants. I spent a couple of days hunting for where a non-literal one was occurring for me (I quickly ruled out large literals, since the project compiled fine with a 32-bit compiler), but I gave up the hunt. It was painful working backwards from a corrupted global DATA unmarshalling during caml startup to the precise value allocated in the ocamlc compiler (a different process). And it is not something I want my users to do themselves.
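To illustrate the point, here is a sketch (assuming a 64-bit compiler) of values that are not large literals in the source, yet may still end up as too-wide integer constants in the compiled unit:

```ocaml
(* None of these are large *literals*, yet each exceeds 32 bits.
   Expressions on constants like these can be folded at compile time,
   so they can land in the unit's global data just like a literal. *)
let shifted = 1 lsl 40              (* computed from small constants *)
let biggest = max_int               (* 2^62 - 1 on a 64-bit system *)
let folded  = 2_000_000_000 * 3     (* product of 32-bit-safe factors *)
let () = Printf.printf "%d %d %d\n" shifted biggest folded
```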
No, the 16 MiB string limit should still apply. I vaguely recall someone at LexiFi saying they extended the string limit, but I didn’t spend time on that issue.
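For reference, the limit in question is exposed as `Sys.max_string_length`, and it depends on the word size of the running system. A quick check:

```ocaml
(* On a 32-bit runtime this prints 16777211 (just under 16 MiB);
   on a 64-bit runtime it is on the order of 2^57 bytes. *)
let () = Printf.printf "%d\n" Sys.max_string_length
```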
This sounds like a more principled way of achieving something strange that we have in our Goblint ecosystem: marshaling data from a native executable and unmarshaling it in a JSOO user interface. Of course that’s never intended to work.
However, our version of it is a massive hack. As far as I remember and understand:
We patched Zarith to use only 53-bit (the size of JS/JSOO integers) instead of 63-bit values for small integers, switching to the general representation for anything larger. I think this is described in this student report: https://github.com/goblint/Zarith/raw/goblint/goblint/main.pdf. We might’ve gotten away with it just because we use Z.t a lot anyway and there are hopefully few raw ints around.
And we have a custom marshal stub for JSOO that, I think, can take the native 64-bit layout and read it structurally correctly. I guess it just truncates ints, but we seem to have gotten away with it.
Of course it’s all super fragile and probably easy to break if places other than Zarith actually hold large int values.
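The 53-bit cutoff mentioned above comes from JavaScript numbers being IEEE-754 doubles, which represent integers exactly only up to 2^53. A quick demonstration using OCaml floats (which are the same doubles):

```ocaml
(* 2^53 is the point past which doubles can no longer represent every
   integer: adding 1.0 rounds straight back to 2^53 itself. *)
let limit = 2. ** 53.
let () = Printf.printf "%.0f %b\n" limit (limit = limit +. 1.)
```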
I guess a full bx32 build might avoid all our troubles and get the marshaling compatibility for free.
Although to closely match what we did, some amalgamation like bx53 might be possible too, given that we don’t need compatibility between b64 and b32, but b64 and JSOO.