Advantage of Int63 compared to Int64?

this is probably a stupid question, but - why does Base have Int63?

the docs have a description:
https://ocaml.janestreet.com/ocaml-core/latest/doc/base/Base/Int63/index.html

The size of Int63 is always 63 bits. On a 64-bit platform it is just an int (63-bits), and on a 32-bit platform it is an int64 wrapped to respect the semantics of 63-bit integers.

Because Int63 has different representations on 32-bit and 64-bit platforms, marshalling Int63 will not work between 32-bit and 64-bit platforms – unmarshal will segfault.

but I don’t understand the implications - if running on a 64-bit platform - is there an advantage of using Int63?

For example Core_kernel’s Time_ns module is using Int63 - what is the reasoning - is it compatibility or something else?

3 Likes

There’s one more relevant comment, this time on the type t itself:

The @@immediate64 attribute is to indicate that t is implemented by a type that is immediate only on 64 bit platforms. It is currently ignored by the compiler, however we are hoping that one day it will be taken into account so that the compiler can omit caml_modify when dealing with mutable data structures holding Int63.t values.

(This comment is a little out of date, I think the compiler understands @@immediate64 as of this commit.

Basically, Int63.t is capable of representing the same set of values across platforms (int is 31 bits on 32-bit platforms, 63 on 64-bit platforms, and 32 in Js_of_ocaml). At the same time, in 64-bit native code, Int63.t is represented as an immediate value, so it doesn’t allocate heap memory nor require caml_modify (unlike int64).

7 Likes

To echo @bcc32, we recently switched from int64s to an int63 type in one of our data structures (using a separate implementation that is very similar to Base.Int63) and ended up halving its memory usage on 64-bit systems almost for free. In future, I can see myself defaulting to using int63 over int64 in almost all cases, especially for e.g. file offsets where the last bit is never going to get used anyway.

I’m very interested to see that [@@immediate64] is now supported; thanks for the pointer @bcc32! I shall add that to the implementation that we’re using as soon as possible :slight_smile:

8 Likes

recently I ported a quite elegant algorithm from Go/Asm that leverages the in-memory representation of float64 via Int64 Geohash in Golang Assembly: Lessons in absurd optimization:

Would you rather discard that elegant solution and use a computationally marginally heavier approach and use Int63? (The difference is mostly that the elegant solution saves a float->int conversion and a Float.ldexp https://mro.name/geohash/v2cbde795 )

1 Like

that’s very helpful to know! thanks

does this mean that conversions between int and Int63 are free?

is it fine to mix them, or all operations need to happen with Int63 numbers to avoid an overhead?

and how do you specify an Int63 number, for example int is 100, Int64 is 100L, does Int63 have an equivalent?

@CraigFe thanks for the pointer, I modified my implementation to Int63, too. Is the Optint module the way to go like in ~mro/geohash: lib/calc.ml - sourcehut git or did I overlook Int63 - how did you?

To give slightly more information: an OCaml value may be either a number/scalar or an address/pointer (to an OCaml block). At runtime we need a way to distinguish those two sorts to avoid conflicts (confusing the GC for example). Standard OCaml runtimes (not js_of_ocaml) use two different conflict-avoidance schemes for scalar values:

  • “tagging”, which is the use of the least significant bit to distinguish pointers (0) from constants (1). This adds a small bit of complexity to arithmetic operations, but this is relatively cheap. But it requires reserving a bit; this is why OCaml int has only 31 or 63 bits on systems with machine words of 32 or 64 bits. int, bool, char, constant algebraic-datatype constructors use tagging. (We call “immediate” the values that are tagged scalars.)
  • “boxing”, which is the use of a memory indirection, a memory block is allocated to hold the scalar data (so the OCaml value is the pointer to that block). This is typically more expensive, you need to allocate when creating values and to dereference when accessing values. float, int32, int64 and nativeint (which is either int32 or int64) use boxing, with optimizations to avoid boxing/unboxing in intermediary computations.

These two approaches have coexisted for a long time, basically since the OCaml runtime exists, and most datatypes using one or the other. Int63 is a newer and weirder, its type is tagged on 64bit architectures and boxed on 32bit architectures. This improves efficiency (compared to Int64) in the common 64bit case. (Int32 could benefit from the same treatments, but 32-bit integers are too small anyway and this would be a compatibility-breaking change.)

The @@immediate64 attribute mentioned above is related. @@immediate is an attribute that the type-system tracks to know that certain types, despite being abstract (their definition is unknown), must contain only immediate values. (This is useful for some type-directed optimizations, typically specializing polymorphic operators on runtime, etc.) @@immediate64 is a weirder variant for types that are only guaranteed to be @@immediate on 64-bits machines. @@immediate has been around for a while (4.03; released in April 2016), @immediate64 is relatively recent (4.10, released in February 202). Both were contributed to the compiler upstream by Jane Street.

13 Likes

actually I was on an old version, 0.0.5 works fine: `Int63` not implemented yet? ¡ Issue #12 ¡ mirage/optint ¡ GitHub

one more thing: How to do literals >= 2^31 like bitmask constants?

64bit installations of OCaml support large integer literals at type int, but 32bit versions of the compiler choke on them, so using them is not recommended for portability. Special literal suffixes exist for int32, int64 and Nativeint: for example 2l : int32 (or 0xFFl) and 2L : int64 (or 0xFFL) and 2n : nativeint. Int63 is not a builtin type, to my knowledge no literal syntax exist (but it may be provided by a ppx extension), but for example Int63.of_int64_trunc 0xFFL is a portable way to describe any int63 constant.

3 Likes

thanks @gasche , I didn’t realize the differences in representation between int, int32 and nativeint;

Int63.of_int64_trunc 0xFFL

this sounds good, I ended up having some common constants declared as val my_const: Int63.t

I am not sure I fully understand how tagged values are converted between each other after compilation:

on a 64bit machine: both int and Int63 would be tagged scalars. Does this mean that after compilation the conversion between these ints would be ‘free’? For example:

let sum a b = Int63.(a + (of_int b))

would this of_int ‘disappear’ or would it still be a runtime function call?

1 Like

I didn’t look at the implementation to check, but the Base.Int63 documentation claims that Int63 is just an int on 64bit platforms (and this is the natural choice), so Int63.of_int b should just return b. Whether or not there are function calls depends on whether the optimizer is able to remove the useless calls, so “it depends”, but probably it is and there are no calls remaining at runtime.

You can check this by writing the code in a foo.ml file, and then running ocamlc -dlambda foo.ml and ocamlopt -dcmm foo.ml. (“Lambda” is a higher-level intermediate representation where less optimizations have been performed.)

Edit: more likely, ocamlfind ocamlopt -package base -dcmm -c foo.ml, which gives (in addition to other things) the following, confirming that of_int is gone and + is just a primitive addition:

(function{foo.ml:2,8-36} camlFoo__sum_236 (a/238: val b/239: val)
 (+ (+ a/238 b/239) -1))
2 Likes

Thanks for all of the context and help to understand better the internals of OCaml, really appreciate the time you took to test and explain this and hope that others find this as helpful as I do

1 Like