Curiosity: boolean being represented as 1 or 3?

Hello,

I was discussing a specific change in the code with a colleague, and specifically the use of polymorphic equality. In the following code, I wrote function f and they suggested to simplify to function g.

type t =
    | Freed
    | Alive of string


let f = function
    | Freed -> true 
    | Alive _ -> false

let g x = x = Freed

I was fairly confident that f was the best way of writing this, as I’ve seen many arguments, and I have personally observed that polymorphic equality is often much less efficient.
I went to check into Godbolt, and found out that f is, in fact, more efficient.

What threw me off, however, is that both f and gseem to either return 1 or 3, while I would expect it to either return 0 or 1 since that is the representation of booleans:

f:
    andq    $1, %rax
    leaq    1(%rax,%rax), %rax
    ret

I’m just curious as to why this is the case?

The first bit is used to distinguish immediate values from pointers. This is why int’s are 63 bit and not 64. There is a real world ocaml chapter on this.

Edit: I didn’t notice that you actually linked the chapter in question already, it’s just above the part you read. The wording is a bit confusing admittedly.

1 Like

OCaml booleans are encoded as OCaml integers 0 and 1, not machine integers 0 and 1. OCaml integers, in turn, are encoded as machine integers under the encoding n -> 2*n + 1, so that their machine representation always ends with a set bit. This is so that the GC can distinguish integers from pointers and this is why OCaml integers have one bit less available than machine integers (ie 63 bits on 64-bit architecture).

Makes sense?

Cheers,
Nicolas

PS: I recommend A beginners guide to OCaml internals | Richard WM Jones for more details on this.

3 Likes

Ah of course! I somehow understood “encoded as integers” as “encoded as machine integers” but it definitely makes sense that it should be OCaml integers :person_facepalming:

Thank you!

Regarding the efficiency of the comparison: In this specific case, the first function is testing for something entirely different than what is apparent from the code.
function | Freed -> true | Alive _ -> false
is equivalent to Obj.is_int, because Freed is the only constant constructor.
You’ll see this when adding a third constructor, e.g. | Third – the code will no longer be as efficient.

That being said, it is definitely preferable for program maintenance to always use the match/function variant of the comparison

Yep, that part I got. But it does confirm that pattern matching is, in general, able to optimise code better than polymorphic equality.