Checking a variant's tag without a match?

Very cool! I have to say I’m still uncomfortable with writing ppx extensions (though using them is quite straightforward!) — I wish the documentation was a lot better.

The definition of (==) makes almost no guarantee when comparing immutable things. The only guarantee is that if the things are physically equal (==) they will also be equal (=). Since functions can’t be compared for equality there’s no way to even verify this. If you’re very knowledgeable about the implementation and willing to write code that could stop working in a future revision of OCaml, you might get away with using (==) to compare functions.

So I’m curious why that’s true. (I understand that it is true, but I’d like to understand why. Naively I would expect such things to be represented as addresses in memory that don’t move around much…)

You want to let the compiler do any kind of duplication and merging that makes sense for immutable values. So you don’t want to make any guarantees about their identity.

In the part of the language without mutation (the “pure” subset) you don’t really want a notion of identity. It would be like asking whether the two mathematical sets {2, 3} and {2, 3} are the same set or are two different sets that just happen to be equal. It doesn’t seem like a helpful distinction.

Sure, but I suppose I don’t understand what sort of optimization would be gained from (say) duplicating the code of a compiled function in multiple places. Regardless, though, this is all a side issue.

When you inline a function you are doing exactly this.

Sure, but you can’t pass an inlined function to another function, either.

The benefits of duplication need not be an optimisation.

OCaml includes operations like + that are primitives but behave just like functions (for example, they can be partially applied). One particularly simple way to implement this is to have a pass that replaces any mention of + that is not a saturated application with the eta-expanded primitive, eg (fun a b -> a + b), essentially duplicating the definition of +.

Some traces of this can be observed in the top level:

# (+) == (+);;
- : bool = false

(Note that this behaviour is not guaranteed. It might change under some future version of OCaml, or if you compile this code fragment with flambda, or something like that.)

There are also optimisations that depend directly on immutable values lacking identity, such as compiling closed functions as constants. If functions had identity, it would be necessary to construct a fresh value each time a fun was evaluated.

You mean that we can compare bitwise two functions, and, obviously, if they are bitwise equal then they are equal? Well, if you will devise your comparison operator in such way, it will be of some usability of course, but it wouldn’t be an equality function because the equal operation requires the following property:

not (a = b) iff a <> b

That’s called decidable equality and it is a property that sort of “goes without saying”, i.e., it is a part of our working intuition (the intuition that you use when you’re actually using the equality operator, as you don’t usually read the docstring of this operator). Once the decidability is broken, many algorithms will break, i.e., List.mem will no longer be able to provide a definitive answer, as it will now tell that a value is either in a list, or maybe not in a list.

To put it in a different perspective, such implementation of a comparison will provide a lower (safe, sound approximation of equality), but its negation will result in the upper (unsafe, unsound approximation of equality), and since people usually rely on the property that the equality operator is precise (i.e., decidable) there algorithms will break.

2 Likes

Okay, this is the first description that makes sense to me. Yes, you break the fact that both equality and its negation are decidable and total functions and that one is the strict negation of the other.

I’d seen the Obj module before, but the documentation for it is really scant. There’s no documentation on most of the functions (none on repr or tag), and the only comment for the whole thing is “Operations on internal representations of values. Not for the casual user.”

The documentation is not scant, it is utterly complete! Like the location of the Isla de Muerta, this module makes complete sense only to those who already know its meaning.

Even the subsequent correction to same_constructor has failed to take Obj.no_scan_tag into account. There are so many examples of where this function makes no sense:

type t = ..
type t += A | B
assert (same_constructor A B = false);;

exception Foo
exception Bar
assert (same_constructor Foo Bar = false);;

(* Fixable if you at least limit the tag comparison to < Obj.no_scan_tag *)
assert (same_constructor "foo" "bar" = false);;
assert (same_constructor (fun () -> false) (fun () -> true)

The very curious reader is invited to study Chapter 20 of the OCaml manual in painful detail and, having studied and fully understood it, then never use the Obj module :wink:

3 Likes

I think that we should add exactly this to the documentation of Obj module :slight_smile: