Memory representation guarantees and Obj.magic

Hello,

I’m trying to write a wrapper around an existing type to add some type guarantees that can help catch some bugs. To simplify a bit:

type t = | Int of int | Bool of bool | List of t list

type _ t_ty = Mk : t -> 'a t_ty [@@unboxed]

let bool : bool -> bool t_ty = fun b -> Mk (Bool b)
let int : int -> int t_ty = fun i -> Mk (Int i)
...

The issue comes when trying to define a constructor for List:

let list : type a. a t_ty list -> a list t_ty =
  Mk (List (List.map (fun (Mk v) -> v) l))

Now obviously this isn’t ideal, it requires iterating over the list for no reason.
I’ve basically had the exact same problem in the past, and I know there is no well-typed solution here. And as far as I know, the work on detecting equalities in flambda has not been merged, and would require flambda anyway.

So my new question is: is the following implementation guaranteed to be correct?

let list : type a. a t_ty list -> a list t_ty =
   fun l -> Mk (List (Obj.magic l))

I have read the RWO guide on memory representation as well as the definition of unboxed but I can’t quite figure out if this is guaranteed to always be correct or if the compiler can decide to change something in the future and break this implementation.
Anybody has a definite answer?

It seems that you’re trying to implement a GADT by using phantom type and smart constructor. You don’t need the unboxed Mk constructor to do so. It’s sufficient to define 'a t_ty as:

type 'a t_ty = t

And then your list constructor is just:

let list (l : 'a t_ty list) : ('a list) t_ty = List l

I suspect the “official” answer would be that Obj isn’t really part of the language and you won’t be guaranteed anything.

:man_facepalming: :man_facepalming: :man_facepalming:
Of course this works. Well the problem is solved! I do have to write an mli by hand but I guess it’s good practice anyway

That being said, I’m still interested in the answer. Thanks to @sim642 I also used the keywords “part of the language” when googling and found this answer. I guess OCaml cannot provide any guarantee about Obj.magic because it also doesn’t provide any guarantee about the memory layout of things… But for unboxing a type I would expect some guarantees? Rust does give guarantees about std::mem::transmute, even though these guarantees aren’t really strong. But they do apply specifically to things like #[repr(transparent)] which has the exact same behaviour as [@@unboxed]

I would say so. @octachron used this trick in a recent PR on the OCaml compiler codebase, and indeed the type family is kept abstract in the .mli file. That’s what we did before the introduction of GADT.

With a GADT, we do:

type _ ty =
  | Int : int -> int ty
  | Bool : bool -> bool ty
  | List : ('a ty) list -> ('a list) ty

With phantom type we write smart constructors with the same type as the constructors of the GADT, and we keep the type abstract such that those functions are the only way to create values of that type.

1 Like

Types annotated with an explicit boxing annotation are guaranteed to be boxed according to the annotation. In other words, types annotated with [@@unboxed] are guaranteed to be unboxed (this is necessary for the C FFI).

2 Likes

So this would entail that

type t' = Mk of t [@@unboxed]

let conv : t -> t' = Obj.magic

Is always guaranteed to be sound? Or is the compiler still allowed to mess with stuff?

There’s nothing sound about Obj.magic. Assume the compiler can mess with it, even though in practice it won’t.

There are no Obj.magic function in the OCaml language.

The function is specific to the OCaml compiler implementation, and by using it you are stepping outside of any guarantees provided by the language. In theory, any new versions of the compiler may break any uses of Obj.magic. This does not happen that frequently, but this happens from time to time (the last time this has happened (due to a bug in a corner case of the type system) will be in one month for the 5.3 release).

That’s really interesting :grin:
Sorry, to clarify, I’m not really asking these questions because I need an answer, I’m just actually curious from my formal semantics person point of view.

It’s actually very interesting that Obj.magic is still evolving!

PS: I’m really excited for 5.3, thank you for your work!!!

Note that it’s not that Obj.magic is evolving, it remains the identity. But its type allows you to cast anything to anything which is not what statically typed languages are made for.

Sometimes people use it to bypass typing issues by assuming something that holds about the shape of a value in a given context. But the compiler implementation evolves and may break these assumptions.

For example let’s say in a given context an Int64.t value is boxed and for some reason you use Obj.magic to make a trick that uses the fact that it is boxed. Now comes out a new, smarter, version of the compiler which unboxes Int64.t values in this context. At that point your use of Obj.magic, which assumed a boxed value, will result in a segfault.

2 Likes

Hmmm, I agree with this, but I think @octachron suggests Obj.magic is even more unstable than this, and doesn’t even guarantee to behave as the identity function. Otherwise, one could provide guarantees about its behaviour, as it is done for Rust’s std::mem::transmute (which is the exact equivalent of Obj.magic in Rust). transmute guarantees that if a value of type T currently upholds the invariant of a type U, then transmute is not UB (though note that in Rust the notion of invariant a lot more complex than it is in OCaml).

According to another earlier reply in this thread, OCaml guarantees that types annotated with [@@unboxed] behave exactly like the value inside (for FFI reasons). If Obj.magic was defined as “always the identity on the underlying representation”, then it would be a guarantee of the OCaml language that Obj.magic-ing an “unboxed” value to its inner value and vice-versa is always sound.

So my understanding is that, even though Obj.magic is currently defined as %identity, the compiler is allowed to highjack calls to Obj.magic and change its semantics, or optimise it in a way that makes stronger assumptions?

I will let the compiler dev correct me if I’m wrong but I would say:

Not necessarily. The boxed/unboxed annotations provides you guarantees from the compiler at the FFI boundary, which can be detected (use of external). In OCaml land, I expect the compiler to do what it wants.

No, if that was the case there would be a %magic primitive.

From what I remember (but @vlaviron could confirm), it’s the difference between Sys.opaque_identity and Obj.magic, the former prohibits compiler optimisations whereas the latter can, depending on the code and architecture, make optimisations that break the assumptions we would have about the memory representation of the value.

I remember a talk by @chambart (which I can no longer find, unfortunately) on why you shouldn’t use Obj.magic (especially if you want to use flambda afterwards).

Let it make it simpler, the compiler is allowed to implement Obj.magic as

let magic _ = assert false

Once again Obj.magic is no part of the language nor the semantics of the language, and is only intended to be used by compilers of other programming languages using Ocaml as a compiler target like Rocq/Coq. In other words, any OCaml program using Obj.magic is broken from the perspective of OCaml formal semantics.

Under the hood, in all version of OCaml, Obj.magic is the identity function, but that is an implementation detail.

Yep that makes sense, this is what I understood from your previous answer. Thank you!!

I understand that providing strong guarantees about Obj.magic is undesirable, but why would generated code not need such guarantees when humans do ?

2 Likes

If you are worried about Obj.magic being implemented in a future version of Ocaml as assert false instead of identity, it is trivial to implement your own identity casting function which relies on nothing but Ocaml’s C FFI. (However that would I suspect be a pointless worry: as you yourself have recognised, it is the risk of Ocaml changing its representation of Ocaml entities in memory which is going to get you.)

In the general case yes, but [@@unboxed] seems to provide exactly the guarantee that this representation won’t change though

I think you are concerned with unboxing a single-field record or a variant with a single constructor that has a single argument and isn’t a GADT (additional rules apply to GADTs). If so, for these the manual says that on applying the unboxed attribute “a value of this type is physically equal to its argument”, which supports your view.

I am not qualified to say but as this is an efficiency matter rather than one affecting the meaning of the program (about which Obj.magic has nothing to say), my guess is that although unlikely the way in which the attribute works could change so as to break your assumption, however unlikely that may be in practice. At least that seems to be the opinion of those on this topic who know more about it than me.