Understanding the Obj library

Hello everyone. It’s been a while since I discovered programming in OCaml, but it’s only recently that I became interested in the Obj library, and in particular Obj.magic, without really understanding what it is for or how it works (for me it’s just a way to avoid type checking during compilation). The scant documentation on this library doesn’t answer my questions, so I’d like to ask you a few things:

  • What is Obj really used for, and more specifically Obj.magic and Obj.tag?

  • Do you know of any articles, courses or books I could consult to learn more about this library?

  • If: let x = Obj.magic (2);; x^1;; returns 2, why does let q = Obj.magic([|1;2|]); q^1;; return a value like: int = 1122539155440? is this some kind of pointer?

Thanks in advance.
Rascar.

  • Do you know of any articles, courses or books I could consult to learn more about this library?

To learn a bit more about why the Obj module is useful, reading the “Memory Representation of Values” chapter of Real World OCaml might help.

  • What is Obj really used for, and more specifically Obj.magic and Obj.tag?

Obj is mostly used for introspecting the memory representation of values. Obj.tag, for example, returns the tag stored in the header of a heap block, which tells you what kind of value is being pointed to. As you have noticed, Obj.magic can kind of be thought of as an escape hatch from the type system that lets you unsafely “cast” values to disparate types. There are some instances where this is useful (for example, downcasting an object), but in general I would avoid using it.

  • If: let x = Obj.magic (2);; x^1;; returns 2, why does let q = Obj.magic([|1;2|]); q^1;; return a value like: int = 1122539155440? is this some kind of pointer?

In short, basically yes – although I’m not quite sure what you’re doing with the ^ operator. Did you redefine it to mean something else? It’s defined as string concatenation in the standard library, which would produce a type error when the RHS is an integer. The subsection in the above linked chapter titled “Distinguishing Integers and Pointers at Runtime” should shed some more light on the different results you discovered.
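
To make the integer/pointer distinction a bit more concrete, here is a minimal sketch that only inspects values (Obj.is_int and Obj.tag, no Obj.magic). The helper name describe is made up for this illustration. An immediate integer lives in the machine word itself, whereas an array is a pointer to a heap block, which would explain why reinterpreting an array as an int shows an address-like number.

```ocaml
(* Classify a value by its runtime representation. Inspection only:
   nothing is cast to another type. *)
let describe (v : Obj.t) : string =
  if Obj.is_int v then "immediate"
  else if Obj.tag v = Obj.double_tag then "boxed float"
  else Printf.sprintf "block with tag %d" (Obj.tag v)

let () =
  print_endline (describe (Obj.repr 2));
  print_endline (describe (Obj.repr [| 1; 2 |]));
  print_endline (describe (Obj.repr 9.))
```

Even this read-only use relies on representation details that are not guaranteed to stay the same across compiler versions.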


Obj is not part of the OCaml language and is not meant to be used by the general public. This module can be used to access the representation of OCaml values at runtime, but is fully and wholly unsafe: it is very easy to either cause an immediate segfault or break subtle invariants that will cause a segfault at some indeterminate future point in time. It is used by the compiler to implement certain runtime operations but, again, it is not meant to be used for general programming, as it can easily break the soundness of the language.

Or, said more colorfully:

https://sympa.inria.fr/sympa/arc/caml-list/2005-07/msg00223.html

You may find the following article interesting:

Cheers,
Nicolas


I think that is on purpose to discourage people from using it.
It starts with a warning:

Not for the casual user

If you browse the mailing list archives you’ll also find another warning https://sympa.inria.fr/sympa/arc/caml-list/2009-10/msg00181.html

Repeat after me: “Obj.magic is not part of the OCaml language”.

There is a further warning if you look at the flambda part of the manual: OCaml - Optimisation with Flambda

The behaviour of the Flambda simplification pass means that certain unsafe operations, which may without Flambda or when using previous versions of the compiler be safe, must not be used. This specifically refers to functions found in the Obj module.

OCaml is a memory safe language, however using Obj.magic can very easily cause your program to segfault even if you think that it should be safe in certain usages. Or it may work today, but break when the next version of OCaml is released that generates slightly different code or optimizes code differently.

For example converting an integer type to a non-integer type with ‘Obj.magic’ even if you never actually read from it (e.g. because you have an if that guards it) can crash your program if the garbage collector happens to run and find it.

Why does it exist in the first place then? It is useful for debugging, but beyond that I think extracting Coq to OCaml would use Obj.magic in cases where Coq can prove that the usage is correct, but that proof cannot be encoded in OCaml’s type system.
In the past OCaml’s type system also used to be more limited (e.g. no GADTs), and it was only in 2021 that Menhir could finally drop its use of Obj.magic.

If you find yourself needing to use Obj.magic then I’d suggest to first search for alternatives, and exhaust every other possibility first. Even then consider having a “no Obj.magic” build of your library to help debug it.
I’ve spent quite a while debugging a library that used unsafe features that caused it to crash, and it wasn’t even Obj.magic, it was an off by one in a call to Array.unsafe_get. But just like with Obj.magic the crash wasn’t at the point of the unsafe call, but at a random point later when the GC ran and found the invalid value and tried to follow it.

You are welcome to post on this forum program fragments that at first glance may only be solvable with Obj.magic and see whether people can help you find a better alternative.


It is worse than that: if you need documentation for the Obj module, you are not familiar enough with the internals of the OCaml compiler to use this module.

More precisely, outside of inspecting the memory representation of OCaml values, using the Obj module (and even worse Obj.magic) is equivalent to asking the compiler to break your code whenever it feels like it. In particular, any user of Obj must be reviewed at each new version of the compiler, because the module offers no forward compatibility guarantees at all.

The few legitimate users of Obj are compilers that use OCaml as an intermediary language like Coq.
Such compilers can afford to break the OCaml type system with Obj.magic because they are backed by a stronger type system.
In other words, the Obj module is not a library; it is merely a convenience provided to (other) compilers.

TLDR: this is not the library you are looking for.


Thank you all for your prompt replies.

I’m not quite sure what you’re doing with the ^ operator.

I did indeed make a typing error in my last question, I meant to write x * 1 and q * 1.

You are welcome to post on this forum program fragments that at first glance may only be solvable with Obj.magic and see whether people can help you find a better alternative.

As I have not yet mastered Obj, I have tried things on my own without understanding them. I apologize in advance for the horrors that follow, and I will read with great interest all the articles I have been pointed to, if only for my general knowledge of Obj.

I had tried to create a function that would take two arrays as input and return the sum of the index-by-index product of the elements.

Obj may not be necessary for this problem, but having discovered Obj I wanted to try and use it (which results in a really ugly function, certainly not in the spirit of OCaml).

The two input arrays can be :

tab1 : float array, tab2 : float array
tab1 : int array, tab2 : float array
tab1 : float array, tab2 : int array
tab1 : int array, tab2 : int array

I had noticed that the tag of the internal representation of objects of types float and int differed between them, for example:

Obj.tag(Obj.repr(5));; Output : int = 1000

Obj.tag(Obj.repr(9.));; Output : int = 253

This allowed me to know the type of the two input arrays by comparing the tag of the internal representation of the first element of each array with the tag of the internal representation of an object of type float and an object of type int.

To perform sum and multiply operations, I wanted to convert one or two of the input arrays to float Array type if they were of int Array type. This would allow me to perform addition and multiplication operations only with float objects.

I therefore created exceptions that should have allowed me to convert int arrays into float arrays by tag comparison.

However, my function doesn’t work if it’s given two int arrays as input, e.g.:

ouch_my_brain (Obj.magic([|1;2|])) (Obj.magic([|3;4|]));; it will systematically return 0.

whereas if I choose one or two float arrays, for example

ouch_my_brain (Obj.magic([|1;2|])) (Obj.magic([|3.;4.|]));; it will systematically return a kind of pointer as mentioned in my third question.

Careful not to have a stroke, here’s my horrible function:

exception Cas1
exception Cas2
exception Cas3

let sum (tab1: float array) (tab2: float array) : float = 
  let sum = ref 0. in 
  for i = 0 to ((Array.length tab1) - 1) do 
    sum := !sum +. tab1.(i) *. tab2.(i)
  done;
  !sum;;

let ouch_my_brain (tab1: Obj.t) (tab2: Obj.t) : float = 
  let tag1 = Obj.tag (Obj.repr tab1) in 
  let tag2 = Obj.tag (Obj.repr tab2) in 
  let tag_rint = Obj.tag (Obj.repr (5 : int)) in 
  let tag_rflo = Obj.tag (Obj.repr (9.2 : float)) in 

  try 

      if tag1 = tag_rint && tag2 = tag_rint then raise Cas1 else
      if tag1 = tag_rint && tag2 = tag_rflo then raise Cas2 else 
      if tag1 = tag_rflo && tag2 = tag_rint then raise Cas3 else 
      sum (Obj.magic tab1) (Obj.magic tab2) 
  
  with 
    | Cas1 -> sum (Array.map (fun x -> float_of_int x) (Obj.magic tab1)) (Array.map (fun x -> float_of_int x) (Obj.magic tab2))
    | Cas2 -> sum (Array.map (fun x -> float_of_int x) (Obj.magic tab1)) (Obj.magic(tab2))
    | Cas3 -> sum (Obj.magic(tab1)) (Array.map (fun x -> float_of_int x) (Obj.magic tab2))

An array of floats can have a special unboxed representation in OCaml (depending on a configure-time flag), so you cannot look at the tag of an array item to know whether it is a float or not: depending on the floating-point value stored there, it may appear to be either.
You really need to be aware of all these special cases to use Obj. As pointed out previously, if you’re not, then you’re not ready to use Obj.

I understand the desire to write a generic function that can work on arrays of different types.
You can look at how existing libraries solve that problem (e.g. N-Dimensional Arrays - OCaml Scientific Computing Tutorials), the easiest way is probably to use functors: given a module that can perform operations on array elements you can write a module that performs higher level operations calling these basic functions.

This isn’t the most efficient way of solving it, but it is type-safe and requires very few changes to your sum function:

module type Ops = sig
  type t

  val zero : t

  val one : t

  val add : t -> t -> t

  val sub : t -> t -> t

  val mul : t -> t -> t

  val div : t -> t -> t
end

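(* Expose t; with a plain ": Ops" seal, t stays abstract and int arrays are rejected. *)
module IntOps : Ops with type t = int = Int

module FloatOps : Ops with type t = float = Float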

module MyCode (O : Ops) = struct
  let ( +: ) = O.add

  let ( -: ) = O.sub

  let ( *: ) = O.mul

  let ( /: ) = O.div

  let sum tab1 tab2 =
    let sum = ref O.zero in
    for i = 0 to Array.length tab1 - 1 do
      sum := !sum +: (tab1.(i) *: tab2.(i))
    done ;
    !sum
end

There are other ways, you could for example “hide” the module that performs operations on elements in the type itself (although this might result in less efficient code):
type 'a t = (module Ops with type t = 'a) * 'a array
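
As a rough sketch of how that packed type might be used (the names packed, dot, ints and floats are made up for this illustration, packed being the same shape as the 'a t above, and the Ops signature is trimmed so the snippet compiles on its own):

```ocaml
module type Ops = sig
  type t
  val zero : t
  val add : t -> t -> t
  val mul : t -> t -> t
end

(* An array bundled with the operations on its elements. *)
type 'a packed = (module Ops with type t = 'a) * 'a array

(* Dot product using the operations carried by the first argument.
   Stops at the shorter of the two arrays. *)
let dot (type a) ((ops, xs) : a packed) ((_, ys) : a packed) : a =
  let module O = (val ops : Ops with type t = a) in
  let acc = ref O.zero in
  for i = 0 to min (Array.length xs) (Array.length ys) - 1 do
    acc := O.add !acc (O.mul xs.(i) ys.(i))
  done;
  !acc

(* Helpers that attach the standard Int and Float modules. *)
let ints (xs : int array) : int packed =
  ((module Int : Ops with type t = int), xs)

let floats (xs : float array) : float packed =
  ((module Float : Ops with type t = float), xs)
```

With these helpers, dot (ints [|1; 2|]) (ints [|3; 4|]) type-checks, while mixing ints and floats is rejected at compile time.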

Since you are experimenting, I encourage you to call ouch_my_brain on two arrays of integers. You should see that your sum function (on floats) is called anyway.
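
If it helps to see why, here is a small inspection-only sketch (the name falls_through is made up for this illustration). It compares the two reference tags the function computes with the tags of whole arrays, which is what the function actually receives. The exact tag of a float array depends on how the compiler was configured, so the sketch only checks that it differs from both reference tags.

```ocaml
(* The tags ouch_my_brain compares against: an immediate int and a boxed float. *)
let tag_rint = Obj.tag (Obj.repr (5 : int))
let tag_rflo = Obj.tag (Obj.repr (9.2 : float))

(* The tags it actually observes, since its arguments are whole arrays. *)
let tag_int_array = Obj.tag (Obj.repr [| 1; 2 |])
let tag_float_array = Obj.tag (Obj.repr [| 3.; 4. |])

(* True when a tag matches neither reference tag, i.e. none of
   Cas1, Cas2 or Cas3 would be raised for it. *)
let falls_through tag = tag <> tag_rint && tag <> tag_rflo

let () =
  Printf.printf "int: %d, float: %d\n" tag_rint tag_rflo;
  Printf.printf "int array: %d, float array: %d\n" tag_int_array tag_float_array
```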

The code you wrote is wrong, and in a nutshell you cannot fix it since there is no way in OCaml to always determine for sure the static type of a value at runtime.
Also, as you can see you cannot write a type in OCaml for the function ouch_my_brain since it would have a type similar to int array|float array -> int array|float array -> float. So you resort to using type Obj.t and then forgo all type safety.

For instance if you call your function on Obj.repr 5 (which you can do and the compiler won’t stop you), your code will segfault.

You have two general approaches to achieve what you want to do in plain OCaml. One is to pack together functions for manipulating your numbers (e.g. in a record or in a module) and pass this pack of functions to your sum function so that the sum can be written in terms of these abstract functions over your data type. This is what @edwin has suggested.

Another more pedestrian way is to carry around some information that can be checked at runtime to distinguish the various cases:

type num_array = Int of int array | Float of float array

let sum tab1 tab2 =
  match tab1, tab2 with
  | Int t1, Int t2 -> ...
  | Int t1, Float t2 -> ...
  | Float t1, Int t2 -> ...
  | Float t1, Float t2 -> ...

Of course there are more involved solutions but they more or less boil down to carrying around some extra information which allows you to recover at runtime the static type of your arguments. Whatever you think is available in Obj does not allow you to do that for arbitrary arrays.
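
One possible way to fill in the four elided cases, assuming (as in the original function) that the result should always be a float and that ints are converted with float_of_int. The helper dot is made up for this sketch:

```ocaml
type num_array = Int of int array | Float of float array

(* Plain float dot product; stops at the shorter array. *)
let dot (a : float array) (b : float array) : float =
  let acc = ref 0. in
  for i = 0 to min (Array.length a) (Array.length b) - 1 do
    acc := !acc +. (a.(i) *. b.(i))
  done;
  !acc

(* The constructor records which kind of array we hold, so no Obj is needed. *)
let sum tab1 tab2 =
  match tab1, tab2 with
  | Int t1, Int t2 -> dot (Array.map float_of_int t1) (Array.map float_of_int t2)
  | Int t1, Float t2 -> dot (Array.map float_of_int t1) t2
  | Float t1, Int t2 -> dot t1 (Array.map float_of_int t2)
  | Float t1, Float t2 -> dot t1 t2
```

A call then looks like sum (Int [|1; 2|]) (Float [|3.; 4.|]).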

I’d say that the most idiomatic way of writing a function that accomplishes this would be to start by using polymorphism:

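let polymorphic_dot_product (mul : 'a -> 'b -> 'c) (add : 'c -> 'c -> 'c) (arr1 : 'a array) (arr2 : 'b array) : 'c =
  (* Multiply the arrays element by element (Seq.map2 takes the function first). *)
  let products = Seq.map2 mul (Array.to_seq arr1) (Array.to_seq arr2) in
  (* There is no generic zero, so the first product seeds the fold;
     empty input is rejected. *)
  match Seq.uncons products with
  | None -> invalid_arg "polymorphic_dot_product: empty array"
  | Some (first, rest) -> Seq.fold_left add first rest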

This function needs to be told how to multiply the elements of arr1 by the elements of arr2 and then how to sum the result. This is a very efficient way (in terms of time and not repeating yourself) to write it, but depending on how you’re using it it might not be ergonomic. I’d say that there are two idiomatic ways to “package” something like this in OCaml.

The first is using variant types: if you’re using int arrays and float arrays interchangeably all throughout your program, then I’d say this makes the most sense:

type numarray = IntArray of int array | FloatArray of float array

(* A dot product is a single number, so the result needs its own variant. *)
type num = IntNum of int | FloatNum of float

let dot_product (narr1 : numarray) (narr2 : numarray) : num = match narr1, narr2 with
    | IntArray iarr1, IntArray iarr2 -> IntNum (polymorphic_dot_product Int.mul Int.add iarr1 iarr2)
    | FloatArray farr1, FloatArray farr2 -> FloatNum (polymorphic_dot_product Float.mul Float.add farr1 farr2)
    | IntArray iarr, FloatArray farr | FloatArray farr, IntArray iarr ->
         FloatNum (
             polymorphic_dot_product (fun x n -> Float.mul x (float_of_int n)) Float.add farr iarr
         )

On the other hand, like @edwin said, in some cases you might want to use functors. This makes more sense when you want your code to work with both float arrays and int arrays, but you’re not going to be mixing and matching much. In that case, you could do something like

module MyStuff = struct
    module type NUM = sig
        type t
        val mul : t -> t -> t
        val add : t -> t -> t
    end
    module Make (Num : NUM) = struct
        let dot_product (arr1 : Num.t array) (arr2 : Num.t array) =
            polymorphic_dot_product Num.mul Num.add arr1 arr2
    end
end

Then you can use these like this:

module MyIntStuff = MyStuff.Make(Int)
module MyFloatStuff = MyStuff.Make(Float)

let int_dot_product : int array -> int array -> int = MyIntStuff.dot_product
let float_dot_product : float array -> float array -> float = MyFloatStuff.dot_product

The cool thing about this is that now you can use these with a lot more than Int and Float; any number module with a mul and an add should work.
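
As an illustration of that last point, here is a self-contained sketch that instantiates the same kind of functor with the standard Int64 and Complex modules. It repeats a compact version of the functor so it compiles on its own, and it assumes OCaml 4.14 or later for Seq.map2 and Seq.uncons. The names Int64Stuff and ComplexStuff are made up.

```ocaml
module type NUM = sig
  type t
  val mul : t -> t -> t
  val add : t -> t -> t
end

module Make (Num : NUM) = struct
  (* Dot product over any module providing mul and add.
     The first product seeds the sum, so empty input is rejected. *)
  let dot_product (arr1 : Num.t array) (arr2 : Num.t array) : Num.t =
    let products = Seq.map2 Num.mul (Array.to_seq arr1) (Array.to_seq arr2) in
    match Seq.uncons products with
    | None -> invalid_arg "dot_product: empty input"
    | Some (first, rest) -> Seq.fold_left Num.add first rest
end

module Int64Stuff = Make (Int64)
module ComplexStuff = Make (Complex)

let int64_result = Int64Stuff.dot_product [| 1L; 2L |] [| 3L; 4L |]
let complex_result = ComplexStuff.dot_product [| Complex.i |] [| Complex.i |]
```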

Thank you all for taking the time to answer my questions and for suggesting programs that will help me better understand and use OCaml. Before I fully use and understand Obj (if I ever need it, or just for my general knowledge), I’m going to make sure I have a good grounding in OCaml, both in theory and practice.