Types as first class citizens in OCaml

Is it possible to use types in OCaml as first class citizens?

For example, I want to implement relational data model in OCaml and create a database as a library solution. By definition relation contains header (it looks like it should be a record with attribute names as keys and types of attributes as values) and body (it looks like a set(list) of tuples or records, values of which must correspond to types in headers). So, for headers I need ability to write something like this:

let header = {dep_id = int; name = string; ...}
Is it possible to extend OCaml with such dynamics(?) but without huge performance downgrade or how do you handle such cases?

4 Likes

Maybe the thing you’re looking for is GADT’s

(* Given:
module Db : sig
  type t
  module Row : sig
    type t
    val get_int : t -> int -> int
    val get_string : t -> int -> string
  end

  val query : t -> string -> Row.t list
end
*)

type _ column =
  | Dep_id of int column
  | Name of string column

let to_column_name : type a. a column -> string = function
  | Dep_id -> "dep_id"
  | Name -> "name"

(* `read_all db Dep_id` returns an `int list` of all values in dep_id, and
   `read_all db Name` returns a `string` list of all values in `name` *)
let read_all db : type a. a column -> a list =
  let query = Printf.sprintf "SELECT %s FROM some_table"
                (to_column_name column) in
  Db.query db query
  |> List.map ~f:(fun row ->
    match column with
    | Dep_id -> Db.Row.get_int row 0
    | Name -> Db.Row.get_name row 0

Beyond that, maybe you’d need to look into PPX.

4 Likes

Yes, you can do this, and various solutions are already provided by different libraries, e.g., Janestreet’s Univ module from the Core Kernel library. Usually, it is called Universal values, but in fact, these are dynamically typed values, implemented with GADT+Extensible Variants+First class modules.

Basically, the representation of a value is a packed existential:

type univ = Pack : 'a witness * 'a -> univ

where the witness is an extensible GADT type

type 'a witness = ..

Runtime types are represented as first class modules, of the following type:

module type Witness = sig 
     type t 
     type _ witness += Id : t witness
end

type 'a typeid = (module Witness with type t = 'a)

New types are created on fly:

let newtype (type u) () = 
   let module Witness = struct 
          type t = u
          type _ witness += Id : t witness
      end in
   (module Witness : Witness with type t = u)

For example,

let int : int typeid = newtype ()

Having the runtime type we can provide the pack function:

let pack (type u) (module Witness : Witness with type t = u) x = 
  Pack (Witness.Id,x)

E.g.,

let x : univ = pack int 42

and the unpack function:

let unpack (type u) 
   (module Witness : Witness with type t = u) (Pack (id,x)) : u option = 
   match id with
   | Witness.Id -> Some x
   | _ -> None

E.g.,

 unpack int x;;
- : int option = Some 42

So this gives you fully working dynamic typing in OCaml. You can also provide dictionaries and other specialized data structures. You may also provide adhoc polymorphism by requiring/allowing registering implementations for types, i.e., you may ask a user to provide the t -> string function, and then provide a generic show : univ -> string function.

The main drawback of this solution, is serialization. The default marshal module won’t work, as it will loose (correctly) the type witness. A possible solution is to use something else (e.g. UUID) for type witnessing when a global witness is required (e.g., distwit library).

34 Likes

Concerning the performance. I think it is more or less obvious from the implementation. First of all, all data values are boxed, i.e., your integers will now use 3 words (header + key + value) instead of one. For data types that already use the boxed representation, the overhead is only one word (the key).

Concerning packing and unpacking. The latter one is one comparison instruction, plus allocation of the option value (you can also provide function that throws exception, so that you can unpack without an extra allocation). And the packing is just an allocation of the 2-tuple.

To summarize, it is fast, especially, if you are not trying to do heavy number crunching on numbers represented as universals.

4 Likes

A completely different approach, that doesn’t involve GADT, but may have a higher performance cost is implemented in the OGRE library, the trick is simple - you can use some common representation for your universals (e.g., a string, and database key, a sql statement, etc) and then your runtime data type representation is just the parsing (extracting) function. The library also shows how one can represent headers as a composition of parsers. See the implementation for more details.

2 Likes

This may not directly answer the original question, but here is another nice trick that makes types more of a first class citizen. It allows you to choose type representations at runtime:

module type S = sig
  type t
  val of_string : string -> t
end

module Make (Arg : sig val kind : [ `string | `int ] end) () = struct
  include (val (
    match Arg.kind with
    | `string -> (module struct
        type t = string
        let of_string str = str
      end : S)
    | `int -> (module struct
        type t = int
        let of_string = int_of_string
      end : S)
  ))
end

module M = Make (struct let kind = `int end)

Note that the () is mandatory in the functor definition, because it needs to be generative, i.e. create a fresh type on each invocation, even with identical arguments.

I found this particularly useful for problems where you need to optimize the type representation of the solution depending on parameters obtained at runtime. Note that if Flambda can determine at compile time what these parameters are, it can specialize your code and achieve excellent performance for your particular use case.

17 Likes

The kind of thing you want is called type representation. It’s the inverse of dependent-types, 'cause you’re downgrading types into value-level rather than upgrading values into type-level (the case for dependent types - with OCaml we can achieve a light version using either functors and/or singleton types).

Take a look into this project:

It may help you a lot with your database example.

7 Likes

Is there a way to serialize Univ-map via marshalling somehow? It seems other modules/types in core_kernel are bin_prot compatible but not Univ_map. :frowning:

Nope, Univ_map is not persistable between runs. It is non-trivial but possible to implement a universal map that provides persistence.

It is 2024 now, and I wonder if using this library from Core should be considered a viable and recommended approach in new code?

I also wonder, if there’s an already existing solution, preferably in the viccinity of JaneStreet’s ecosystem, whence it would be possible to convert select class attributes into property lists?

Curiously, it seems that there’s no longer a Univ library in the Core. What happened? Is there a different library recommended at this point?

The biggest question with the usability of such libraries is how would they handle the opaque module types. Could Depyt help with this?

The biggest practical question is how the cases corresponding to each possible desired representation could be extended. So far, it seems it may be possible to build a huge single tree type encompassing all types in a project, and use a portion of matching cases in a program through the GADT. Such solution is less than satisfactory, since ideally additional cases should be possible to add through definitions in separate modules, rather than having all possible cases centralized at the point where the main type that is responsible for tagging data is defined.

1 Like