Examples of “single case constructor”:
type id = Id of int
type person = Person of {
name: string;
age: int
}
Alternatively, I can use type aliases:
type id = int
type person = {
name: string;
age: int
}
When to use which?
In both examples, you are not using the id type after defining it. Did you mean to use it as a member of the record types?
Note that person is not typically considered a type alias, since it actually defines a new type (there is no way to use a record type without defining it first).
The first one, id, is indeed a type alias because id and int can be used interchangeably in any context. Thus, your question makes sense for it. Using a one-constructor sum type for it helps you avoid accidental confusion between id and an unrelated int in your program.
Cheers,
Nicolas
It’s used in the hex library so that you can keep track of which string is the raw array of bytes and which is the hex-encoded one.
Sometimes you can use a private type alias (instead of a one-constructor sum) for a similar purpose.
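As a rough sketch of the private type alias idea, assuming a hypothetical module Hex_string with a hypothetical function of_raw (neither is the hex library's actual API), the encoded string can be read as a string but cannot be forged from one:

```ocaml
(* Hypothetical module for illustration; not the hex library's API. *)
module Hex_string : sig
  type t = private string      (* can be read as a string, not built from one *)
  val of_raw : string -> t     (* encode raw bytes as lowercase hex *)
end = struct
  type t = string
  let of_raw raw =
    String.to_seq raw
    |> List.of_seq
    |> List.map (fun c -> Printf.sprintf "%02x" (Char.code c))
    |> String.concat ""
end

let encoded = Hex_string.of_raw "hi"

(* An explicit coercion is needed to view the value as a plain string. *)
let () = print_endline (encoded :> string)

(* Writing [let bad : Hex_string.t = "6869"] would be rejected by the
   type checker, because a plain string is not a Hex_string.t. *)
```

Unlike a one-constructor sum, the private alias needs no pattern matching to read the value, only a coercion.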
The other use for it that I know is when you need to rely on physical equality to distinguish values.
let x = 5
let y = 5
let () = if x == y then print_endline "eq" else print_endline "neq"
type id = Id of int
let a = Id 5
let b = Id 5
let () = if a == b then print_endline "eq" else print_endline "neq"
This prints eq and then neq.
The reason is that Id 5 allocates a new block (it’s a constructor so it constructs!!) whereas 5 is just a literal for an immediate value.
I would be surprised if it does what you say outside the toplevel. Sharing immutable constants that are structurally equal is something that even the bytecode compiler does, so I’m expecting eq and eq except in bytecode mode with debugging on.
Indeed! I had tested it by calling ocaml with a file having the content above, but compiling and then running prints eq and eq.
You can wrap the construction inside some complicated enough code that the compiler won’t be able to share these constants.
let b = ref true
let f v = if !b then v else assert false
let x = f 5
let y = f 5
let () = if x == y then print_endline "eq" else print_endline "neq"
type id = Id of int
let g v = if !b then Id v else assert false
let a = g 5
let b = g 5
let () = if a == b then print_endline "eq" else print_endline "neq"
The compiler is very clever and can see through your tricks:
$ ocamlopt -config-var flambda
true
$ ocamlopt -O3 test.ml -o test && ./test
eq
eq
What about making it blind with:
let g v = Sys.opaque_identity (Id v)
You would likely prefer Id (Sys.opaque_identity v).
Your suggestion would prevent the compiler from evaluating a == b to true at compile time, but it would still evaluate to true at runtime.
The reason is that Sys.opaque_identity acts as a barrier between its evaluated argument and the rest of the world. It doesn’t prevent any optimisation from occurring on its own argument. So after inlining you would get:
let a = let v = 5 in Sys.opaque_identity (Id v)
let b = let v = 5 in Sys.opaque_identity (Id v)
Which is equivalent to:
let v_a = 5
let arg_a = Id v_a
let a = Sys.opaque_identity arg_a
let v_b = 5
let arg_b = Id v_b
let b = Sys.opaque_identity arg_b
This gets optimised into:
let shared_const = Id 5
let a = Sys.opaque_identity shared_const
let b = Sys.opaque_identity shared_const
So I guess the advice is not “use constructors to make values physically different” but rather “avoid constructors (or use [@@unboxed]) so that physical equality doesn’t depend on compiler optimisations”.
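A minimal sketch of the unboxed variant, assuming a hypothetical boxed_id type added only for contrast: with [@@unboxed], the wrapper has the same runtime representation as the int it carries, so physical equality on it behaves like physical equality on immediate integers.

```ocaml
(* boxed_id is a hypothetical type, added only for contrast. *)
type boxed_id = Boxed of int
type id = Id of int [@@unboxed]

let a = Id 5
let b = Id 5

(* Id 5 is represented as the immediate integer 5, so there is no block
   whose sharing could depend on the optimiser. *)
let () = assert (a == b)
let () = assert (Obj.is_int (Obj.repr a))

(* Without the attribute, the constructor is a heap block; whether two
   such blocks are shared is up to the compiler. *)
let () = assert (Obj.is_block (Obj.repr (Boxed 5)))
```

Obj is used here only to inspect the representation; ordinary code would not need it.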
There are some situations where aliases are not sufficient, and so you must define a new type. The following is impossible (without the -rectypes option):
type 'a infinite_list = 'a * 'a infinite_list
(* Error: The type abbreviation infinite_list is cyclic *)
But this is accepted:
type 'a infinite_list = L of 'a * 'a infinite_list
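As a small sketch of how the accepted definition can be used (the names ones and take are introduced here for illustration), a cyclic value can be built with let rec and consumed lazily by only looking at a finite prefix:

```ocaml
type 'a infinite_list = L of 'a * 'a infinite_list

(* A cyclic value: the tail points back to the list itself. *)
let rec ones = L (1, ones)

(* Collect the first n elements into an ordinary list. *)
let rec take n (L (x, rest)) =
  if n <= 0 then [] else x :: take (n - 1) rest

let () = assert (take 3 ones = [1; 1; 1])
```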
id and person are two examples of “single case constructor” vs type alias.
The person case is not considered a type alias: records in OCaml are nominal, meaning that two record definitions with the same fields are not compatible:
type t1 = { x : int; y : int }
type t2 = { x : int; y : int }
let v1 : t1 = { x = 0; y = 1 }
let v2 : t2 = { x = 0; y = 1 }
if v1 = v2 (* <- type error *)
then ...
On the other hand, tuples are structural: int * int is the same type everywhere.
type u1 = int * int
type u2 = int * int
let v1 : u1 = (0, 1)
let v2 : u2 = (0, 1)
if v1 = v2 (* no error *)
then ...
That’s why t1 and t2 (and both definitions of person in your original examples) are not considered aliases, while u1, u2 (and the second definition of id) are considered aliases.
So my answer to your original question is to use single case constructors when you want to create distinguished types, noting that records already perform this so adding an extra constructor only adds syntactic burden in that case.
In non-record cases the extra constructor can have a small runtime cost, which you can get rid of using the [@@unboxed] attribute if performance is a concern.
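To illustrate the "distinguished types" point, here is a short sketch with two hypothetical wrappers, user_id and order_id, around the same underlying int. The names and the describe_user function are assumptions made for this example:

```ocaml
(* user_id and order_id are hypothetical names for this sketch. *)
type user_id = User_id of int [@@unboxed]
type order_id = Order_id of int [@@unboxed]

(* Only accepts a user_id; the pattern unwraps the int. *)
let describe_user (User_id n) = Printf.sprintf "user #%d" n

let u = User_id 42
let o = Order_id 42

let () = print_endline (describe_user u)

(* [describe_user o] would be rejected at compile time: an order_id is not
   a user_id, even though both carry the int 42. *)
let () = ignore o
```

With plain aliases (type user_id = int and type order_id = int), the commented-out call would type-check, which is exactly the accidental confusion the wrapper is meant to prevent.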
Oh I see. Right. So in OCaml the ‘traditional’ approach has been to use modules. You can find a fuller explanation here: Files, Modules, and Programs - Real World OCaml
Code snippet from there:
module type ID = sig
type t
val of_string : string -> t
val to_string : t -> string
val ( = ) : t -> t -> bool
end
module String_id = struct
type t = string
let of_string x = x
let to_string x = x
let ( = ) = String.equal
end
module Username : ID = String_id
module Hostname : ID = String_id
type session_info = {
user : Username.t;
host : Hostname.t;
when_started : Unix.tm;
}
This allows you to ‘spin up’ as many ‘string ID’ types as you need. You can also create an equivalent ‘int ID’ module:
module Int_id = struct
type t = int
let of_string = int_of_string
let to_string = string_of_int
let ( = ) = Int.equal
end
And of course any other type you want that can support these operations. The advantage is that the actual modules you create to represent the domain IDs (Hostname, Username) come with built-in support operations.
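A self-contained sketch of how these modules behave in use, repeating the definitions so it compiles on its own (the values user, host and same_user are names introduced here for illustration):

```ocaml
module type ID = sig
  type t
  val of_string : string -> t
  val to_string : t -> string
  val ( = ) : t -> t -> bool
end

module String_id = struct
  type t = string
  let of_string x = x
  let to_string x = x
  let ( = ) = String.equal
end

(* Sealing with ID makes each t abstract, hence distinct from the other. *)
module Username : ID = String_id
module Hostname : ID = String_id

let user = Username.of_string "alice"
let host = Hostname.of_string "alice"

(* Inside the local open, ( = ) is Username's own comparison. *)
let same_user = Username.(user = of_string "alice")

(* [Username.(user = host)] would be rejected: Hostname.t is not
   Username.t, even though both are strings underneath. *)
let () =
  Printf.printf "%s@%s %b\n"
    (Username.to_string user) (Hostname.to_string host) same_user
```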
This is often known as the newtype pattern, where you create a completely new nominal type which is just a wrapper around an existing type.
It’s an amazing FP pattern! But if overused it can become an obstacle.
There’s an excellent blog post on the subject that gives you an idea of when to use and when not to use it (it’s in Haskell but the main sentiment applies to OCaml as well).
I would usually prefer an abstract type, but the singleton variants can be useful when pattern-matching GADTs. Consider:
type id = Id of int
type _ value =
| Bool : bool -> bool value
| Int : int -> int value
| Obj : id -> id value
An x : int value can here be pattern-matched with a single case Int. If type id = int, then the Obj case needs to be matched and rejected, and if the type id was abstract, this would also be the case when matching an x : bool value.
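A minimal sketch of the single-case match described above; the function name to_int is an assumption made for this example:

```ocaml
type id = Id of int

type _ value =
  | Bool : bool -> bool value
  | Int : int -> int value
  | Obj : id -> id value

(* One case is enough: because id is a distinct variant type, the checker
   can rule out both Bool and Obj for an int value. *)
let to_int (x : int value) : int =
  match x with
  | Int n -> n

let () = assert (to_int (Int 7) = 7)
```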
You can achieve the same thing without needing to wrap int in the variant. Since the types on either side of each -> don’t need to be the same, the following code works the same way:
type id = private Id
type _ value =
| Bool : bool -> bool value
| Int : int -> int value
| Obj : int -> id value
(* or an abstract type instead of int *)
You could also use a polyvar like Obj: int -> [`Id] value to avoid defining a new type altogether.
Yes, that can be an option, though when the parameter is used to constrain a universal type variable which occurs elsewhere in a signature, one might want it to carry a value.
Good point with the private type in either case, I think that’s as close as we can get to an abstract type.
The newtype pattern has gone mainstream: you can even use it in Python now!
from typing import NewType
Id = NewType('Id', int)
def f(x: Id):
    ...

f(123)  # Static type-checker error
Thank you for recommending the blog post by Alexis King. “Names are not type safety” and “Parse, Don’t Validate” are so illuminating!