Thoughts about adding operations to opaque types

I’ve been reading this blog post, Parse, don’t validate, and think the ideas are pretty good.

Part of the main idea here is that you should use types that make invalid values impossible. They use an example of a non-empty list. I’ll translate the example into OCaml.

type' a non_empty = NonEmpty of 'a * 'a list [@@unboxed]

let non_empty = function
| [] -> None
| hd :: tl -> Some (NonEmpty (hd, tl))

In this way, you can define an head function that never fails:

let head (NonEmpty (hd, _)) = hd

Only accepting an instances of non_empty ensures the input is never invalid.

“Aha!”, you will surely say, “You are simply pushing error handling to the point at which you create the non_empty type.”

That is correct. The data must still be checked and errors handled. The article claims that it’s better to do this in the data preparation phase than in the processing phase. That is, you parse everything first to ensure only valid values, use types which can only represent the valid values and then compute a result based on these values.

I was considering how one might do this with Map.t and one was simply,

(* just assume this is functorized where `M` is an instance of Map.Make *)
type 'a non_empty = NonEmpty of {l: 'a M.t; v: M.key; d: 'a; r: 'a M.t}

let non_empty = function
  | Empty -> None
  | Node {l; v; d; r; _} -> NonEmpty {l; v; d; r}

I realize there’s probably a better way to do this (using choose and remove, for example), but the more important point is that it’s actually impossible to do this, as far as I know, since there’s no way for user code do destructure Map.t.


At another time I was working on an iterator type inspired by Base.Sequence—I really like these Jane Street libraries, but I always hesitate to include them as dependencies in libraries I design to be distributed because they are kind of large and very opinionated and, while I tend to agree with many of their opinions, I don’t want to force them on potential users of my library, so I have occasionally ended up making poor-man’s copies of individual modules from Base (including Sequence and Sexp) which play nicely with StdLib for use in my libraries.

Anyway, I was working on my iterator type, and I was considering how I might efficiently convert instances of Map.t into my type. To do so, I looked at the implementation of Map.Make.to_seq. ocaml/stdlib/map.ml at 137dd26adc3345547b6eef6da744ac0d66fbc209 · ocaml/ocaml · GitHub

It’s a clever idea. I thought I might apply a similar idea to my iterator—but again remembered that I cannot, as a user, destructure instances of Map.t, nor get access to cool implementation functions like cons_enum so I will probably have to simply convert it to a Seq and then convert that to my type. The overhead is not a big deal, but I’m sure any programmer can understand how bad this feels, especially when the point is to have an iterator which is more efficient than Seq.


So my question is, is there a way to get “super secret access” to some of these implementation functions/constructors? I realize this is totally against the spirit of encapsulation, but I thought I might ask anyway.

1 Like

I don’t know of a way to access those secret implementations (although I think that re-declaring identically and then converting using Obj.magic can get you somewhere?)

An alternative is to re-use the type but make it private (example with List, but same technique for pretty much anything):

module NEList : sig
  type 'a t = private 'a Stdlib.List.t
  val non_empty : 'a Stdlib.List.t -> 'a t option
  (* other functions, mostly identical to stdlib *)
end = struct
  type 'a t = 'a list
  let non_empty = function [] -> None | nelist -> Some nelist

  (* you can re-use a lot of stdlib with an `include` and then some shadowing *)
end
1 Like

I thought about using Obj.magic for the second case (an iterator for maps), and I might end up doing it at some point. The thing I don’t like about it is that if the implementation changes, I start getting segmentation faults, whereas I’d just get type errors at compile time if I could access the original type constructor…

But I realize this is sometimes the price you pay for trying to do things that the language makes illegal.

As far as the second part of your reply, It’s also a good way to do it, but I think the point with creating a new constructor for non-empty lists is that you don’t even have to pattern match against the empty list once you do the conversion. You check for the empty list once when you parse the input, and the compiler knows forever at a structural level that there is always something there. The benefit of your approach is, of course, than you can apply normal list operations to your non-empty list, but I think the point of a non-empty list is normally that you want to do something special with the head, so it should perhaps wash out the same in most cases.

but probably not in all cases.

If you really need to depend on the implementation detail of Map/Set then it might be safer to depend on the behavior of exists which tests the value for the node first, before looking for a witness left and right, and then use the split function from the Map/Set interface:

module StringSet = Set.Make(String)

let top set =
  let r = ref None in
  ignore (StringSet.exists (fun s -> r := Some s; true) set);
  !r

let split set =
  match top set with
    None -> None
  | Some s -> let l,_,r = StringSet.split s set in
    Some (l, s, r)
1 Like

This techniques makes more sense for structures where there is no access to the constructors (e.g., Map). For the cases where you have access to the constructors, I like to do an overlay using the same constructor names and rely on OCaml’s ability to differentiate constructors based on the context.

E.g., for lists:

type 'a nelist =
  | (::) of 'a * 'a list (* re-use constructor, but don't provide the empty constructor at all *)

let iter f (x :: xs) = f x; List.iter f xs
let map f (x :: xs) = f x :: List.map f xs
let auto_fold f (x :: xs) = List.fold_left f x xs

let () = iter print_endline (map string_of_int [1;2;3])

E.g., for seq:

type 'a seq = unit -> 'a node
and 'a node =
  Cons of 'a * 'a Seq.t

let iter f s =
  let (Cons (x, s)) = s () in
  f x ; Seq.iter f s

let of_list = function
  | [] -> None
  | x::xs -> Some (fun () -> Cons (x, List.to_seq xs))

let () = iter print_endline (Option.get (of_list ["aa";"bb";]))
4 Likes

This is really clever!

I’m afraid I’m too stupid to do this! :rofl: Knowing me, I’d get confused when I got a type error instead of non-exhaustive pattern matching error.

But I dunno. It looks really good in the code, so maybe I’ll try it some time and see if I’m really as dumb as I think I am.

I haven’t run into this issue myself, but my first instinct would probably be to fork the code and make the changes myself, for the first problem.

(* definition of a non-empty list *)
type 'a list = End of 'a | Cons of 'a * list

That doesn’t work for your second problem (converting the widely-used Map.t type from the standard library to your own type) but it might be an option worth considering for the first.

1 Like