Pattern matching on private strings

Hi,

I was wondering why pattern matching on private strings with literal strings is disallowed while, if the string goes through a constructor, there is no problem. Here is an illustration of what I have in mind:

module A : sig
  type t = private string
end = struct
  type t = string
end

module B : sig
  type t = private S of string
end = struct
  type t = S of string
end

module C : sig
  type t = private S of string [@@unboxed]
end = struct
  type t = S of string [@@unboxed]
end

let f : A.t -> bool = function
  | "a" -> true
  | _ -> false

let g : B.t -> bool = function
  | B.S "a" -> true
  | _ -> false

let h : C.t -> bool = function
  | C.S "a" -> true
  | _ -> false

g and h can be defined without problem while f generated the error:

Error: This pattern matches values of type string
       but a pattern was expected which matches values of type A.t
1 Like

I’m guessing there might be difficulties related to type inference. The type of the pattern B.S _ is unambiguous while the type of "a" could be string or a private string.

I don’t know the reason (and indeed this is strange) but you can define f by coercing an A.t to a string:

let f (x:A.t) = match (x :> string) with
  | "a" -> true
  | _ -> false;;
val f : A.t -> bool = <fun>

Per the manual, private has somewhat different behavior in variant and record declarations, type abbreviations, and row types: https://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec236

For variants:

Values of a variant or record type declared private can be de-structured normally in pattern-matching or via the expr.field notation for record accesses.

On the other hand, for a type abbreviation you have to coerce the expression.

That being said, I don’t know if there’s a principled reason why you can’t have the variant-declaration behavior w/r/t pattern matching when the type abbreviation is for a variant type. After all, you can always just coerce it and then match, right?

let f (x:A.t) = match (x :> string) with
| “a” -> true
| _ -> false;;

Thanks but, in the case I’m targeting, A.t isn’t naked but itself an argument of another constructor (say X of A.t) on which I want to pattern match (| X "a" -> ...).

It’s really neat that variant (or record) constructors can be disambiguated with a type annotation. It’s also interesting that other types with ambiguous constructors like private strings are not disambiguated in the same way.

module Module : sig
  type private_string = private string
  type original_variant = X
end = struct
  type private_string = string
  type original_variant = X
end

type shadowing_variant = X

(* "a" is inferred as a string *)
let f1 = function "a" -> () | _ -> ()

(* error *)
let f2 = function ("a" : Module.private_string) -> () | _ -> ()

(* warning (which can be disabled) *)
let f3 = function (X : Module.original_variant) -> ()

Perhaps this example is clearer:

module Module : sig
  type private_string = private string
end = struct
  type private_string = string
end

(* Maybe this could put the private string constructor in scope? *)
open Module

type original_variant = X
type shadowing_variant = X

(* "a" is inferred as a string *)
let f1 = function "a" -> () | _ -> ()

(* error explaining that "a" is a string, not a private_string *)
let f2 = function ("a" : private_string) -> () | _ -> ()

(* ok *)
let f3 = function (X : original_variant) -> ()

Thanks but, in the case I’m targeting, A.t isn’t naked but itself an argument of another constructor

You can use guards as a somewhat ugly work around for that case:

module A : sig
  type t = private string
end = struct
  type t = int
end

type t = X of A.t

let f = function
  | X s when (s :> string) = "a" -> true
  | _ -> false

If you are willing to parameterise the type containing the constructor in question, you can also coerce before matching:

type 'a t = X of 'a

let f (x : A.t t) =
  match (x :> string t) with
  | X "a" -> true
  | _ -> false

This isn’t a very nice solution because it involves adding one type parameter for each private type, of which there might be many. Might be enough to get you past this particular problem, though.

there is a bitstring matching library for OCaml (bitstring in opam). I wonder if it could be extended to handle regular strings.

In this case you have to pattern-match on the coerced value of your A.t behind your constructor. Example:

module A : sig type t = private string end = struct type t = string end

type t = A of int | B of A.t

let f = function
  | A i -> i
  | B s -> match (s :> string) with
    | "a" -> 1
    | "b" -> 2
    | _ -> 3

f (B (Obj.magic "a"));;
- : int = 1

f (B (Obj.magic "b"));;
- : int = 2

f (B (Obj.magic "a string"));;
- : int = 3
1 Like

I believe a general feature that would solve this problem and more is F#'s active patterns. If I understand correctly, we’d write something like this:

let (|String|) (s : private_string) = (s :> string)

let f (s : private_string) =
  match s with
  | String "a" -> ...

That does not work when you have more “complex” pattern matching such as

| Node(loc, B "a") -> do_something
| Node(_, B "b") -> do_something_else

In this case you can use a guard as suggested by @gsg:

| Node (loc, B s) when (s :> string) = "a" -> do_something
| Node (_, B s) when (s :> string) = "b" -> do_something_else

A private type abbreviation is considered as a strict subtype of the type it abbreviates, hence you have to explicitly coerce your value to pattern-match against them (otherwise the type checker will complain). It may be possible to change the compiler in such a way that the coercion will be done implicitly, but for the moment you don’t have the choice.

That’s what I started to do but it becomes fairly quickly not so nice to read IMHO. I prefer to go through a constructor.

Some of the answers here are quite confused. This difference comes from the private keyword being used for three different but similar things:

  1. It can define a variant or record type whose constructors are only allowed in pattern matching:
    type t = private T of int
    
    which will allow you to match on a t with T but not construct one.
  2. It can define a polymorphic variant or object type with abstract row/presence variables.
    type t = private [< `A of int | `B of int > `B]
    
    which will unify with [< `A of int | `B of int] allowing you to pattern match with `A or `B, and also with [> `B of int] allowing you to construct with `B, but not with [> `A of int] so you cannot construct with `` A
  3. It can define an arbitrary subtype of another type:
    type t = private int
    
    which gives a type that can be coerced to int with the :> operator.

The difficulty you are having is that OCaml does not support a coercion operator in patterns. Really you should be able to write something like:

type t = private int
type s = A of t
let f = function
  | A (4 : int <: t) -> ...

I believe @trefis has a patch that adds such an operator, but it has never been polished and submitted upstream.

Wouldn’t it be simple to authorize matching a literal with one of its subtype? For example, with these types:

module A : sig type t = private string end = struct type t = string end

type t = Empty | Node int * A.t

if I want to pattern-match against a t value in a “complex” pattern, I have to write:

let f = function
  | Empty -> 1
  | Node (i, a) -> match i, (a :> string) with
    | 1, "a" -> 1
    | 2, "b" -> 2
    | 3, _ -> 3
    | _ , "c" -> 4
    | _ -> 5;;

with your proposition, this is more verbose (I have to repeat coercion in each case):

let f = function
  | Empty -> 1
  | Node (1, ("a" : string <: t)) -> 1
  | Node (2, ("b" : string <: t)) -> 2
  | Node (3, _ )-> 3
  | Node (_ , ("c" : string <: t)) -> 4
  | _ -> 5

It will be more convenient if we could simply write:

let f = function
  | Empty -> 1
  | Node (1, "a") -> 1
  | Node (2, "b") -> 2
  | Node (3, _ )-> 3
  | Node (_ , "c") -> 4
  | _ -> 5

as with @Chris00 trick, but without defining a private type with a single unboxed variant.

That would be implicit subtyping. The inference for implicit subtyping is tricky (to put it mildly) in general, and OCaml does not support it at this time.

I don’t think we need implicit subtyping in all expressions, or even inference on subtyping (as in MLsub for example), but only when we pattern-match against a private type abbreviation (as in this topic).

Compare these two examples:

module M : sig
  type i = I of int 
  type t = private i
end = struct
  type i = I of int
  type t = i
end

module N : sig type t = private I of int end = struct type t = I of int end

In each case we define a type isomorphic to int (namely I of int) and one of its subtypes. But since, in the second case we do not “export” the I constructor we can directly pattern-match against a t value, and in the first one we have to coerce first.

let f (x : M.t) = match (x :> M.i) with M.I i -> i;;
val f : M.t -> int = <fun>

let g = function N.I i -> i;;
val g : N.t -> int = <fun>

and if we forget to coerce first, we have a type error:

let f (x : M.t) = match x with M.I i -> i;;
Error: This pattern matches values of type M.i
       but a pattern was expected which matches values of type M.t

Here, the compiler may implement the algorithm we use manually: it expects a pattern which matches a value of type M.t, but since M.t is a private type abbreviation it cannot be pattern matched directly, therefore it should be coerced first to its supertype M.i.

but since M.t is a private type abbreviation it cannot be pattern matched directly

Sure it can:

match (x : M.t) with
| y -> ...

So that leads to the question of how do you decide between

let foo (_ : M.t) = ()
let bar (_ : M.i) = ()

match (x : M.t) with
| y -> foo y

and

let foo (_ : M.t) = ()
let bar (_ : M.i) = ()

match (x : M.t) with
| y -> bar y

and that is before we are even inferring the type of the scrutinee. Consider:

let foo (_ : M.t) = ()

let f x =
  foo x;
  match x with
  | I y -> y + 1
  | _ -> 0

versus:

let foo (_ : M.t) = ()

let f x =
  match x with
  | I y -> foo x; y + 1
  | _ -> 0

Whatever way you look at it this is implicit subtyping. That doesn’t mean it’s impossible, but it does mean that it needs to be properly thought through and handled with care.

This only works for a toplevel match, doesn’t it? But if you have a record, a field of which is of type M.t, you couldn’t write a nested pattern properly, could you?

edit: misread, I thought it was an upcast before the matching. So even in the toplevel case you can only bind, not deconstruct, which is a strange discrepancy between private aliases and private definitions (type t = private { … }) which allow destructuring.