Types for constraining any number of arguments for a function

Hello,

Motivated by Raven, we’re thinking of having a declarative schema for a table (like a CSV file) and being able to define functions row-wise over it. See Declarative Pipelines by Databricks for a production use case.

I devised the following solution:

(* abstract schema *)
type _ schema =
  | []  : unit schema
  | (::) : 'a * 'b schema -> ('a * 'b) schema

(* table as a list conforming to some schema *)
type 'a table = ('a schema) list

(* greet users following the schema *)
let greet_users (greet : 'a schema -> string) (tbl : 'a table) : unit = 
  List.map greet tbl
  |> List.iter (fun str -> Printf.printf "%s\n" str)

(* example *)
let () =
  let my_tbl : 'a table = [
    [ 1; "Alice"; Some "alice@ocaml.com" ];
    [ 2; "Bob"; None ]
  ]
  and my_greet : 'a schema -> string = function
    | [ _; name; Some email ] ->
        Printf.sprintf "Hello %s, your email is %s" name email
    | [ _; _; None ] -> "email not found"
  in
  greet_users my_greet my_tbl

Edit. Thanks to @JohnJ for fixing the example to use the schema properly.
Note that the type parameter 'a in 'a table and 'a schema is inferred in the example (here as int * (string * (string option * unit))).

Pros.

  • type _ schema may represent any number of columns of any type.
  • greet_users ensures the input of greet function is consistent with the schema of the table.
  • function my_greet may be defined over any number of arguments of any type; technically it takes a single argument that encodes multiple arguments recursively.
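
To illustrate the "any number of columns" point, here is a minimal sketch of a function that works on a row of any schema. The helper name width is hypothetical and not part of the post above; the schema type is repeated so the snippet stands alone.

```ocaml
(* Same schema type as in the post above. *)
type _ schema =
  | [] : unit schema
  | ( :: ) : 'a * 'b schema -> ('a * 'b) schema

(* Hypothetical helper: count the columns of a row of any schema.
   The "type a." annotation is needed because each recursive call
   happens at a different schema type (polymorphic recursion). *)
let rec width : type a. a schema -> int = function
  | [] -> 0
  | _ :: rest -> 1 + width rest

let () =
  Printf.printf "%d %d\n"
    (width [ 1; "Alice"; Some "alice@ocaml.com" ])
    (width [ "only one column" ])
```

The same width function accepts a three-column row and a one-column row, which an ordinary homogeneous list could not express.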

Discussion.

  • What do you think of this solution with regard to expressivity and performance?
  • Could we design nicer syntactic sugar, so that the table’s records look like [1, "Alice", "alice@ocaml.com"] and the function looks like my_greet id name (Some email) = ..?
  • Could meta-programming be useful?
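
On the curried-function question, one possible direction is a small adapter that turns an ordinary curried function into a function over a row. This is only a sketch for the three-column case: uncurry3 and greet_row are hypothetical names, and a separate adapter would be needed for each arity (something a ppx could presumably generate, though that is an assumption).

```ocaml
(* Same schema type as in the post above. *)
type _ schema =
  | [] : unit schema
  | ( :: ) : 'a * 'b schema -> ('a * 'b) schema

(* Hypothetical adapter for three-column rows: it unpacks the row and
   passes the columns to a curried function. *)
let uncurry3 : type a b c r.
    (a -> b -> c -> r) -> (a * (b * (c * unit))) schema -> r =
 fun f -> function [ x; y; z ] -> f x y z

(* The row function is now written with ordinary curried arguments. *)
let my_greet _id name email =
  match email with
  | Some email -> Printf.sprintf "Hello %s, your email is %s" name email
  | None -> "email not found"

(* Eta-expanded so that the id column stays polymorphic. *)
let greet_row row = uncurry3 my_greet row

let () = print_endline (greet_row [ 2; "Bob"; None ])
```

The resulting greet_row has the shape 'a schema -> string, so it can be passed to greet_users unchanged.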

The given example doesn’t actually make use of the schema type, because it only ever uses a single :: constructor for each schema. You may have meant to write the example like this:

let () =
  let my_tbl : 'a table = [
    [ 1; "Alice"; Some "alice@ocaml.com" ];
    [ 2; "Bob"; None ]
  ]
  and my_greet : 'a schema -> string = function
    | [ _; name; Some email ] ->
        Printf.sprintf "Hello %s, your email is %s" name email
    | [ _; _; None ] -> "email not found"
  in
  greet_users my_greet my_tbl

Which I think also answers your question about whether we can design a nicer syntax.

You are totally right.

For the record, anyone who is learning may try writing the rows with explicit :: constructors, which is what the list syntax above stands for:

  let my_tbl : 'a table = [
    1 :: ("Alice" :: (Some "alice@ocaml.com" :: []));
    2 :: "Bob" :: None :: []
  ]

What about the performance? Are there any drawbacks?

Note. I updated my post with your solution.

A simple example using arrays, with one array per column:

type table = {
  names  : string array;
  emails : string option array;
}

let greet_users (tbl : table) =
  Array.iteri (fun i name ->
    (* O(1) access to the ith row *)
    match tbl.emails.(i) with
    | Some email ->
        Printf.printf "Hello %s, your email is %s\n" name email
    | None ->
        Printf.printf "Hello %s, email not found\n" name
  ) tbl.names

let () =
  let tbl = {
    names  = [| "Alice"; "Bob" |];
    emails = [| Some "alice@ocaml.com"; None |];
  } in
  greet_users tbl
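
One drawback of the column-array layout is that nothing in the type ensures the columns have the same length; if emails were shorter than names, tbl.emails.(i) would raise Invalid_argument. A minimal sketch of a checked constructor follows. The names make_table and row are hypothetical and not part of the example above.

```ocaml
type table = {
  names  : string array;
  emails : string option array;
}

(* Hypothetical smart constructor: reject columns of different lengths
   up front, so row-wise access cannot go out of bounds later. *)
let make_table ~names ~emails =
  if Array.length names <> Array.length emails then
    invalid_arg "make_table: column lengths differ";
  { names; emails }

(* Hypothetical accessor: read the ith row as a tuple. *)
let row tbl i = (tbl.names.(i), tbl.emails.(i))

let () =
  let tbl =
    make_table
      ~names:[| "Alice"; "Bob" |]
      ~emails:[| Some "alice@ocaml.com"; None |]
  in
  let name, _email = row tbl 1 in
  print_endline name
```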