Share your crazy OCaml code snippet!

In the same idea, I like this line:

It permits to redefine the type t (with an arity) and include the whole implementation to describe an other interface - which supports capabilities. By this way, we don’t need to copy/paste the implementation if we want to improve the interface.

4 Likes

considering your nick, I was sure, you’re a fan of it.

This is especially useful with first class modules as you can do

module rec M : sig
  module type Map = sig
    type key
    (* this is the bad recursive bit *)
    val clone : unit -> (module M.Map with type key = key)
  end
end = M

This was exactly the reason for the Fun.protect design, and it was nice that we managed to agree on the design principles since as I remember it, it was not at all clear a priori that a consensus could be reached.

Good exception handling is surprisingly involved, my talk at the OCaml workshop on Friday will talk about it.

1 Like

This does not look crazy to me. But I thought the OCaml community preferred to tie the resource acquisition to the destructor initialisation using a higher-order function with_file : string -> (in_channel -> 'a) -> 'a instead of open_in. Is there a benefit to defer (i.e. some additional flexibility)?

Something like defer I would consider a more low-level or on-off approach than a with_file function, which captures a specific use case.

One of my first :camel: OCaml code snippets written a year ago is to get 4 random words from a word-list file without reading it in full and thus having constant complexity with growing list length:

open Bigarray

let rec step arr dx look stop idx =
  match idx = stop with
  | true -> stop
  | _ -> (
      match arr.{idx + look} with
      | '\n' -> idx
      | _ -> step arr dx look stop (idx + dx) )

(* walk back until we hit the begin of the line *)
let back arr idx = step arr (-1) (-1) 0 idx

(* walk forward until we hit the end of the line *)
let forw arr len idx = step arr 1 0 len idx

let line arr len idx =
  let off = back arr idx and pos = forw arr len idx in
  pos - off |> Array1.sub arr off |> Bigstring.to_string

let () =
  Random.self_init ();
  let fd = Unix.openfile "words.txt" [ Unix.O_RDONLY ] 0 in
  let mf = Unix.map_file fd char c_layout false [| -1 |] in
  let arr = array1_of_genarray mf in
  let len = Array1.dim arr in
  (fun _ -> len |> Random.int |> Lines.line arr len)
  |> List.init 4 |> String.concat "-" |> print_endline;
  Unix.close fd

See the complete project at mro/xkcd936: ♊️ Mirror of https://code.mro.name/mro/xkcd936 | 🐫 🔐 Generate word combinations like in xkcd.com/936 - xkcd936 - Codeberg.org

1 Like

Some more “Ministry of Silly Code” stuff

let [@inline] fn f x = fun () -> f x
let ( => ) = fn

let ( /> ) str filename =
  let ch = open_out filename in
  Fun.protect ~finally:(close_out => ch)
    (output_string ch => str)

On a more serious note, if more of the Stdlib API will require a thunk as argument (fun () -> .. ) a more
lightweight way of creating them is kind of missing.

Fun.protect is nice example of where such stuff will be handy, as well as some of the Seq and Option stuff.

I like the idea, but I do not like the operator order. output_string ch => str is difficult for me to understand (the example is trivia here, but this will blow my mind out of this context)

I didn’t checked if we have composition issues, but this seem more natural for me reversed:

let [@inline] fn x f = fun () -> f x

1 Like

It will work

let ( /> ) str filename =
  let ch = open_out filename in
    Fun.protect ~finally:(ch => close_out)
      (str => output_string ch)

A prefix operator is also possible. I’m reminded of the lazy operator $ used in Okasaki’s PFDS, so we can define an operator ~$ that is some approximation of that:

# let ( ~$ ) f x () = f x ;;
val ( ~$ ) : ('a -> 'b) -> 'a -> unit -> 'b = <fun>

# let f = ~$ Printf.sprintf "delayed print" ;;
val f : unit -> string = <fun>

# f () ;;
- : string = "delayed print"
4 Likes

I’m not sure whether this already exists somewhere, but I’ve never seen it, so here I go.

In the eternal quest of offsetting as many issues from runtime to compile time, I’ve always encoded the fact a list cannot be empty in its type. Historically as something akin to type 'a list_nonempty = 'a * 'a list, which does the job decently: you get a compile time guarantee that the list has at least a head instead of asserting that List.hd is not None at runtime because you “just know it can’t be empty”. It also enforces clearly in the signature that an empty list cannot be passed. The drawback being that the type differs from a standard list, so client code must be adapted and becomes a bit more convoluted around the first element, but I had accepted that as the cost of such type safety.

More recently I toyed around the idea of using a GADT to enable using the actual list literal syntax and constructor for pattern matching:

module List_nonempty = struct
  type ('a, 'empty) maybe =
    | ( :: ) :
        'a * ('a, [< `Nonempty | `Empty ]) maybe
        -> ('a, [ `Nonempty ]) maybe
    | [] : ('a, [ `Empty ]) maybe

  let hd = function
    | hd :: _ -> hd

  let tl = function
    | _ :: tl -> tl

  let rec to_list = function
    | [ last ] -> Caml.List.[ last ]
    | hd :: next :: tl -> Caml.List.cons hd (to_list (next :: tl))

  type 'a t = ('a, [ `Nonempty ]) maybe
end

This makes using literal syntax possible, List_nonempty.(hd [0; 1; 2]) has type int as expected, List_nonempty.([ 0 ] |> tl |> hd) is a type error as expected, etc. However the type differs entirely from List.t, while with the previous solution at least the tail was a standard list. This made operating on the rest of the nonempty-list a breeze, and to_list was very efficient (just prepend the head to the tail). Here because the :: constructors differ, to_list allocates an entirely new list, which may be unacceptable performance-wise. I had a mixed feeling about it.

Earlier today however, it struck me that since OCaml has become much better at resolving the correct constructs when it has a full type constraint on an expression, maybe I could get a :: b :: _ to resolve to two different :: constructors in a single expression. Actually it should work out of the box … and lo and behold, this completely trivial and much simpler solution entirely does the trick:

module List_nonempty = struct
  type 'a t = ( :: ) : 'a * 'a list -> 'a t

  let map (hd :: tl) ~f = f hd :: List.map ~f tl

  let hd = function
    | hd :: _ -> hd

  let tl = function
    | _ :: tl -> tl

  let to_list = function
    | hd :: tl -> List.cons hd tl
end

This has all the static typing aforementioned properties, while being essentially transparent syntax-wise:

# let l = List_nonempty.[0; 1; 2];;
val l : int List_nonempty.t = List_nonempty.(::) (0, [1; 2])
# List_nonempty.hd l;;
- : int = 0
# List_nonempty.tl l;;
- : int list = [1; 2] (* the tail is a standard list *)
# List_nonempty.(hd (tl l));;
Error: This expression has type int list
       but an expression was expected of type 'a List_nonempty.t

And of_list is trivial and performant.

No more List.hd |> Option.value_exn without significantly complexifying the codebase !

17 Likes

Sometimes, I think a feature-full List_nonempty module would be cool in batteries, or as a standalone library.
It should support most BatList operations (and only have tail rec. code).

1 Like

My chance to plug core_kernel/nonempty_list at master · janestreet/core_kernel · GitHub.

One extra neat trick up its sleeve: it uses the (::) constructor instead of a tuple, so you can write something like:

let x : _ Nonempty_list.t = [ 1; 2; 3 ]
3 Likes

Ah, so it does exist in Core, thanks!

Regarding the :: constructor, that was the final iteration of my post too :slight_smile:

1 Like

Ah, I just realized. Indeed it is!

Binding operators for resource management:

type _ resource =
  | File : (string * Unix.open_flag list * Unix.file_perm) -> Unix.file_descr resource
  | Lock : unit resource
let ( let* ) : type a. a resource -> (a -> 'b) -> 'b = fun v f ->
  match v with
  | Lock ->
    ignore () (* lock *);
    let r = f () in
    ignore () (*release *);
    r
  | File (fname, flags, perms) ->
    let d = Unix.openfile fname flags perms in
    let r = f d in
    let () = Unix.close d in
    r

This is obviously incomplete. The lock needs actual locking mechanism. The file resource handler needs some error management. Still, as a proof of concept, it works:

# let* d = File ("/tmp/foo", Unix.[O_CREAT; O_WRONLY], 0o640) in
  Unix.write_substring d "THIS" 0 4 ;;
- : int = 4

Binding operators for rolling dice (or how to DSL with let*)

You can do a lot in the body of the ( let* ) definition, including interpreting the bound expression in some way.

let ( let* ) v f = f (Random.int v + 1)

Arguably, the code is more readable by including some text:

let ( let* ) (`Roll v) f = f (Random.int v + 1)

Or even with a DSL (in this example using GitHub - raphael-proust/odds: OCaml Dice Dice Something: a library and program to roll dice)

let ( let* ) v f =
  let v = Odds.roll (Odds_parser.t_of_string v) in
  f v
let* r = "2d6 + 1d8" in print_int r

EDIT: obvious use-case: compiling regular expressions let ( let/ ) re f = let re = Str.regexp re in f re

3 Likes

Poor man’s effect handlers:

type 'a t = ..
type 'b handler = {f: 'a . 'a t * ('a -> 'b) -> 'b option}
type 'a ctx = {perform: 'b . 'b t -> ('b -> 'a) -> 'a}

let run_with_handlers (type a) handlers =
  let exception Eff: 'a t * ('a -> a) -> exn in
  let perform (type b) (vl: b t) (cont: b -> a) : a = raise (Eff (vl, cont)) in
  let ctx = {perform} in
  fun (f: (a ctx) -> 'c -> a) (args: 'c) ->
  let rec run (f: unit -> a) =
    try
      f ()
    with Eff (op, cont) ->
      let kont value = run (fun () -> cont value) in
      let value = List.find_map (fun {f} -> f (op, kont)) handlers |> Option.get in
      value in
  run (fun () -> f ctx args)
    

type _ t += ReadInt : int t
type _ t += ReadString : string t

let (let+) x kont = x kont

let helper_comp {perform} kont =
  let+ num1 = perform ReadInt in
  let+ num2 = perform ReadInt in
  kont (num1, num2)

let computation {perform} kont =
  let+ num1 = perform ReadInt in
  let+ num2 = perform ReadInt in
  let+ (num3,num4) = helper_comp {perform} in
  let+ str = perform ReadString in
  kont ("received " ^ Int.to_string (num1 + num2) ^ " and " ^ str ^ " and " ^
        Int.to_string num3 ^ ", " ^ Int.to_string num4)

let saved_handler : (int -> string) ref = ref @@ fun _ -> ""

let my_handler =
  let handle (type a) (vl: a t * (a -> 'b)) : 'b option = match vl with
    | ReadInt, kont -> saved_handler := kont; let x = read_int () in Some (kont x)
    | ReadString, kont -> let x = read_line () in Some (kont x)
    | _ -> None in
  {f=handle}

let vl = run_with_handlers [my_handler] computation (fun x -> x)

let  _ =
  print_string vl;
  print_string (!saved_handler 10);

(edit: now supports multiple effects)
(edit2: simplified the types)
(edit3: added multi-shot handlers)

2 Likes

recently I started an OCaml :camel: implementation accessing djb’s cdb file format:

(* http://cr.yp.to/cdb/cdb.txt
 *
 * Compiler and runtime failure if int <= 31 bit.
 *
 * May get dependencies to Optint and Cstruct.
 *)
module Cdb = struct
  (* hash them all. http://cr.yp.to/cdb/cdb.txt *)
  let hash (byt : bytes) : int =
    let len = byt |> Bytes.length
    and ( << ) = Int.shift_left
    and ( ^ ) = Int.logxor in
    let rec fkt idx h =
      match idx = len with
      | true -> h
      | false ->
          let c = idx |> Bytes.get byt |> Char.code in
          (((h << 5) + h) ^ c) land 0xFFffFFff |> fkt (idx + 1)
    in
    fkt 0 5381

  (*
   * Find data of first record for the key.
   * Leaves the file-pointer behind the hit.
   *
   * http://cr.yp.to/cdb/cdb.txt
   *)
  let first fd (key : bytes) : bytes option =
    (* let uint _ cs off =
       off |> Cstruct.LE.get_uint32 cs |> Int32.unsigned_to_int |> Option.get
    *)
    let uint buf off =
      let ( << ) = Int.shift_left in
      let le idx = off + idx |> Bytes.get buf |> Char.code << idx * 8 in
      le 0 lor le 1 lor le 2 lor le 3 land 0xFFffFFff
    in
    let len = 2 * 4 (* two uint32 *) in
    let buf = len |> Bytes.create (* re-used multiple times *) in
    let ha = key |> hash in
    (* Printf.printf "h = %d\n" ha; *)
    (* Printf.printf "p = %d\n" (ha mod 256); *)
    let _ = Unix.lseek fd (ha mod 256 * len) Unix.SEEK_SET in
    match len = (len |> Unix.read fd buf 0) with
    | false -> None
    | true -> (
        (* let cs = buf |> Cstruct.of_bytes ~len in *)
        match (uint buf 0, uint buf 4) with
        | _, 0 -> None
        | hpos, hcnt -> (
            (* hash table file position and entry count *)
            (* Printf.printf "hpos, hcnt = %d, %d\n" hpos hcnt; *)
            let rec rec_pos slot =
              (* Printf.printf "i = %d\n" i; *)
              match slot < hcnt with
              | false -> 0 (* file inconsistent *)
              | true -> (
                  let _ = Unix.lseek fd (hpos + (slot * len)) Unix.SEEK_SET in
                  match len = (len |> Unix.read fd buf 0) with
                  | false -> 0 (* I/O error *)
                  | true -> (
                      (* let cs' = buf |> Cstruct.of_bytes ~len in *)
                      match (uint buf 0, uint buf 4) with
                      | _, 0 ->
                          (* Printf.printf "slot %d empty\n" s'; *)
                          0
                      | hval, rpos -> (
                          (* Printf.printf "hval, rpos = %d, %d\n" hval rpos; *)
                          match ha = hval with
                          | true -> rpos
                          | false -> rec_pos (slot + 1))))
            in
            match ha / 256 mod hcnt |> rec_pos with
            | 0 -> None
            | rpos ->
                let _ = Unix.lseek fd rpos Unix.SEEK_SET in
                let kl0 = key |> Bytes.length in
                let kbuf = kl0 |> Bytes.create in
                let rec recy i =
                  match len = (len |> Unix.read fd buf 0) with
                  | false -> None (* I/O error *)
                  | true -> (
                      (* let cs' = buf |> Cstruct.of_bytes ~len in *)
                      let klen, dlen = (uint buf 0, uint buf 4) in
                      (* Printf.printf "klen, dlen = %d, %d\n" klen dlen; *)
                      match
                        klen = kl0 && kl0 = (kl0 |> Unix.read fd kbuf 0)
                      with
                      | false -> None (* consistency or I/O error *)
                      | true -> (
                          match key |> Bytes.equal kbuf with
                          | true -> (
                              let dbuf = dlen |> Bytes.create in
                              match dlen = (dlen |> Unix.read fd dbuf 0) with
                              | false -> None (* I/O error *)
                              | true -> Some dbuf)
                          | false -> (
                              match ha = (kbuf |> hash) with
                              | false -> None
                              | true ->
                                  let _ = Unix.lseek fd dlen Unix.SEEK_CUR in
                                  recy (i + 1))))
                in
                recy 0))
end

I didn’t mmap to leverage the file-pointer as state for a possible next()

1 Like

Yet another (ab)?use of the let-binding operators, now for the generalized pattern matching. Suppose we have some hierarchy of types and a value that has the base type, which we want to downcast and perform some operation available in the downcasted types. Usually, we will represent this with pattern matching, but what can we do if the type is not an ADT? Let’s use the let-binding operator let|, like this,

and here is the implementation, it can be generalized, of course to any type, just substitute resort with any non-total function,

3 Likes