Share your crazy OCaml code snippet!

Do you have some particularly beautiful, fun or strange or interesting OCaml
piece of code that you want to share?

If so, just tell us a bit about what this code is doing and why you find it so interesting/fun/strange/beautiful/ugly?

Sharing well indented and syntax highlighted code is preferable.
Also, a snippet should be reasonably short, let’s say a one screen page should be the maximum length.

PS: I might share a fun one, if people are interested.

11 Likes

Ok, so here is one code fragment that I wrote recently.
I like it because it has a lot of structure.
The code is correct and fast enough for what it is doing.
It does some amount of parsing for a chemical file format.

let tokenize_full (s: string): input_smi_token list =
  let res =
    L.map Str.(function
        | Delim "(" -> [[[[Open_paren]]]]
        | Delim ")" -> [[[[Close_paren]]]]
        | Delim _ -> assert(false) (* parens_regexp would be wrong then *)
        | Text a ->
          L.map Str.(function
              | Delim cut_bond_str -> [[[parse_cut_bond cut_bond_str]]]
              | Text b ->
                L.map Str.(function
                    | Delim bracket_atom_str -> [[Bracket_atom bracket_atom_str]]
                    | Text c ->
                      L.map Str.(function
                          | Delim double_digit_rc ->
                            [Ring_closure (parse_double_digit_ring_closure double_digit_rc)]
                          | Text d ->
                            L.map Str.(function
                                | Delim single_digit_rc ->
                                  Ring_closure (parse_single_digit_ring_closure single_digit_rc)
                                | Text e -> Rest e
                              )
                              (Str.bounded_full_split single_digit_regexp d 1024)
                        )
                        (Str.bounded_full_split double_digits_regexp c 1024)
                  )
                  (Str.bounded_full_split bracket_atom_regexp b 1024)
            )
            (Str.bounded_full_split cut_bond_regexp a 1024)
      )
      (* WARNING: bug if more than 1024 tokens in the string ! *)
      (Str.bounded_full_split parens_regexp s 1024) in
  L.flatten (L.flatten (L.flatten (L.flatten res)))

What do you think about this code?
Do you find it ugly (this is your right)? :grinning_face_with_smiling_eyes:
Have you ever seen a piece of code that looks like that?

And, share your own crazy OCaml snippet too, don’t be shy!
Showing some open-source code that you came across, even if not written by you,
is also fine. I am sure there are some gems in the OCaml compiler.

Not mine, this function modify at runtime a function from a module: mdx/mdx_top.ml at 38e1446b127f67bbdd4a64e0b08ac05b77511d5e · realworldocaml/mdx · GitHub

It is used to change the implementation of a function from an other library (Env.without_cmis from compiler-libs).
I can’t believe it has any effect but it’s there.

7 Likes

A pair of my favourite snippets taken from my Twitter feed in the last year or so:

OCaml hard mode

# let ( let- ) x f = f (Obj.magic x) ;;
# let- x = "a" in
  let- y = true in
  x + y ;;
- : int = 47370352459049

Ad-hoc polymorphism for fun and profit

# let a = [| 1 ; 2 ; 3 |] in a.(1) == a.(1) ;;
- : bool = true

# let a = [| 1.; 2.; 3.|] in a.(1) == a.(1) ;;
- : bool = false

Both of these were inspired by code submitted by my students, and I’m fond of them because they led to particularly good teaching moments. The first one demonstrates that the OCaml type system is not just a static safety check bolted on top of a e.g. Python / JavaScript runtime (as my students tend to assume): it offers freedom to the compiler writer too. The second shows a consequence of getting a bit carried away with this freedom (and leads to a nice discussion of languages as APIs).

I’m accumulating quite a few of this kind of snippet, which may make for a fun blog post at some point :slight_smile: (if I can ever bring myself to give them up as supervision questions…)

13 Likes

In the same vein as @CraigFe, some experiment with binding operators…

match parse_as_branch s with
| Some r -> Ok r
| None -> (
    match parse_as_pr s with
    | Some r -> Ok r
    | None -> (
        match parse_as_directory s with
        | Some r -> Ok r
        | None -> Error `Unknown))

:arrow_down:

let ( let/ ) f k = match f s with Some r -> Ok r | None -> k () in
let/ () = parse_as_branch in
let/ () = parse_as_pr in
let/ () = parse_as_directory in
Error `Unknown
9 Likes

This is a nice reminder that OCaml still lacks an if let construct.

if let Some r = parse_as_branch s then Ok r
else if let Some r = parse_as_pr s then Ok r
else if let Some r = parse_as_directory s then Ok r
else Error `Unknown

There is a function Option.value_or_else inspired by Rust being proposed here but it is not exactly what you need. But with a function (also in Rust):

Option.or_else : 'a option -> (unit -> 'a option) -> 'a option

you could write something like:

Option.or_else (parse_as_branch s) @@ fun () ->
Option.or_else (parse_as_pr s) @@ fun () ->
parse_as_directory s 

(And guys, your uses of custom let binders are scary.)

8 Likes

I would propose for @emillon 's example:

Option.to_result ~none:`Unknown (List.find_map (fun f -> f s)
  [parse_as_branch; parse_as_pr; parse_as_directory])
2 Likes

yes, definitely! :sweat_smile:
Why don’t you use |>?

It becomes this:

let tokenize_full (s: string): input_smi_token list =
  let res =
    (* WARNING: bug if more than 1024 tokens in the string ! *)
    Str.bounded_full_split parens_regexp s 1024
    |> L.map Str.(function
      | Delim "(" -> [[[[Open_paren]]]]
      | Delim ")" -> [[[[Close_paren]]]]
      | Delim _ -> assert(false) (* parens_regexp would be wrong then *)
      | Text a ->
          Str.bounded_full_split cut_bond_regexp a 1024
          |> L.map Str.(function
            | Delim cut_bond_str -> [[[parse_cut_bond cut_bond_str]]]
            | Text b ->
                Str.bounded_full_split bracket_atom_regexp b 1024
                |> L.map Str.(function
                  | Delim bracket_atom_str -> [[Bracket_atom bracket_atom_str]]
                  | Text c ->
                      Str.bounded_full_split double_digits_regexp c 1024
                      |> L.map Str.(function
                        | Delim double_digit_rc ->
                            [Ring_closure (parse_double_digit_ring_closure double_digit_rc)]
                        | Text d ->
                            Str.bounded_full_split single_digit_regexp d 1024
                            |> L.map Str.(function
                              | Delim single_digit_rc ->
                                  Ring_closure (parse_single_digit_ring_closure single_digit_rc)
                              | Text e -> Rest e
                            )
                      )
                )
          )
    )
  in
  L.flatten (L.flatten (L.flatten (L.flatten res)))
2 Likes

A primary source of some “crazy OCaml code snippets” is the OCaml testsuite/tests directory itself:

2 Likes
(* surprising places where you can use an operator as a variable name *)
function (+) -> (+);;
function _ as (+) -> (+);;
for (+) = 0 to 1 do () done;;

from exotic_syntax.ml

1 Like

(Stolen a long time ago from a now dead wiki.)

The infixing operator pair:

# let ( /* ) x f = f x ;;
val ( /* ) : 'a -> ('a -> 'b) -> 'b = <fun>
# let ( */ ) f x = f x ;;
val ( */ ) : ('a -> 'b) -> 'a -> 'b = <fun>
# 1L /* Int64.add */ 2L ;;
- : int64 = 3L

You can replace /*-*/ with a different set (although you have to be careful about precedence and associativity) but this choice really is half the fun.

15 Likes

I believe @CraigFe shared a snippet with << and >> playing the same role not so long ago.

I suppose actually using them would incur run-time costs? I am not quite clear about inlining optimizations in the compiler.

1 Like

we were talking about this on ocaml discord back in march and i went for something quite similar to that, i think the reason i chose %> and not >> was because the nicer looking <<...>> pair has operators of equal precedence, so it wouldn’t work.

let (<%) = (|>)
let (%>) = Fun.flip

let _ = assert (3 <%Int.sub%> 1 = (3 |> Fun.flip Int.sub 1))

it’s only because i used Fun.flip though.

3 Likes

This isn’t crazy per se, but I had to stop and go “mmm hmmm!” a few times when I came across it. It occupies a sweet spot between simple, elegant, useful and sophisticated, IMO.

So, Core_kernel.Memo lets you wrap code with memoizers.

Example usage:

(* ensure hostname doesn't change in between lookups *)
let gethostname = Memo.unit (fun () -> Unix.gethostname ())

(* gethostname () = "my-linux-laptop" *)

(* avoid an expensive lookup of the same uid twice *)
let get_username_for_uid = Memo.general (fun user_id ->
  match Unix.Passwd.getbyuid user_id with
  | None -> None
  | Some pwd -> Some pwd.Unix.Passwd.name)

(* get_username_for_uid 1000 = Some "mbacarella" *)

It’s like Lazy, though a bit nicer in some ways. Right-o, so that part is nice but not that crazy.

The interface is fairly simple, and has some knobs you can tweak:

type ('a, 'b) fn = 'a -> 'b

val unit : (unit -> 'a) -> (unit, 'a) fn

val general
  :  ?hashable:'a Hashtbl.Hashable.t
  -> ?cache_size_bound:int
  -> ('a -> 'b)
  -> ('a, 'b) fn

But the implementation is surprisingly conceptually dense?

(* ... snip ... *)

let unbounded (type a) ?(hashable = Hashtbl.Hashable.poly) f =
  let cache =
    let module A =
      Hashable.Make_plain_and_derive_hash_fold_t (struct
        type t = a

        let { Hashtbl.Hashable.hash; compare; sexp_of_t } = hashable
      end)
    in
    A.Table.create () ~size:0
  in
  (* Allocate this closure at the call to [unbounded], not at each call to the memoized
     function. *)
  let really_call_f arg = Result.capture f arg in
  fun arg -> Result.return (Hashtbl.findi_or_add cache arg ~default:really_call_f)
;;

(* the same but with a bound on cache size *)
let lru (type a) ?(hashable = Hashtbl.Hashable.poly) ~max_cache_size f =
  if max_cache_size <= 0
  then failwithf "Memo.lru: max_cache_size of %i <= 0" max_cache_size ();
  let module Cache =
    Hash_queue.Make (struct
      type t = a

      let { Hashtbl.Hashable.hash; compare; sexp_of_t } = hashable
    end)
  in
  let cache = Cache.create () in
  fun arg ->
    Result.return
      (match Cache.lookup_and_move_to_back cache arg with
       | Some result -> result
       | None ->
         let result = Result.capture f arg in
         Cache.enqueue_back_exn cache arg result;
         (* eject least recently used cache entry *)
         if Cache.length cache > max_cache_size
         then ignore (Cache.dequeue_front_exn cache : _ Result.t);
         result)
;;

let general ?hashable ?cache_size_bound f =
  match cache_size_bound with
  | None -> unbounded ?hashable f
  | Some n -> lru ?hashable ~max_cache_size:n f
;;

We’ve got (1) a functor instantiation (2) as a first class module, (3) a locally abstract type (the (type a)), that (4) unusual looking top-level destructuring of hashable, (5) a not-seen-every day Hash_queue module to implement the LRU; in the end (6) a fairly simple interface elementary OCaml hackers can use, and (7) it provides generally useful functionality.

Nice.

2 Likes

I was just going through a JSON value checking some specific properties (e.g., this field exists, the value of that field looks like so, etc.). For that purpose I wasn’t really interested in the difference between “this field doesn’t exist” and “this field holds something other than what I expect” (although the difference may have some interest when reporting the error).

And so I ended up with something like the following:

let (let*) o f = try Option.bind o f with Match_failure _ -> None in
let ( @ ) kvs k = List.assoc_opt k kvs in
begin
  let* (`O specs) = top @ "specs" in
  let* (`O layout) = specs @ "layout" in
  let* (`String "reference") = layout @ "kind" in
  let* (`String name) = layout @ "name" in
  let true = valid_name name in
  Some ()
end [@@warning "-8"] (* partial matches are intended *)

I don’t think I’m going to push this code mostly because it fits a bit too well into this thread.

2 Likes

From LexiFi’s codebase (NB: don’t use this code! it depends on low-level details of the implementation of objects by the compiler and may blow up horribly without any warning :smiley:)

let memoize_obj (o : < .. >) =
  let o = Obj.repr o in
  let meths = Obj.dup (Obj.field o 0) in
  Obj.set_field o 0 meths;
  let nmeths : int = Obj.magic (Obj.field meths 0) in
  for i = 0 to nmeths - 1 do
    let idx = i * 2 + 2 in
    let old_f : Obj.t -> Obj.t = Obj.magic (Obj.field meths idx) in
    let new_f self =
      assert(self == o);
      match old_f o with
      | r -> Obj.set_field meths idx (Obj.repr (fun _ -> assert(self == o); r)); r
      | exception e -> Obj.set_field meths idx (Obj.repr (fun _ -> assert(self == o); raise e)); raise e
    in
    Obj.set_field meths idx (Obj.repr new_f);
  done

(** By inheriting from this class, an object gets "memoized method" semantics.
    All its (public) methods become memoized.
    It is forbidden to clone such an object. *)
class memoized = object(this)
  initializer memoize_obj this
end

Cheers,
Nicolas

1 Like

I like this small bit of code that defines defer inspired by Go:

let defer f = Fun.protect ~finally:f

Now you can immediately deal with deallocating a resource at the place where you allocate it and never think about it again. The block following @@ fun () -> is now protected and either on return or exception will execute the code attached by defer.

let read_file max_size path =
  let ic = open_in path in
  defer (fun () -> close_in ic) @@ fun () ->
  let size = in_channel_length ic in
  if size > max_size then failwith "input file exceeds maxium size"
  else really_input_string ic size
4 Likes

You should rather use close_in_noerr here, a finally raising function is considered a programming error.

1 Like

a finally raising function is considered a programming error.

That’s weird. It already defines its own exception type; I wonder why it doesn’t just define it as exception Finally_raised of exn * exn option and keep both exceptions?

(or, even better, support chained exceptions as in e.g. PEP 3134 – Exception Chaining and Embedded Tracebacks | peps.python.org)

1 Like

These things were discussed in this PR.

I’m not sure whether there’s a really good answer to the problem but I’d say that treating a raising finally as a programming error is maybe the best discipline one can have as far as reasoning about your code goes.

Otherwise you keep on pushing the problem away, protect the protect… and I’m a bit dubious whether any good and evident error handling strategy can or should be be built on a nesting of Finally_raised exceptions.