Do you have some particularly beautiful, fun or strange or interesting OCaml
piece of code that you want to share?
If so, just tell us a bit about what this code is doing and why you find it so interesting/fun/strange/beautiful/ugly?
Sharing well indented and syntax highlighted code is preferable.
Also, a snippet should be reasonably short, let’s say a one screen page should be the maximum length.
PS: I might share a fun one, if people are interested.
Ok, so here is one code fragment that I wrote recently.
I like it because it has a lot of structure.
The code is correct and fast enough for what it is doing.
It does some amount of parsing for a chemical file format.
let tokenize_full (s: string): input_smi_token list =
let res =
L.map Str.(function
| Delim "(" -> [[[[Open_paren]]]]
| Delim ")" -> [[[[Close_paren]]]]
| Delim _ -> assert(false) (* parens_regexp would be wrong then *)
| Text a ->
L.map Str.(function
| Delim cut_bond_str -> [[[parse_cut_bond cut_bond_str]]]
| Text b ->
L.map Str.(function
| Delim bracket_atom_str -> [[Bracket_atom bracket_atom_str]]
| Text c ->
L.map Str.(function
| Delim double_digit_rc ->
[Ring_closure (parse_double_digit_ring_closure double_digit_rc)]
| Text d ->
L.map Str.(function
| Delim single_digit_rc ->
Ring_closure (parse_single_digit_ring_closure single_digit_rc)
| Text e -> Rest e
)
(Str.bounded_full_split single_digit_regexp d 1024)
)
(Str.bounded_full_split double_digits_regexp c 1024)
)
(Str.bounded_full_split bracket_atom_regexp b 1024)
)
(Str.bounded_full_split cut_bond_regexp a 1024)
)
(* WARNING: bug if more than 1024 tokens in the string ! *)
(Str.bounded_full_split parens_regexp s 1024) in
L.flatten (L.flatten (L.flatten (L.flatten res)))
What do you think about this code?
Do you find it ugly (this is your right)?
Have you ever seen a piece of code that looks like that?
And, share your own crazy OCaml snippet too, don’t be shy!
Showing some open-source code that you came across, even if not written by you,
is also fine. I am sure there are some gems in the OCaml compiler.
It is used to change the implementation of a function from an other library (Env.without_cmis from compiler-libs).
I can’t believe it has any effect but it’s there.
A pair of my favourite snippets taken from my Twitter feed in the last year or so:
OCaml hard mode
# let ( let- ) x f = f (Obj.magic x) ;;
# let- x = "a" in
let- y = true in
x + y ;;
- : int = 47370352459049
Ad-hoc polymorphism for fun and profit
# let a = [| 1 ; 2 ; 3 |] in a.(1) == a.(1) ;;
- : bool = true
# let a = [| 1.; 2.; 3.|] in a.(1) == a.(1) ;;
- : bool = false
Both of these were inspired by code submitted by my students, and I’m fond of them because they led to particularly good teaching moments. The first one demonstrates that the OCaml type system is not just a static safety check bolted on top of a e.g. Python / JavaScript runtime (as my students tend to assume): it offers freedom to the compiler writer too. The second shows a consequence of getting a bit carried away with this freedom (and leads to a nice discussion of languages as APIs).
I’m accumulating quite a few of this kind of snippet, which may make for a fun blog post at some point (if I can ever bring myself to give them up as supervision questions…)
In the same vein as @CraigFe, some experiment with binding operators…
match parse_as_branch s with
| Some r -> Ok r
| None -> (
match parse_as_pr s with
| Some r -> Ok r
| None -> (
match parse_as_directory s with
| Some r -> Ok r
| None -> Error `Unknown))
let ( let/ ) f k = match f s with Some r -> Ok r | None -> k () in
let/ () = parse_as_branch in
let/ () = parse_as_pr in
let/ () = parse_as_directory in
Error `Unknown
This is a nice reminder that OCaml still lacks an if let construct.
if let Some r = parse_as_branch s then Ok r
else if let Some r = parse_as_pr s then Ok r
else if let Some r = parse_as_directory s then Ok r
else Error `Unknown
There is a function Option.value_or_else inspired by Rust being proposed here but it is not exactly what you need. But with a function (also in Rust):
Option.or_else : 'a option -> (unit -> 'a option) -> 'a option
you could write something like:
Option.or_else (parse_as_branch s) @@ fun () ->
Option.or_else (parse_as_pr s) @@ fun () ->
parse_as_directory s
(And guys, your uses of custom let binders are scary.)
let tokenize_full (s: string): input_smi_token list =
let res =
(* WARNING: bug if more than 1024 tokens in the string ! *)
Str.bounded_full_split parens_regexp s 1024
|> L.map Str.(function
| Delim "(" -> [[[[Open_paren]]]]
| Delim ")" -> [[[[Close_paren]]]]
| Delim _ -> assert(false) (* parens_regexp would be wrong then *)
| Text a ->
Str.bounded_full_split cut_bond_regexp a 1024
|> L.map Str.(function
| Delim cut_bond_str -> [[[parse_cut_bond cut_bond_str]]]
| Text b ->
Str.bounded_full_split bracket_atom_regexp b 1024
|> L.map Str.(function
| Delim bracket_atom_str -> [[Bracket_atom bracket_atom_str]]
| Text c ->
Str.bounded_full_split double_digits_regexp c 1024
|> L.map Str.(function
| Delim double_digit_rc ->
[Ring_closure (parse_double_digit_ring_closure double_digit_rc)]
| Text d ->
Str.bounded_full_split single_digit_regexp d 1024
|> L.map Str.(function
| Delim single_digit_rc ->
Ring_closure (parse_single_digit_ring_closure single_digit_rc)
| Text e -> Rest e
)
)
)
)
)
in
L.flatten (L.flatten (L.flatten (L.flatten res)))
(* surprising places where you can use an operator as a variable name *)
function (+) -> (+);;
function _ as (+) -> (+);;
for (+) = 0 to 1 do () done;;
# let ( /* ) x f = f x ;;
val ( /* ) : 'a -> ('a -> 'b) -> 'b = <fun>
# let ( */ ) f x = f x ;;
val ( */ ) : ('a -> 'b) -> 'a -> 'b = <fun>
# 1L /* Int64.add */ 2L ;;
- : int64 = 3L
You can replace /*-*/ with a different set (although you have to be careful about precedence and associativity) but this choice really is half the fun.
we were talking about this on ocaml discord back in march and i went for something quite similar to that, i think the reason i chose %> and not >> was because the nicer looking <<...>> pair has operators of equal precedence, so it wouldn’t work.
let (<%) = (|>)
let (%>) = Fun.flip
let _ = assert (3 <%Int.sub%> 1 = (3 |> Fun.flip Int.sub 1))
This isn’t crazy per se, but I had to stop and go “mmm hmmm!” a few times when I came across it. It occupies a sweet spot between simple, elegant, useful and sophisticated, IMO.
So, Core_kernel.Memo lets you wrap code with memoizers.
Example usage:
(* ensure hostname doesn't change in between lookups *)
let gethostname = Memo.unit (fun () -> Unix.gethostname ())
(* gethostname () = "my-linux-laptop" *)
(* avoid an expensive lookup of the same uid twice *)
let get_username_for_uid = Memo.general (fun user_id ->
match Unix.Passwd.getbyuid user_id with
| None -> None
| Some pwd -> Some pwd.Unix.Passwd.name)
(* get_username_for_uid 1000 = Some "mbacarella" *)
It’s like Lazy, though a bit nicer in some ways. Right-o, so that part is nice but not that crazy.
The interface is fairly simple, and has some knobs you can tweak:
type ('a, 'b) fn = 'a -> 'b
val unit : (unit -> 'a) -> (unit, 'a) fn
val general
: ?hashable:'a Hashtbl.Hashable.t
-> ?cache_size_bound:int
-> ('a -> 'b)
-> ('a, 'b) fn
But the implementation is surprisingly conceptually dense?
(* ... snip ... *)
let unbounded (type a) ?(hashable = Hashtbl.Hashable.poly) f =
let cache =
let module A =
Hashable.Make_plain_and_derive_hash_fold_t (struct
type t = a
let { Hashtbl.Hashable.hash; compare; sexp_of_t } = hashable
end)
in
A.Table.create () ~size:0
in
(* Allocate this closure at the call to [unbounded], not at each call to the memoized
function. *)
let really_call_f arg = Result.capture f arg in
fun arg -> Result.return (Hashtbl.findi_or_add cache arg ~default:really_call_f)
;;
(* the same but with a bound on cache size *)
let lru (type a) ?(hashable = Hashtbl.Hashable.poly) ~max_cache_size f =
if max_cache_size <= 0
then failwithf "Memo.lru: max_cache_size of %i <= 0" max_cache_size ();
let module Cache =
Hash_queue.Make (struct
type t = a
let { Hashtbl.Hashable.hash; compare; sexp_of_t } = hashable
end)
in
let cache = Cache.create () in
fun arg ->
Result.return
(match Cache.lookup_and_move_to_back cache arg with
| Some result -> result
| None ->
let result = Result.capture f arg in
Cache.enqueue_back_exn cache arg result;
(* eject least recently used cache entry *)
if Cache.length cache > max_cache_size
then ignore (Cache.dequeue_front_exn cache : _ Result.t);
result)
;;
let general ?hashable ?cache_size_bound f =
match cache_size_bound with
| None -> unbounded ?hashable f
| Some n -> lru ?hashable ~max_cache_size:n f
;;
We’ve got (1) a functor instantiation (2) as a first class module, (3) a locally abstract type (the (type a)), that (4) unusual looking top-level destructuring of hashable, (5) a not-seen-every day Hash_queue module to implement the LRU; in the end (6) a fairly simple interface elementary OCaml hackers can use, and (7) it provides generally useful functionality.
I was just going through a JSON value checking some specific properties (e.g., this field exists, the value of that field looks like so, etc.). For that purpose I wasn’t really interested in the difference between “this field doesn’t exist” and “this field holds something other than what I expect” (although the difference may have some interest when reporting the error).
And so I ended up with something like the following:
let (let*) o f = try Option.bind o f with Match_failure _ -> None in
let ( @ ) kvs k = List.assoc_opt k kvs in
begin
let* (`O specs) = top @ "specs" in
let* (`O layout) = specs @ "layout" in
let* (`String "reference") = layout @ "kind" in
let* (`String name) = layout @ "name" in
let true = valid_name name in
Some ()
end [@@warning "-8"] (* partial matches are intended *)
I don’t think I’m going to push this code mostly because it fits a bit too well into this thread.
From LexiFi’s codebase (NB: don’t use this code! it depends on low-level details of the implementation of objects by the compiler and may blow up horribly without any warning )
let memoize_obj (o : < .. >) =
let o = Obj.repr o in
let meths = Obj.dup (Obj.field o 0) in
Obj.set_field o 0 meths;
let nmeths : int = Obj.magic (Obj.field meths 0) in
for i = 0 to nmeths - 1 do
let idx = i * 2 + 2 in
let old_f : Obj.t -> Obj.t = Obj.magic (Obj.field meths idx) in
let new_f self =
assert(self == o);
match old_f o with
| r -> Obj.set_field meths idx (Obj.repr (fun _ -> assert(self == o); r)); r
| exception e -> Obj.set_field meths idx (Obj.repr (fun _ -> assert(self == o); raise e)); raise e
in
Obj.set_field meths idx (Obj.repr new_f);
done
(** By inheriting from this class, an object gets "memoized method" semantics.
All its (public) methods become memoized.
It is forbidden to clone such an object. *)
class memoized = object(this)
initializer memoize_obj this
end
I like this small bit of code that defines defer inspired by Go:
let defer f = Fun.protect ~finally:f
Now you can immediately deal with deallocating a resource at the place where you allocate it and never think about it again. The block following @@ fun () -> is now protected and either on return or exception will execute the code attached by defer.
let read_file max_size path =
let ic = open_in path in
defer (fun () -> close_in ic) @@ fun () ->
let size = in_channel_length ic in
if size > max_size then failwith "input file exceeds maxium size"
else really_input_string ic size
a finally raising function is considered a programming error.
That’s weird. It already defines its own exception type; I wonder why it doesn’t just define it as exception Finally_raised of exn * exn option and keep both exceptions?
I’m not sure whether there’s a really good answer to the problem but I’d say that treating a raising finally as a programming error is maybe the best discipline one can have as far as reasoning about your code goes.
Otherwise you keep on pushing the problem away, protect the protect… and I’m a bit dubious whether any good and evident error handling strategy can or should be be built on a nesting of Finally_raised exceptions.