Lists and patterns / performance (Real world OCaml)

In Lists and Patterns - Real World OCaml, paragraph “Performance”, it is written:

Naively, you might think that it would be necessary to check each case in a match in sequence to figure out which one fires.

Actually, “naively”, I would think that the first case matching the parameter is executed. Isn’t that the case?

Next sentence reads:

If the cases of a match were guarded by arbitrary code, that would be the case.

What does it mean to be “guarded by” some code ?

The alternate implementation using “else if” also looks like the first matching case will be executed, hence not all checks need to be done, so the difference in performance is not clearly explained. Where does it really come from ?

Finally, in the second benchmark (sum versus sum_if), it seems that sum_if has to check three times if the list is empty (one each for List.is_empty, List.hd_exn, List.tl_exn) versus one time, so why is the performance ratio closer to 3.5 than 3 ?


In the code

match e with PAT1 -> e1 | PAT2 -> e2 | PAT3 when guard1 -> e3 | PAT4 when guard2 -> e4

PAT3, PAT4 are “guarded” by “when clauses”. The chapter in the book doesn’t give an example, but that’s what I’m guess they mean.


What the author was trying to point out is that to find the first case matching the parameter, you may not have to individually check each case until you found the correct one. The compiler may have chosen a compilation strategy that allows it to find the correct branch more quickly. Typically, for patterns on integers, you could consider a dichotomic search or a constant table lookup.
But of course, in the end the case that will be executed should always be the first one that matches.

In terms of source language, it means that the when keyword is used. Concretely, if you use a when clause (which is often called a guard), the associated condition must be checked at runtime if the pattern part of the case matches.

let plus_one_when x =
  match x with
  | n when n = 0 -> 1
  | n when n = 1 -> 2
  | n when n = 2 -> 3
  | n when n = 3 -> 4
  | n when n = 4 -> 5
  | n when n = 5 -> 6
  | n -> n + 1;;

This function will be compiled the same way as the plus_one_if function.

The exact value is not really meaningful. For one thing, calling each of the List functions has a cost in addition to the pattern-matching that is performed. That alone could account for most of the performance difference; after all, pattern matching is fast, so doing the same pattern matching three times only matters if the rest of the code is very simple.
I assume that the numbers in the book were obtained in bytecode; if you ran the same benchmark using the native compiler, the function calls would likely be inlined and you would get a much closer result. And of course, if your compiler was really clever, it might actually simplify the sum_if function to the same code as the sum function. This is not the case with the current compilers, so you should definitively prefer the version with pattern-matching, but we’ve been thinking about this kind of optimisations for a while.


Thanks @vlaviron, that was very clear ! The kind of compiler that you mention at the end should indeed be very clever !

Meta, but:

Naively, you might think that…

Nowadays I’ve really started to dislike this kind of language in educational materials. ‘Naively, you might think…’, or ‘The clever reader will notice that…’. I think educational materials should deliver knowledge with simple language, not distract readers with judgments about their naivete or cleverness.


I think the authors are trying to encourage you writing code such as

match xs with
| [] -> …
| x :: [] -> …
| x :: y :: [] -> …
| x :: y :: zs -> …

and trying to discourage you from attempting to “optimise the code by hand” with patterns such as

match xs with
| [] -> …
| x :: rest -> match rest with
    | [] -> …
    | y :: rest -> match rest with
        | [] -> …
        | zs -> …

If you don’t trust the compiler enough, you might think that the second form has better performance because you might think that you avoid re-traversing the list to distinguish later patterns. More specifically, you might think that the first form behaves like the following example:

match xs with
| [] -> …
| xs -> (* no luck here, let's try the next pattern *)
    match xs with
    | [] -> assert false (* we've been through this already!! *)
    | x :: [] -> …
    | xs -> (* no luck, let's try the next pattern *)
        match xs with
        | [] | x :: [] -> assert false (* well these checks sure are a waste of time *)
        | x :: y :: [] -> …
        | xs -> (* no luck still! *)

whereas in fact it behaves like the “hand-optimised” version.


Thanks @raphael-proust ! By the way: is there a difference (in terms of performance, or good practices, or more common conventions) between writing x :: [] , x :: y :: [] and [x] , [x;y] (or the intermediate x :: [y]) ?

As far as I know there are no differences at all. As far as I know, the distinction disappears after the parsing.

I chose the ::-[] style in my post to make it more obvious what the relation between the different patterns were. But you should use whatever is more readable for you.

Fun fact, there is a third form you can use for those patterns: the prefix form of :: which (just like infix operators) is obtained via parentheses:

let is_empty = function | [] -> true | ( :: ) (_, _) -> false ;;

This form tends to be more verbose, but it can be useful if you are generating the code programmatically. Or if you are overloading the :: constructor (e.g., crowbar/crowbar.mli at master · stedolan/crowbar · GitHub)

1 Like