Confusion over understanding sample code I found

I am beginning my journey with ocaml, and am working on a little cli tool to get my feet wet. I found this code that is adjacent to what Im trying to do, but I’m finding I dont understand it well enough to adapt it to what Im trying to do.

(** [dir_contents] returns the paths of all regular files that are
 * contained in [dir]. Each file is a path starting with [dir].
  *)
let dir_contents dir =
  let rec loop result = function
    | f::fs when Sys.is_directory f ->
          Sys.readdir f
          |> Array.to_list
          |> List.map (Filename.concat f)
          |> List.append fs
          |> loop result
    | f::fs -> loop (f::result) fs
    | []    -> result
  in
    loop [] [dir]

I have a solid grasp of whats happening from the Sys.is_directory bit all the way down till it calls loop again. I think Im mostly confused on the syntax used for

let rec loop result = function
    | f::fs ......

It feels similar to a match statement, but I dont know how to make sense of this syntax with the keyword function. What are the different | sections? Are they control flow branching? I don’t seem to know what to google here unfortunately.

Likely closely related, how do I make sense of the second argument being passed to loop (loop [dir] vs loop result)? Where is that second argument going?

function is like fun x -> match x with but without needing to come up with a name for x.
This explains why loop takes 2 arguments.

The | are the match cases.

@SkySkimmer’s explanation is completely correct. Personally, I’d write the inner function with an explicit match, so that it’s easier to read:

let dir_contents dir =
  let rec loop result worklist =
    match worklist with
    | f::fs when Sys.is_directory f -> ...
    | f::fs -> loop (f::result) fs
    | []    -> result
  in
    loop [] [dir]

However, the original code has O(n^2) time complexity (where n is the length of the final result) because of the List.append fs. Here is a linear-time implementation:

let dir_contents dir =
  let rec add_contents accu filename =
    if Sys.is_directory filename then
      Sys.readdir filename
      |> Array.map (Filename.concat filename)
      |> Array.fold_left add_contents accu
    else
      filename :: accu
  in add_contents [] dir
2 Likes

This is super helpful. The explicit match is much easier for me to read. Also nice to compare with how you would write it. Thanks!

Digression.

The extra benefit of the explicit match is that it makes it easier to annotate the return type of the function, should that turn out to help in debugging etc. OTOH I sometimes like the terseness of function, and its “fundamentality” in that you can define both fun and match in terms of it, but you need both fun and match to define function. SML takes a different approach with a more “mathematical” / natural way of defining a function by cases. While objectively it might be a more readable solution, subjectively I don’t like SML’s syntax. SML syntax downsides are that it makes the definition boundary less graspable, and definition-site vs. call-site less distinct.

1 Like