[Real World OCaml] Put t first

Hello, I just finished reading the early module chapter of “Real World OCaml” and wondered about the rationale behind one of the module design guidelines shared towards the end of the chapter (a guideline apparently used in Base and Core):

Put t first. If you have a module M whose primary type is M.t, the functions in M that take a value of type M.t should take it as their first argument.

I actually saw the opposite point being made in another language community (see “The data structure is always the last argument”). Does anyone have concrete reasons (or practical/ergonomic preferences) why it is desirable to have the value of type M.t as the first argument?

Just for context. In OCaml:

  • type constructors take arguments on the left: int list
  • functions take arguments on the right: print_endline "hello"

Functions are usually written to take several arguments (curried) rather than a tuple holding all arguments. Argument order matters when we want to supply fewer than all arguments. The OCaml standard library is not consistent here: compare Hashtbl.add with Map.add. The difference between Hashtbl and Map is that the former is a stateful abstraction and the latter is purely functional. It makes sense to provide t as the first argument when the abstraction is stateful, since you are not going to change it in subsequent calls. But it makes more sense to supply it last in the purely functional case.
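
To make the Hashtbl.add / Map.add contrast concrete, here is a minimal sketch using only the standard library. The names IntMap, tbl, add_one, add_two and m are made up for illustration.

```ocaml
module IntMap = Map.Make (Int)

(* Stateful: the table comes first and is mutated in place. *)
let tbl : (int, string) Hashtbl.t = Hashtbl.create 16
let () = Hashtbl.add tbl 1 "one"
let () = Hashtbl.add tbl 2 "two"

(* Persistent: the map comes last, so a partial application such as
   [IntMap.add 1 "one"] is a function from maps to maps. *)
let add_one = IntMap.add 1 "one"
let add_two = IntMap.add 2 "two"
let m = IntMap.empty |> add_one |> add_two
```

With t last, each partial application is a map transformer that can be chained with |>; with t first, the same table is simply passed to every call.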

See also Why isn't t-first the default convention?.

The convention you quote is really specific to packages from the Jane Street ecosystem like the Base library. The opposite convention is actually more common otherwise (for persistent structures only though, as @lindig noted), for the same reasons given in your link, that it works better for function composition / piping: you can write things like:

[1; 2; 3] |> List.to_seq |> Seq.filter is_even |> Seq.iter print_int

or:

List.map (List.map my_func) my_list_of_lists
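
A self-contained version of the two snippets above, as a sketch: is_even is a helper defined here, succ stands in for my_func, and the result is collected into a list instead of being printed so it can be inspected.

```ocaml
let is_even n = n mod 2 = 0

(* Each stage is a partial application waiting for its last argument,
   which the pipe operator supplies. *)
let evens = [1; 2; 3; 4] |> List.to_seq |> Seq.filter is_even |> List.of_seq

(* A partially applied List.map is itself a function on lists,
   so it can be passed to an outer List.map. *)
let incremented = List.map (List.map succ) [[1; 2]; [3]]
```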

To be able to still do this in the t-first world, Base puts labels on some or all of the other arguments:

[1; 2; 3] |> List.to_seq |> Seq.filter ~f:is_even |> Seq.iter ~f:print_int
List.map my_list_of_lists ~f:(List.map ~f:my_func)

but many people do not like to have non-meaningful labels everywhere.

Can’t they be meaningful labels?

On Seq.iter ~f:print_int I just can’t see what a meaningful label would be; any label just adds reading noise.

If I understand correctly, you don’t find the ~f: meaningful?

Since Seq.iter takes two arguments, I personally find the ~f: label useful to disambiguate which one of the two is the callback. Perhaps this is because I’m too stubborn to memorize the order of function parameters and don’t believe there is always a “natural” or canonical ordering.

I guess List.fold is a better example. I am very grateful I can just write List.fold ~init:0 ~f:(+) without having to sweat about parameter ordering…
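
For readers without Base installed, the same idea can be sketched with the standard library's ListLabels module, which is a stand-in here, not Base's List.fold. The names total_a, total_b and joined are illustrative.

```ocaml
(* In a full application, labelled arguments can be given in any order. *)
let total_a = ListLabels.fold_left ~init:0 ~f:( + ) [1; 2; 3]
let total_b = ListLabels.fold_left ~f:( + ) ~init:0 [1; 2; 3]

(* fold_right places the list and the initial value differently in its
   signature, but the call site looks the same thanks to the labels. *)
let joined =
  ListLabels.fold_right ~f:(fun x acc -> string_of_int x ^ acc) ~init:"" [1; 2; 3]
```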

If you need a label to disambiguate between these two arguments, then you have a larger problem: you do not have meaningful names for your parameters.

(I’m not very sympathetic to your other arguments because they are concerned about writing which I never find interesting to optimize for)

Thanks @jjb for the link, and sorry for basically posting a duplicate question.

The ~f is likely for “function”. It does seem to be meaningful.

And here, it can be used to choose the order of the parameters: List.map l ~f:(a long function) or (a list computation) |> List.map ~f:(a function).
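
A small sketch of both call shapes, using the standard library's ListLabels.map as a stand-in for a labelled map; double, a and b are illustrative names.

```ocaml
let double x = 2 * x

(* List first, callback last: handy when the callback is long. *)
let a = ListLabels.map [1; 2; 3] ~f:double

(* Callback supplied first, list piped in afterwards. *)
let b = [1; 2; 3] |> ListLabels.map ~f:double
```

Both forms typecheck because the label tells the compiler which argument is the callback, whatever its position at the call site.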

It’s really a question of whether you’re willing to pay the label tax on every function, or if you prefer to reserve labels for situations where confusion could legitimately happen between arguments or important information needs to be carried in the argument name. Whichever convention you choose (first or last, labels everywhere or not), it pays to be consistent within your codebase.

No no, I didn’t mean to make you feel sorry! The intent is to show that it is a legit question that does come up, and there are different positions. I just didn’t have time to expand on the link.

Having t first was preferred to each module choosing its own order because if you use t-first consistently in a codebase, there are fewer idiosyncrasies and less friction when using modules you aren’t very familiar with. In Base, you say Map.find t key and Hashtbl.find t key, whereas in the stdlib, the argument order differs. It is the same reason that Base prefers M.create to a mix of M.make and M.create, for instance.

Using t-last consistently would presumably have had the same effect, but I’m guessing this was considered less natural. Although this wasn’t relevant when the t-first convention was chosen, t-first is now the convention that best enables type-based disambiguation (that’s the reason why Base.List.map takes t first, and is no longer an alias for ListLabels.map, which takes t last).
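
As an illustration of what type-based disambiguation buys, here is a sketch with two record types sharing a field name. map_t_first is a hypothetical helper written for this example, with the same argument shape as a t-first map; it is not Base's implementation.

```ocaml
type person = { name : string; age : int }
type pet = { name : string; owner : string }

(* Hypothetical t-first map: the list precedes the callback in the type. *)
let map_t_first (l : 'a list) ~(f : 'a -> 'b) : 'b list = List.map f l

let people : person list =
  [ { name = "Ada"; age = 36 }; { name = "Alan"; age = 41 } ]

(* The list is typed before the callback, so [p] is already known to be
   a [person] when [p.name] is resolved. *)
let names = map_t_first people ~f:(fun p -> p.name)

(* With the callback first, [p.name] would default to the last-defined
   record ([pet]), so an annotation is needed to select [person]. *)
let names_annotated = List.map (fun (p : person) -> p.name) people
```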

AFAIK, the main (single?) argument for t-last is partial application using currying. I’d rather have the extra type based disambiguation from t-first, and use ppx_partial to create effectively partial applications without currying.

I think it bears repeating that, as a motivation for labeling arguments that is orthogonal to readability, if you label a higher-order function argument, thereby enabling reordering it at call sites, then you can reorder it in the arguments of the signature to give stronger type-based disambiguation. For example, if a fold is given the type:

val fold : 'v t -> 'acc -> f:(key -> 'v -> 'acc -> 'acc) -> 'acc

then the arguments of f benefit from the types of the 'v t and accumulator passed at the call site (no matter their order relative to ~f), and you can still partially apply fold to obtain an “accumulator transformer” of type 'acc -> 'acc.
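
A minimal sketch of that signature on top of the standard library's Map, assuming string keys. SMap, the fold wrapper, stock and add_stock are names invented for this example.

```ocaml
module SMap = Map.Make (String)

(* Hypothetical wrapper giving Map.fold the argument order discussed above:
   container, then accumulator, then a labelled callback. *)
let fold (m : 'v SMap.t) (init : 'acc) ~(f : string -> 'v -> 'acc -> 'acc) : 'acc =
  SMap.fold f m init

let stock = SMap.empty |> SMap.add "apples" 3 |> SMap.add "pears" 4

(* Supplying the map and ~f but omitting the accumulator leaves a
   function from accumulator to accumulator. *)
let add_stock : int -> int = fold stock ~f:(fun _key n acc -> acc + n)

let total = add_stock 0
let total_from_ten = add_stock 10
```

Because ~f is labelled, it can be supplied while the unlabelled accumulator is left out, which is what makes the partial application possible despite f coming last in the signature.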
