I’m writing an article together with a friend on what a good function is (in any programming language), and got curious what the OCaml community is thinking. So, what are your thoughts? What is a good function, and how do you keep it good?
Some aspects to consider:
State
Contract (pre- and postconditions)
Pure, referentially transparent
Testable, mockable
Documented
Name and how naming relates to the domain
Size
Number of arguments
How to behave in case of failure (assertions, exceptions, different return values)
Changeability
Empirical evidence for recommendations (e.g. correlation between size and fault density)
Looked at after the fact, many solutions to difficult problems appear trivial once they have been found. That’s why they are solutions and not workarounds. Take E = mc^2, the excavation of Troy, or planetary orbits. The nice thing is that trivial solutions are easy to adopt: you don’t need any fancier gear.
However, it’s often not trivial to make functions small.
A good function is one that does what it looks like it’s supposed to be doing. Good code in general is code that can be easily fixed or replaced when that day comes. Things that help toward these goals include:
evocative names.
comments explaining all the context that’s in the programmer’s mind at the time of writing the code. This should explain what the function is intended for.
understandability of the function’s code without jumping to other pieces of code. Using explicit arguments and not relying on external mutable objects helps. Types also help guarantee that function calls are correct without thinking too hard.
accompanying tests. Not only do tests help catch future regressions, but they also illustrate how to call the function.
avoiding unnecessary abstractions.
preferring familiar patterns over unfamiliar ones. Stay consistent with the project’s practices, and more generally with the language’s best practices.
None of this is specific to OCaml. OCaml just makes some of these properties easier to achieve than in some other languages.
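To make the points about explicit arguments and accompanying tests concrete, here is a minimal sketch. The names (`tax_percent`, `price_with_tax`) and the integer-cents representation are assumptions made up for illustration, not taken from any real code base.

```ocaml
(* Hypothetical example: both functions compute a price including tax,
   with amounts in integer cents. *)

(* Relies on external mutable state: the result depends on whatever
   [tax_percent] holds at the time of the call. *)
let tax_percent = ref 20

let price_with_tax_implicit cents = cents + (cents * !tax_percent / 100)

(* Takes everything it needs as explicit arguments: the call site shows
   all inputs, and the function can be understood without looking
   elsewhere. *)
let price_with_tax ~tax_percent cents = cents + (cents * tax_percent / 100)

(* A test catches regressions and also documents how to call the function. *)
let () = assert (price_with_tax ~tax_percent:20 1000 = 1200)
```

The implicit version gives a different answer for the same argument as soon as some other code changes `tax_percent`, which is exactly the kind of jumping around that the explicit version avoids.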
Is this possible? Often a function will apply to certain domain elements, and the knowledge about those elements might be in another file. Example: if you want to connect orders with invoices and transactions, you might do this in the order module, but the definitions of the invoice and transaction elements will be elsewhere.
I think it’s not just about ‘small is better’ but more about the level of abstraction. Each function should deal with a consistent level of abstraction that makes sense for it. E.g. a function that combines data from two different REST API calls should not try to construct and make the HTTP calls directly. It should factor out the actual HTTP calls into helper functions and keep only the logic of combining the data.
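A rough sketch of that separation, under loud assumptions: the `user` and `order` types, the `summarize` function, and the `fetch_user` / `fetch_orders` parameters are all hypothetical, and the fetchers are plain synchronous functions here, whereas real HTTP code would likely return a result or a promise.

```ocaml
(* Hypothetical domain types for the data returned by the two API calls. *)
type user = { name : string }
type order = { total_cents : int }

(* One level of abstraction: combining the data. Nothing here knows
   about HTTP. *)
let summarize (u : user) (orders : order list) =
  let total = List.fold_left (fun acc o -> acc + o.total_cents) 0 orders in
  Printf.sprintf "%s: %d orders, %d cents" u.name (List.length orders) total

(* The HTTP calls live in helpers, passed in as arguments here, so this
   function only sequences "fetch, fetch, combine". *)
let user_summary ~fetch_user ~fetch_orders user_id =
  let u = fetch_user user_id in
  let orders = fetch_orders user_id in
  summarize u orders
```

A side effect of this split is that `user_summary` can be exercised with stub fetchers, for example `user_summary ~fetch_user:(fun _ -> { name = "Ada" }) ~fetch_orders:(fun _ -> []) 7`, without any network access.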
This is about anything that’s more general than it needs to be now and in the foreseeable future. The “foreseeable future” is hard to determine, there’s no doubt about this. However, it’s often easier to make things more general or more abstract when the time comes. There’s often no need to make them abstract upfront “just in case”.
In OCaml, some abstractions are possible and even easy to write but can lead to code that’s harder to read. Some examples include:
creating higher-order functions that will be used only once in the application e.g. list_fold_left3 whose type would be ('a -> 'b -> 'c -> 'd -> 'a) -> 'a -> 'b list -> 'c list -> 'd list -> 'a (modeled after List.fold_left and List.fold_left2).
creating generic types that don’t need to be generic e.g. type 'key t = { id: 'key; creation_date: Date.t; description: string } instead of type t = { id: string; creation_date: Date.t; description: string }.
creating parametrized modules (OCaml functors) when that could be avoided.
creating badly named functions that are used only once and would be better off anonymous.
making a record type abstract (the module interface would expose type t instead of type t = { ... }) and providing a bunch of functions to access its fields, when accessing the fields directly would work just fine.
using jargon to describe data structures that don’t benefit from it. For example, calling something a “monad” when that is irrelevant to the user of the library, and calling it a “wrapper” would be more insightful.
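To illustrate the first item, here is one way `list_fold_left3` could be written, next to a direct version of its single call site. The invoice-total use case, the integer-cents amounts, and the names `total_with_helper` and `total_direct` are assumptions invented for the sketch.

```ocaml
(* A possible one-off helper with the type given above, modeled after
   List.fold_left2. Raises Invalid_argument on lists of unequal length. *)
let rec list_fold_left3 f acc xs ys zs =
  match xs, ys, zs with
  | [], [], [] -> acc
  | x :: xs, y :: ys, z :: zs -> list_fold_left3 f (f acc x y z) xs ys zs
  | _ -> invalid_arg "list_fold_left3"

(* Its only call site (hypothetical): total of price * quantity + shipping. *)
let total_with_helper prices quantities shipping =
  list_fold_left3
    (fun acc p q s -> acc + (p * q) + s)
    0 prices quantities shipping

(* The same computation written directly, without the extra abstraction. *)
let rec total_direct prices quantities shipping =
  match prices, quantities, shipping with
  | [], [], [] -> 0
  | p :: ps, q :: qs, s :: ss -> (p * q) + s + total_direct ps qs ss
  | _ -> invalid_arg "total_direct"
```

With a single call site, the direct version is shorter overall and readable in one place. The helper starts to pay for itself only once a second or third caller shows up.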
You’re making it easy for yourself by choosing fields that are very common, like id and date. Let me yank some fields from our database that belong to the invoice table.
gateway_txn_id
pay_to_accept_quote
entity_id
delivery_small_print
tax_exemption_code
claim_tax_back
But yes, most of the others do make sense. Of course that’s because this domain is so well known by everyone. Other domains could be more obscure.
The “foreseeable future” is hard to determine, there’s no doubt about this
Perhaps you can look at the previous 5 years of the code-base to get a picture of how it will evolve in the future. Just a thought, I’m assuming there is research about this topic already, to track changes and then extrapolate future changes. You could also analyze change requests, and how they change over time.
As a counterpoint, the latter is a lot easier to understand at a glance, while the former requires at least a closer inspection to understand.
Maybe a good intermediate would be:
let print_invoice_summary (inv : Invoice.t) =
  let module I = Invoice_ID in
  let module D = Date in
  printf "invoice #%s %s: %s\n"
    (I.to_string inv.id)
    (D.to_string inv.creation_date)
    inv.description
which IMO makes it easier to see that the function simply converts its arguments to strings and prints them.
(obviously, for the context of such a simple function this might be overkill, but I’ve found such local module aliases to be of much help in more complex functions where verbose module names can make the program harder to parse through).
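For anyone who wants to try the local-alias style in a toplevel, here is a self-contained variant. The definitions of `Invoice_ID`, `Date` and `Invoice` are stand-ins invented for this sketch (the real modules are not shown in the thread), and the summary is built with `Printf.sprintf` so that it can be checked as a value.

```ocaml
(* Stand-in modules, assumed for illustration only. *)
module Invoice_ID = struct
  type t = int
  let to_string = string_of_int
end

module Date = struct
  type t = { year : int; month : int; day : int }
  let to_string d = Printf.sprintf "%04d-%02d-%02d" d.year d.month d.day
end

module Invoice = struct
  type t = { id : Invoice_ID.t; creation_date : Date.t; description : string }
end

(* Local aliases keep the body short; they are only visible inside this
   function. The type annotation on [inv] lets the field names resolve. *)
let invoice_summary (inv : Invoice.t) =
  let module I = Invoice_ID in
  let module D = Date in
  Printf.sprintf "invoice #%s %s: %s"
    (I.to_string inv.id)
    (D.to_string inv.creation_date)
    inv.description

let print_invoice_summary inv = print_endline (invoice_summary inv)
```

Because the aliases are scoped to the function body, they don’t leak short names like `I` and `D` into the rest of the file.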
Maybe for this particular (toy) example, but in a larger project, well chosen module aliases can help in forming a common language for concisely talking about the domain.
It likely comes down to coding-style preference. This feels closer to the Lisp-style development philosophy, where you write simple code to express complex concepts by extending your primitives to target the domain you’re working in. (In the case of Lisp, this is more of a necessity, because you don’t have types to guard-rail you, but I still find it nice to adopt in OCaml projects.)