What functions deserve tests?

In my opinion, tests are not needed for function string_of_day:

type day = Monday | Tuesday
let string_of_day (day: day) : string =
  match day with
  | Monday -> "Monday"
  | Tuesday -> "Tuesday"

In contrast, I feel obliged to write tests for a sort function.

Worse still, I’m not sure whether I should write tests for function add1_all:

let add1_all (nums: int list) : int list =
  List.map (fun n -> n + 1) nums

So what functions deserve tests?

My idea: tests exist to check that a function’s implementation matches its specification (aka “program correctness”). An implementation may closely resemble its specification (e.g., string_of_day), or be very different from it (e.g., quicksort). Little resemblance means a high need for tests, and vice versa.
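To illustrate the “little resemblance” end of that scale: a quicksort implementation looks nothing like its specification (“the output is a sorted version of the input”), so a test that compares it against a trivially correct reference is valuable. A minimal sketch, where the naive quicksort stands in for any hand-rolled sort:

```ocaml
(* A naive quicksort, for illustration only. *)
let rec quicksort = function
  | [] -> []
  | pivot :: rest ->
    let smaller, rest' = List.partition (fun x -> x < pivot) rest in
    quicksort smaller @ (pivot :: quicksort rest')

(* The spec side: List.sort with polymorphic compare acts as a
   trivially correct reference implementation. *)
let () =
  let input = [ 3; 1; 4; 1; 5; 9; 2; 6 ] in
  assert (quicksort input = List.sort compare input)
```

The test stays simple even though the implementation under test is not, which is exactly the situation where tests pay off.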

Additionally, what do you think of mocking? In Mocking is a Code Smell, the author suggests that the need to mock is a sign of bad code structure. And he wrote:

If there is no logic in your code (just pipes and pure compositions), 0% unit test coverage might be acceptable, assuming your integration or functional test coverage is close to 100%.


Not a very good tester, but I tend to write tests when there is a bug, either while developing or after someone reports one. This means something I thought was evident wasn’t. Edge cases are also good candidates (e.g. empty strings/lists); lots of bugs lurk there.

Better yet, write specifications in OCaml itself and automate the test generation. QuickCheck-style and fuzzing frameworks do that for you (opam search quickcheck).
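For example, here is a sketch of such a generated test using the qcheck opam package (API names as in current QCheck; adjust for your version). The property encodes part of a sort’s specification directly in OCaml:

```ocaml
(* Assumes the qcheck opam package is installed. *)
let rec is_sorted = function
  | [] | [ _ ] -> true
  | x :: (y :: _ as rest) -> x <= y && is_sorted rest

(* Property: List.sort returns a sorted list of the same length,
   checked on 1000 generated inputs. *)
let sort_spec =
  QCheck.Test.make ~count:1000 ~name:"sort is sorted and length-preserving"
    QCheck.(list small_int)
    (fun l ->
      let sorted = List.sort compare l in
      is_sorted sorted && List.length sorted = List.length l)

let () = QCheck_runner.run_tests_main [ sort_spec ]
```

The generator takes care of the edge cases (empty list included) that hand-written examples tend to miss.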


What if the implementation has a typo and the output string is Mondy?

I think if the tests are really easy to write, the energy required to argue against writing them becomes greater than the energy needed to just write the tests.

let string_of_day (day: day) : string =
  match day with
  | Monday -> "Monday"
  | Tuesday -> "Tuesday"
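For what it’s worth, the test in question is just a pair of assertions; a minimal sketch (the type is duplicated here so the snippet is self-contained):

```ocaml
type day = Monday | Tuesday

let string_of_day (day : day) : string =
  match day with
  | Monday -> "Monday"
  | Tuesday -> "Tuesday"

(* Two lines of test; a "Mondy" typo would fail immediately. *)
let () =
  assert (string_of_day Monday = "Monday");
  assert (string_of_day Tuesday = "Tuesday")
```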

Writing tests for string_of_day is essentially writing the function once again, which doesn’t feel right.


A test is a program that helps check whether another program is “correct” based on some known criteria or specification. IMO, a test should both provide this check and convince a reader that it checks what it claims to. If tests are redundant in the ways you point out, or are too complicated, then I think they fail to perform these functions.

So I tend to avoid writing unit tests when the code for the test is not simpler than the implementation of the program I am testing.

I would not write a test for add1_all. In fact, I would not write that function at all and would just use List.map ((+) 1) directly. IMO, it’s best not to name functions that are so obvious to read off the composition of combinators they are built from. I would probably write a (ideally property-based) test for the functions that used it, though :slight_smile:
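If one did insist on a test, here is a sketch of what plain-assertion checks on add1_all could look like, with no framework assumed:

```ocaml
let add1_all (nums : int list) : int list =
  List.map (fun n -> n + 1) nums

let () =
  (* spot checks, including the empty-list edge case *)
  assert (add1_all [] = []);
  assert (add1_all [ 1; 2; 3 ] = [ 2; 3; 4 ]);
  (* structural property: the length is preserved *)
  let l = [ 0; -5; 42 ] in
  assert (List.length (add1_all l) = List.length l)
```

Note that the example-based checks are essentially the implementation written again, which is the redundancy discussed above; the length property is the only part that adds independent information.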

That said, if a colleague insisted I write a test that I thought wasn’t worth it, I’d probably do that instead of argue about it.


Seems like a good standard!


What’s the impact of a failure? If it’s high, write a test. Writing automated tests is a risk mitigation technique. It can also be a good way to document the correct, expected behaviour of a unit or system.


Then you write a test :-) I mean, I wouldn’t object to someone writing a test for that upfront; I just know that for myself I wouldn’t.

To add two more things to my initial message.

If you are writing libraries that handle standards, a good way to test is to devise a CLI tool that provides a service for the standard and then dogfood yourself with the tool (and/or expect-test it).

Also, and this comes from personal failure: if you assess the correctness of some of your functions in the toplevel, just don’t. When I wrote xmlm 16 years ago I made a lot of tests to make sure the various options to handle whitespace were working correctly. All these tests have now vanished, which is very stupid.


How do you assess the impact of a failure?

We know perfectly well by now that in a system, a small “innocuous” bug in a component can lead to catastrophic failure of the whole system.


Perhaps one should look at this stackexchange answer

I wouldn’t write one either. But if you add a day_of_string (and you always should), you can test against the identity (modulo options or errors).
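A sketch of that round-trip, assuming a day_of_string that returns an option for invalid input (definitions duplicated so the snippet is self-contained):

```ocaml
type day = Monday | Tuesday

let string_of_day = function
  | Monday -> "Monday"
  | Tuesday -> "Tuesday"

let day_of_string = function
  | "Monday" -> Some Monday
  | "Tuesday" -> Some Tuesday
  | _ -> None

(* Round-trip test, modulo the option. *)
let () =
  List.iter
    (fun d -> assert (day_of_string (string_of_day d) = Some d))
    [ Monday; Tuesday ];
  (* invalid strings should be rejected, not mis-parsed *)
  assert (day_of_string "Mondy" = None)
```

Unlike a test that re-lists the strings, this checks a relation between the two functions, so it no longer duplicates either implementation.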


This is of course highly subjective but I’ll give my 2c.

I like to think of what is the cost/benefit of testing or not testing a particular piece of code.

If I’m unsure how a function will behave, testing is a net benefit, because the alternative is manually testing whenever the code changes, which takes longer over the long term.

Inversely, I could spend time writing and maintaining tests for something which will not actually break in practice or that is not actually important. In that case it’s a net cost and I should really use my time better.

One thing that I feel TDD conversations usually sweep under the rug is design. If your design is poor, then no amount of testing will actually help. Too many inputs produce too many outputs, so one mustn’t be too focused on the tests themselves at that point. The two are not mutually exclusive, of course.

Another viewpoint which I find convincing is to consider testing as similar to double-entry accounting: you specify things twice, so that if you make a mistake on either side, something is bound to bubble up.


Risk and assumption analysis. :slight_smile: Also compare with the spiral model.

What does “tests against identity” mean?

Oh, sorry.

What I mean is composing two functions which are (“morally”) inverses, so that their composition yields the identity function.

So, string_of_date . date_of_string = id (ignoring invalid strings, which should be tested too).
and date_of_string . string_of_date = id.

string_of_date (date_of_string s) = s
date_of_string (string_of_date d) = d

These are tests which are perfectly fit for property testing (using QCheck/Quickcheck).
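Using the day type from earlier in the thread as a stand-in for date, here is a sketch with the qcheck opam package (the QCheck.oneofl generator and runner call are assumptions about your setup):

```ocaml
type day = Monday | Tuesday

let string_of_day = function
  | Monday -> "Monday"
  | Tuesday -> "Tuesday"

let day_of_string = function
  | "Monday" -> Some Monday
  | "Tuesday" -> Some Tuesday
  | _ -> None

(* Property: decoding an encoded day gives back the original. *)
let roundtrip =
  QCheck.Test.make ~name:"day round-trip"
    (QCheck.oneofl [ Monday; Tuesday ])
    (fun d -> day_of_string (string_of_day d) = Some d)

let () = QCheck_runner.run_tests_main [ roundtrip ]
```

For a real date type with many values, you would build a proper generator instead of oneofl, but the property itself stays one line.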


Good luck with that :-)

You can’t assess risks and assumptions in a modular context since you don’t know how the component you are working on is going to be integrated.

So basically if you want to be safe the answer to this question:

Is always: high. There are no little bugs; even a display bug on a dashboard may lead a human operator to make catastrophic decisions.

That conclusion is only useful if money and time are infinite. But yes, risk is domain-specific, of course. Risk also includes probability, not just impact.

Some (possibly incorrect) thoughts:

  1. Typically, systems/libraries are inadequately tested. So if you’re down to this function (string_of_day), you’re doing so much better than even most of the best devs that you ought to feel pretty good. There might be lower-hanging targets for you to add test coverage for? Just a thought.

  2. Mocks: I’ve mostly developed systems, typically distributed systems with significant internal state. In that context, it’s important to use mocks in order to induce failure modes in subsystem B, so that we can verify that subsystem A handles them properly.

There are other examples where mocks are really useful: verifying that in a particular error-mode, a system emits a particular log-message (it might not be able to crash, but during recovery it should still emit particular alerts).

The list probably goes on and on. I think the author of that post kind of knows this: “Mocking is great for integration tests”.
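In OCaml, one way to get such fault-injecting mocks without any mocking library is to parameterize subsystem A over subsystem B’s signature with a functor. A hedged sketch; all module and function names here are hypothetical:

```ocaml
(* Subsystem B's interface. *)
module type STORE = sig
  val read : string -> (string, string) result
end

(* Subsystem A, parameterized over the store it depends on. *)
module Make_cache (S : STORE) = struct
  (* Degrade gracefully when the store fails. *)
  let get key =
    match S.read key with
    | Ok v -> v
    | Error _ -> "<unavailable>"
end

(* A mock store that always fails, to exercise A's error path. *)
module Failing_store : STORE = struct
  let read _ = Error "simulated outage"
end

module Cache_under_test = Make_cache (Failing_store)

let () = assert (Cache_under_test.get "k" = "<unavailable>")
```

The same Make_cache is instantiated with the real store in production, so the mock exists only in the test binary.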


That is exactly the message I use with my students who are reluctant to write tests. I would add that the two ways of writing the same thing do not have the same lifecycle, which is crucial:

  1. Often, a function evolves while its tests last. The ROI from regression checking is sometimes not foreseen, or is forgotten (this relates to Daniel’s xmlm story above).
  2. If the authors of the two artifacts (the function code and its tests) are different, it is even better.

My opinion is that testing needs to be based first and foremost on your interface. Whatever your public interface is, that needs to be tested first. The next candidates are high-complexity functions and functions that manipulate a lot of state. These need to be tested thoroughly to make sure you’re exercising every execution path and edge case. Otherwise you have no idea if your code works.

At the extreme end, since every part of your code can be refactored into a ‘function’, you’d then be obligated to test every single one of these functions. This clearly doesn’t make sense, and it would make refactoring nearly impossible. So there needs to be a balance.

At some ideal level, your test code would only cover your interface and that would be ‘good enough’, allowing you to refactor without needing a test rewrite. This is too idealistic though, as there are many parts of code, particularly stateful code, that deal with a massive state space that isn’t readily replicated via the interface. But I think this is something to strive for.
