On the problem of testing a module’s internals, which is currently only possible with inline tests, RWOC makes an argument under “Where Should Tests Go?”, quoted below, that internal test-only libraries should almost always be used instead. I agree with that and thought it would be good to post it here. I’m trying to avoid using any inline tests.
Putting tests directly in the library you’re building certainly has some benefits. For one thing, it lets you put a test for a given function directly after the definition of that function, which in some cases can be good for readability. This approach also lets you test aspects of your code that aren’t exposed by its external interface.
While this sounds appealing at first glance, putting tests in libraries has several downsides.
Readability. Including all of your tests directly in your application code can make that code itself harder to read. This can lead to people writing too few tests in an effort to keep their application code uncluttered.
Bloat. When your tests are written as a part of your library, it means that every user of your library has to link in that testing code in their production application. Even though that code won’t be run, it still adds to the size of the executable. It can also require dependencies on libraries that you don’t need in production, which can reduce the portability of your code.
Testing mindset. Writing tests on the inside of your libraries lets you write tests against any part of your implementation, rather than just the exposed API. This freedom is useful, but it can also put you in the wrong testing mindset. Testing that’s phrased in terms of the public API often does a better job of testing what’s fundamental about your code, and will better survive refactoring of the implementation. Also, the discipline of keeping tests outside of your libraries requires you to write code that can be tested that way, which pushes towards better designs.
For all of these reasons, our recommendation is to put the bulk of your tests in test-only libraries created for that purpose. There are some legitimate reasons to want to put some tests directly in your production library, e.g., when you need access to some functionality that’s important for the test but really awkward to expose. But such cases are very much the exception.
Thanks for all your contributions, guys. I’ll use this info to update the ocamlverse page.
I decided to go with ppx_expect and ppx_inline_test, but to use both in external test modules, the same way Jane Street does. ppx_expect offers the same advantage that debuggers have over compile-and-printf cycles: you can look at the entire state at once, without having to reason about specific conditions, which lets you be both exploratory and more comprehensive in your testing. I think ppx_expect and its brethren are generally superior to plain old unit testing. Of course, this assumes you have printers for everything. In my case, I use ppx_yojson everywhere, so I can easily print the state of just about everything in my program.

Why do I still need ppx_inline_test? There are some things that I don’t have printers for. For example, some functions return a slew of polymorphic variants signifying different results, and I don’t want to restrict those variants to a specific type, so I can’t derive a printer for them with ppx_yojson. For those specific cases I still use the more limited approach of writing specific condition tests, and it feels far more painful.
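To illustrate the difference, here is a minimal sketch of the two styles, assuming a dune library with inline_tests enabled and ppx_expect as a preprocessor:

```ocaml
(* An expect test: run the code, print the state, and let ppx_expect
   diff the captured output against the recorded expectation. *)
let%expect_test "sort" =
  List.sort compare [ 3; 1; 2 ] |> List.iter (Printf.printf "%d ");
  [%expect {| 1 2 3 |}]

(* A condition-style inline test: only a boolean, so a failure tells
   you far less about the state of the program. *)
let%test "sort" = List.sort compare [ 3; 1; 2 ] = [ 1; 2; 3 ]
```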
The one annoying thing I encountered was that neither ppx_expect nor ppx_inline_test works in executables. I had to split my executable into a library part that contains all the logic, and an executable wrapper that simply calls main () on the library. Not a huge deal, but this could be improved.
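For reference, a sketch of what that split can look like with dune; the names myapp_lib and main are hypothetical:

```
; lib/dune: the library that holds all the logic and the tests
(library
 (name myapp_lib)
 (inline_tests)
 (preprocess (pps ppx_expect ppx_inline_test)))

; bin/dune: a thin executable wrapper
(executable
 (name main)
 (libraries myapp_lib))
```

Here bin/main.ml would contain nothing but a call such as `let () = Myapp_lib.App.main ()`, assuming the library exposes an App module with the entry point.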
Note that for executables, dune also exposes cram tests ("expectation tests written in a shell-like syntax").
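A cram test is a .t file in which lines starting with $ are commands and the indented lines that follow record the expected output. A minimal sketch (my_exe is hypothetical):

```
  $ echo hello
  hello
  $ my_exe --version
  1.0.0
```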
Also, slightly off topic since it’s not unit testing: property-based testing as exposed by qcheck is really nice. If you have some properties in mind that your functions should satisfy, then instead of picking some arbitrary inputs yourself, let qcheck check them for you with many random inputs.
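For example, a minimal qcheck sketch; the property (reversing a list twice is the identity) is arbitrary:

```ocaml
(* Generate many random int lists and check that reversing twice
   gives back the original list. *)
let rev_involutive =
  QCheck.Test.make ~count:1000 ~name:"List.rev is an involution"
    QCheck.(list int)
    (fun l -> List.rev (List.rev l) = l)

let () = QCheck_runner.run_tests_main [ rev_involutive ]
```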
Is there a prevailing opinion on testing libraries in 2024? I tried using OUnit2 and liked it for its simplicity, but I’m wondering if it’s the right tool in 2024. I looked into Alcotest, but its requirement that the user define pp and equal functions for every type/module being tested seems very unappealing to me.
Should I just stick with OUnit2, or is there something better out there that won’t require lots of boilerplate like Alcotest?
If OUnit2 is working for your needs, there’s no reason to change just for the sake of change. However, I have written up something about an alternative approach in case you are interested: Bare-bones unit testing in OCaml with dune - DEV Community
This difference is superficial, because OUnit also needs such functions if you want to test anything properly. Its assert_equal uses polymorphic equality by default (which isn’t the right thing for most non-trivial types), and when a test fails it prints just “not equal” instead of the values themselves (it simply cannot print them without a custom ~printer).
You could also use polymorphic equality and print “not equal” for everything in Alcotest, but that would hardly be useful or helpful.
This boilerplate is needed because OCaml isn’t like Java and JUnit, where every type has equals and toString methods that can be appropriately overridden and thus used blindly by the testing framework.
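To make the trade-off concrete, here is the same assertion in both frameworks (a sketch; the values are arbitrary):

```ocaml
(* OUnit2: equality defaults to polymorphic (=), and the failure
   message is just "not equal" unless you pass ~printer yourself. *)
let ounit_test _ctxt =
  OUnit2.assert_equal
    ~printer:(fun l -> String.concat "; " (List.map string_of_int l))
    [ 1; 2; 3 ]
    (List.sort compare [ 3; 2; 1 ])

(* Alcotest: the (list int) testable bundles the equality and the
   pretty-printer up front, so failures show both values. *)
let alcotest_test () =
  Alcotest.(check (list int))
    "sorted" [ 1; 2; 3 ]
    (List.sort compare [ 3; 2; 1 ])
```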
But doesn’t Alcotest require these functions where one can use OUnit2 without them? It seems like with OUnit you can get a lot of work done without these functions and then optionally add them later for better error reporting. That’s a big deal to me, or to any newbie who wants to get things done without having to learn a whole bunch of conventions before making any progress.
Since the matter of inline testing came up: there is also qtest, a preprocessor that lets you write inline tests embedded in OCaml comments. You can write unit tests using ounit, and property tests using qcheck.
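For instance (a sketch following the qtest documentation; fib is an arbitrary example):

```ocaml
let rec fib n = if n <= 1 then n else fib (n - 1) + fib (n - 2)

(* $T introduces simple boolean unit tests; $Q introduces a
   qcheck-backed random test. Both live in ordinary comments. *)
(*$T fib
  fib 0 = 0
  fib 5 = 5
*)
(*$Q fib
  Q.small_int (fun n -> fib n >= 0)
*)
```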
I’m surprised no one has mentioned it yet. Is there some community wisdom against it, apart from the argument made by the RWO authors that you should avoid inline tests as much as possible?
I appreciate MDX’s ability to test the final executable. But when a test fails, MDX rewrites the whole .md file and I have to search for which test failed. OUnit was more convenient on that front, because it stops at the first failed test and gives the name of the test.
But maybe I’m misusing MDX.
It’s been a long time since I messed with it (which means it’s time to go update some of my packages!), but I remember that it generates a .corrected file, and you can diff that against the source to see what’s changed, that is to say, what the errors are.
Another option is Tezt. We use it extensively for Tezos. And indeed, we’d like to use it exclusively but we still have some legacy tests in other frameworks. You can find a blog post introducing tezt here.
This is also exactly what the Dune support for MDX does: you get a diff showing what differs from your expectations, and you can also promote those changes to become your new expected values.
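Concretely, that workflow is just the standard dune commands:

```
$ dune runtest   # fails with a diff against the .corrected file
$ dune promote   # accept the corrected output as the new expectation
```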
I tried cmp on the MDX output: cmp stops at the first difference from the expected .md, which is what I was looking for. cmp just prints the line number, but that should be enough for me.