Cram - Tests on Short Notice

Does it seem like writing unit tests is endlessly boring and time-consuming? Does your test suite require constant attention and tweaking, and yet the stream of bugs never seems to end? I feel the same way. Moreover, if I had known how much time I would spend writing unit tests, I would most likely have picked a different profession. There has to be a better way.

In this post, I’d like to share one better way to test binaries: one that lets you add new test cases in seconds, avoids manual assertions, and makes it easy for non-technical users to write self-documenting tests. The catch? You must upgrade to dune 2.7 and enable the new cram extension:

$ cat dune-project
(lang dune 2.7)
(cram enable)

Now dune will treat every file and directory that ends with .t as a cram test.
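Besides single .t files, a directory whose name ends in .t is run as one test: dune executes the run.t script inside it, and the other files in that directory are available to its commands. The following is a minimal sketch of setting one up from the shell; the names wc_dir.t and sample.txt are illustrative only, and it assumes a project with (cram enable) as above.

```shell
# Sketch: lay out a directory-style cram test.
# wc_dir.t and sample.txt are illustrative names, not required by dune.
mkdir -p wc_dir.t

# A fixture file that the test script can read directly.
printf 'a\nb\nc\n' > wc_dir.t/sample.txt

# The script dune runs for a directory test is named run.t.
# Quoting 'EOF' keeps this shell from expanding the $ in the cram line.
cat > wc_dir.t/run.t <<'EOF'
Count the lines of the fixture that lives next to this script:
  $ wc -l sample.txt
EOF
```

Running dune runtest then treats wc_dir.t as a single test, and dune promote writes the recorded output back into wc_dir.t/run.t.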

Let’s create a trivial test, wc.t, for the word-counting utility wc:

Test the behavior of wc. Any line that doesn't start with 2 spaces
is a comment (like this one).

Next, we’ll create a sample file to feed to wc:

Note the two spaces before the command:
  $ cat >sample.txt <<EOF
  > a
  > b
  > c
  > EOF

The command above creates a file with 3 lines. Note the leading 2 spaces and the $ that marks a command; each following line that starts with > continues that command. Here the continuation lines form a heredoc that feeds three lines to cat.
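To make the mapping concrete, here is roughly what that cram command amounts to as an ordinary shell session (run it in an empty scratch directory). It is only an illustration of the $ and > lines, not how dune executes tests internally. Counting from stdin avoids depending on how a particular wc formats the file name.

```shell
# The "  $ " line starts the command; each "  > " line is a continuation,
# so together they form this heredoc.
cat >sample.txt <<EOF
a
b
c
EOF

# The file now holds three lines.
wc -l < sample.txt
```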

Finally, we’ll write a test that checks that wc works.

Count the lines:
  $ wc -l sample.txt

Note how we didn’t mention the expected output anywhere. This is where the secret sauce comes in. We just run the test with dune:

$ dune runtest
 |  $ wc -l sample.txt
+|         3 sample.txt

Dune is helpful enough to fill in the output for us; we just promote the correction:

$ dune promote
$ dune runtest # now the tests pass

If wc is later changed to give a different result, the test fails because the command’s output no longer matches the recorded one. This style of testing is called expectation (or snapshot) testing; cram tests dress it up in a shell-like syntax.
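The expectation-testing loop itself is simple enough to sketch in a few lines of POSIX shell: run a command, diff its output against a recorded expectation, and "promote" by overwriting the expectation. This is only an illustration of the idea, not how dune implements cram; expect_check and expect_accept are hypothetical helper names, and unlike cram this sketch ignores exit codes.

```shell
# Minimal sketch of expectation (snapshot) testing.
# NOT dune's implementation; expect_check/expect_accept are hypothetical.

# expect_check NAME CMD [ARGS...]
# Runs CMD, saves its combined output to NAME.actual and compares it with
# NAME.expected. Returns 0 on a match; otherwise prints a unified diff
# (similar in spirit to `dune runtest`) and returns 1.
expect_check() {
  name=$1
  shift
  # The exit status is deliberately ignored in this sketch.
  "$@" > "$name.actual" 2>&1 || :
  # A missing expectation counts as empty, so the first run shows the
  # whole output as an addition.
  [ -f "$name.expected" ] || : > "$name.expected"
  if diff -u "$name.expected" "$name.actual"; then
    rm -f "$name.actual"
    return 0
  fi
  return 1
}

# expect_accept NAME
# Accepts the last observed output as the new expectation, analogous in
# spirit to `dune promote`.
expect_accept() {
  mv "$1.actual" "$1.expected"
}
```

Calling expect_check, reading the diff, and then calling expect_accept mirrors the dune runtest / dune promote cycle described above: the expected output is never typed by hand.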

Dune 2.7 offers full support for this style, and we recommend it to all users. Do cram tests scale? In the dune project, they are our main testing mechanism: we have over 200 cram tests in our test suite, and we use them to test and document both new features and regressions. So far we’ve been very satisfied with this approach, and we’re happy to share it with our users.

As usual, there’s far too much to describe in a single blog post. The rest is thoroughly documented in our manual.

I look forward to answering any questions you might have about cram.


What would be the canonical way to express, as a cram test, a run of a binary that is expected to fail (i.e. with a non-zero exit code) while also comparing its output (e.g. to test the error reporting of a compiler-like program)?

Cram has a [N] syntax for the expected exit code: when a command exits with a non-zero status, its output is followed by a line holding that status in brackets:

  $ dune build cycle.exe
  Error: Dependency cycle detected between the following libraries:
     "a" in _build/default
  -> "b" in _build/default
  -> "c" in _build/default
  -> "a" in _build/default
  -> required by library "c" in _build/default
  -> required by executable cycle in dune:17
  [1]

(Example taken from Dune.)

If the exit code isn’t the expected one, the mismatch is shown in the diff:

 |Cycle detection
 |  $ dune build cycle.exe
 |  Error: Dependency cycle detected between the following libraries:
 |     "a" in _build/default
 |  -> "b" in _build/default
 |  -> "c" in _build/default
 |  -> "a" in _build/default
 |  -> required by library "c" in _build/default
 |  -> required by executable cycle in dune:17
-|  [2]
+|  [1]
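To make the convention concrete, here is a rough shell approximation of the shape cram records for a single command: the combined stdout and stderr, then an [N] line only when the exit status N is non-zero. run_like_cram is a hypothetical helper, not part of dune or cram, and it only mimics the output format.

```shell
# run_like_cram CMD [ARGS...]
# Hypothetical helper: prints CMD's stdout and stderr, then, only if CMD
# failed, a final "[N]" line with its exit status, as a cram test records it.
run_like_cram() {
  if "$@" 2>&1; then
    status=0
  else
    # In the else branch, $? is the exit status of the tested command.
    status=$?
  fi
  if [ "$status" -ne 0 ]; then
    echo "[$status]"
  fi
  # Always succeed: the status is reported as text, not as a failure.
  return 0
}
```

For example, run_like_cram sh -c 'echo oops >&2; exit 2' prints oops followed by [2], while a successful command prints only its output.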

Great! Thanks a lot.
It might be useful to link to the cram test format specification somewhere in the dune manual, to help people discover such tricks.

I had been using the cram CLI (the Python one) and just realized that dune can run cram tests with dune runtest. However, it seems that the (glob) and (re) keywords from the regular cram CLI don’t work with dune’s cram tests.

Just to check, I searched through the cram tests in the dune git repo (here: dune/test/blackbox-tests/test-cases) and didn’t see any use of (glob) or (re). Are these not available when running cram tests with dune?

We do not support them. @jeremiedimino does not like them, so perhaps he can explain why.

That’s interesting. I’m now wondering whether the way I was using cram is ideal. I assume that if support was deliberately left out, there is a good reason not to use these features.

If you don’t mind me asking: say you had a CLI app that writes files whose names contain timestamps or something similar. I was using cram to check that the files actually get created, with something like:

  $ make_blah thing && ls new_file_thing_*
  new_file_thing_*.txt (glob)

I’m using the glob there since the timestamp in the file name changes every time the test runs. I’m just wondering how you would approach testing that sort of thing without the glob or re features.

The way we handle it in dune is to redact the non-deterministic parts:

  $ make_blah thing && ls new_file_thing_* | sed 's/_[0-9][0-9]*/<redacted>/'
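The same post-processing can be tried outside a cram test. Below is a small sketch; redact_timestamps is a hypothetical helper name, and it assumes the non-deterministic part is a run of digits preceded by an underscore. Note that sed uses POSIX basic regular expressions by default, where a bare + is a literal character, so "one or more digits" is written [0-9][0-9]*. The g flag redacts every such run on a line, not just the first.

```shell
# redact_timestamps: hypothetical helper that replaces each "_<digits>"
# run with a stable placeholder so that output becomes deterministic.
# [0-9][0-9]* means "one or more digits" in a POSIX basic regex.
redact_timestamps() {
  sed 's/_[0-9][0-9]*/<redacted>/g'
}
```

For instance, echo new_file_thing_1598000000.txt | redact_timestamps prints new_file_thing<redacted>.txt. Because the redaction is part of the command rather than of the expected output, accepting a correction keeps it intact.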

The story is as follows: several years ago, we discovered Python cram tests at Jane Street. We tried them, liked them, decided to port the concept to our favourite language, and wrote ppx_expect.

We then started using expectation tests everywhere in our code base. Since ppx_expect was inspired by Python cram tests, it also supported globs and regexps, and we used them in many places for the reason @mooreryan mentioned: to cope with tests whose output was not reproducible, and sometimes just to hide the parts of the output that were not interesting.

Once we had written a lot of expectation tests and started updating existing ones, we realised that globs and regexps break the usual workflow of expectation tests. Indeed, the usual workflow is:

  • edit → save → run → accept correction

However, when you use globs and regexps, the workflow becomes:

  • edit → save → run → accept correction → edit expectation

Not only does the last step have to be done manually, it also has to be repeated. Indeed, as soon as your output no longer matches, accepting the correction replaces the expectation with the raw output, so the carefully crafted globs and regexps are lost and you have to write them all over again. This workflow is not pleasant, especially when you have a lot of tests to work with. So instead, we settled on the method @rgrinberg described: post-processing the output. Unlike globs and regexps, the post-processing is part of the test’s commands, so it is never lost when you accept a correction, and you get back the nice edit → save → run → accept correction workflow. We have happily been using this method for several years and have never regretted moving away from globs and regexps.

Dune cram tests were developed after all this, so I pushed against including support for globs and regexps.


Thanks for the great explanation! That makes a lot of sense, especially considering that globs and regexes break the edit, save, run, accept loop.