Build systems and snapshot testing support

I recently tried to figure out how to implement dune-like cram testing for a C++ project, and it looks like CMake is not mature enough to do this.

I’m curious: should I give Bazel a chance for this? Because if it can’t do this either, I will probably never seriously consider it…

What would make cram testing “dune-like”? In general Bazel can handle just about anything. Do you have a particular cram tool in mind?

It probably would not be terribly difficult to write a cram_test rule that runs cram and handles dependencies, paths etc. Is that what you mean by “dune-like”? It’s on my radar but I haven’t prioritized it.

I spent a few hours this morning looking into this. The good news is that yes, Bazel can do that. The bad news is that cram is outdated and possibly unmaintained - the GitHub repo for it seems to have disappeared. That doesn’t mean it’s broken, it just means that I’m not at the moment capable of integrating it. Bazel has excellent (and official) support for Python (see rules_python), but I’ve never used it and I’m not a Python guy. (If you think opam is tough, just take a look - “wheels”, “pip”, “pypi”, … wtf?) An experienced Bazel+Python dev could probably make it work in about two minutes. Anyway, I’ll get to it eventually; it does seem like an important use case (albeit completely unrelated to OCaml).
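
In the meantime, here is a minimal sketch of how this could be wired up with a plain sh_test, without cram at all (the target names and the runner script are hypothetical, not an existing rule):

```python
# BUILD.bazel - hypothetical layout; //src:mine is the binary under test
sh_test(
    name = "cli_snapshot_test",
    srcs = ["run_snapshots.sh"],  # small script that diffs actual vs. expected output
    data = [
        "//src:mine",             # Bazel builds this before the test and exposes it via runfiles
        "expected_output.txt",    # the recorded output to compare against
    ],
)
```

A proper cram_test rule would mostly wrap this pattern: declare the binary and the test file as data, run the session, diff, and fail on mismatch.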

HTH,

Gregg

The project description for the Python cram on PyPI links to a GitHub repo that redirects to aiiie/cram (“Functional tests for command line applications”), which has been archived but is still there.

Meta have a tool called scrut, a cram-alike that supports the cram file format; it may be a better choice in circumstances where you might have used Python cram.


By a “dune-like cram” tool I meant exactly what dune does. We write:

  $ mine.exe -args1
  expected output 1
  $ mine.exe -args2
  expected output 2

We don’t write any glue code to execute mine.exe with the right arguments. (There are CMake-based attempts with this glue; I don’t like them. See Running Golden tests with CMake – musteresel's blog.) Also, I don’t want to manually compare the generated output with the expected, so-called “golden”, one.
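
For concreteness, this is roughly how it is wired up in dune (file names are illustrative). Cram tests live in .t files; on dune 3.x they are enabled by default, while on 2.x you need an explicit opt-in in dune-project:

```
;; dune-project
(lang dune 3.0)
(cram enable)  ;; redundant on 3.x, required on 2.7+
```

With the session above saved as, say, tests/mine.t, dune runtest executes each command and diffs its output against the recorded lines.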

Sorry, I don’t see what $ mine.exe -args1 etc. has to do with dune. Can you explain exactly what it is you want, without referring to dune?

“Archived but still there” does not inspire confidence, for me at least. But indeed there are many similar tools out there, like expect.

1 Like

Basically, the build system runs mine.exe -args1 in a shell, and if the output differs from expected output 1, then the build system shows a diff. After that everything repeats for mine.exe -args2, etc.
There is also a command, test --auto-promote, which overwrites the expected output with the actual one.
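
In dune terms the loop looks like this:

```sh
$ dune runtest                 # runs the .t files; prints a diff on any mismatch
$ dune promote                 # accepts the actual output, rewriting the test file
$ dune runtest --auto-promote  # or do both in one step
```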

Other communities call this “snapshot testing”, which is more precise and easier to understand than “cram-like”, as indeed that tool has disappeared. The idea is to write tests as observed interactions from a command line, storing the expected output below each command. (Note: it would be nice to also record the exit code and possibly distinguish stdout from stderr; some formats let users do that.) To run the test means to run each command in a test environment (more on that below) and check that the actual output matches the recorded output. There is a different run mode (called “promotion”) where, instead of checking and failing in case of a mismatch, the tool edits the original test file to change the recorded output to match the observed output.
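
For example, the cram family of formats records a non-zero exit status in brackets on its own line after the output (the command and message below are made up):

```sh
$ mine.exe -bad-flag
mine.exe: unknown option '-bad-flag'
[1]
```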

The test runner (the program running the test) needs to run the tested commands “in a test environment”, typically a sandbox directory created by the test runner. The build-system description of the test must specify which executables of the project must be built before running the tests, and the test runner needs to know where to find them in the build-system directories. The configuration of the test will also typically indicate data files (test input data) that have to be imported into the test environment.

In summary, the cooperation between the build system and the test runner must ensure the following (see the dune-flavoured sketch after the list for one concrete reading):

  • the necessary binaries have been built and imported into the test environment
  • the input test data has been built and imported into the test environment
  • these form the two dependencies of the test results, so the test must be re-run if they change (and can be skipped if they have not changed since the last run)
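
Here is how those three points can be expressed in dune (the names are hypothetical, and %{bin:mine} assumes mine is a public executable):

```
;; dune file next to the .t test
(cram
 (applies_to mine)          ; only the mine.t test
 (deps
  %{bin:mine}               ; the binary is built before the test runs
  (glob_files data/*.txt))) ; input data imported into the sandbox
```

Because the binary and the data files are declared as dependencies, dune re-runs the test when they change and skips it otherwise.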

Personally I wouldn’t tie the idea of snapshot testing to observing the behaviour of running command line programs. In fact what you describe would be known to me as cram testing :-)

I would rather characterize snapshot testing as:

  1. Rendering the result of a computation to a human-readable value,
  2. Comparing the rendering to a previously stored reference rendering,
  3. Offering the ability to automatically update the stored reference with a newly computed value.
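
A minimal OCaml sketch of those three steps, assuming the reference rendering lives in a file next to the test (all names here are made up, not from any particular library; needs OCaml >= 4.14 for In_channel/Out_channel):

```ocaml
(* Check a rendered value against a stored reference; with [promote],
   rewrite the reference instead of failing. A toy version of steps 1-3. *)
let check_snapshot ~promote ~ref_file rendered =
  let read f = In_channel.with_open_text f In_channel.input_all in
  let write f s =
    Out_channel.with_open_text f (fun oc -> Out_channel.output_string oc s)
  in
  match read ref_file with
  | expected when String.equal expected rendered -> Ok ()
  | expected ->
      if promote then (write ref_file rendered; Ok ())
      else
        Error (Printf.sprintf "mismatch:\n--- expected\n%s\n+++ actual\n%s"
                 expected rendered)
  | exception Sys_error _ ->
      (* No reference yet: record the first rendering. *)
      write ref_file rendered; Ok ()
```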

Since I wanted to snapshot test cmdliner’s outputs, I recently got into that, and I ended up snapshotting arbitrary OCaml values (as long as you can print them as an OCaml literal value), with the reference values stored in the source itself. In fact, in the end it totally blends (compare the Test and Snap invocations) with simple assertion checking: it’s like an assertion test that you can generate (given an initial dummy OCaml value of the right type) and update automatically.

The build system requirement in such a system is only to be able to build the executable.


Good point, I described a subset of snapshot tests that could be called “command-line snapshot tests”.

Someone has to handle promotion as well. When build systems take responsibility for running the executables they built (typically outside the source directory), this may require some specialized machinery.

Aside: I like command-line snapshot testing for some projects because it nudges me into developing a pleasant command-line user interface (nice printers, etc.). But I would also like to be able to use snapshot testing of internal values without having to write pretty-printers for them, as supported by the expect-tests implementation in the compiler or mdx. (Although in my experience the type-aware printers of the toplevel quickly become unpleasant to use in practice for semi-large values.) We wouldn’t necessarily need specialized machinery if one of the command-line testing tools also directly supported read-eval-print-loop programs.

(This may be an opportunity to provide a useful tool, implemented in OCaml, that would also be appreciated outside the OCaml community. My other idea in this area is a better version of hyperfine.)

In my case the test executable itself does that. So fundamentally nothing more is needed from the build system.

Now I do have some support in my build system to operate on test executables in bulk, but it’s generally agnostic to the testing framework.

Maybe we should be looking at Scrut (thanks @henrytill).

I note that cram is GPL-licensed. Does dune use it? Heheh. Scrut is MIT-licensed.

(And as an aside, Bazel supports a bunch of licensing stuff so you can make sure you haven’t stepped on the wrong toes. I’ve never needed it, but there it is.)

This has nothing to do with the build system, neither CMake nor Bazel. What needs to be done is to get your test runner to compare the output of a program to a reference file (or better, two files: one for stdout and one for stderr) and display the resulting diff on error. The other possibility is to store the results in the source itself, which Python and Haskell call “doctest”, Go “examples”, …
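
Stripped of any framework, the core of the first option is just this (a sketch; mine.exe and the file names are placeholders):

```sh
# What the runner does for each test case, in essence:
mine.exe -args1 > actual.out 2> actual.err
diff -u expected.out actual.out  # non-zero exit (a mismatch) fails the test
diff -u expected.err actual.err
```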

You need a snapshot testing library which is compatible with your testing framework, so you should first tell us which of the C++ ones you are using with CMake (or Bazel).

One that works with many of them is ApprovalTests.cpp (GitHub: approvals/ApprovalTests.cpp - Native ApprovalTests for C++ on Linux, Mac and Windows); see the tutorial at doc/Tutorial.md in that repo for an example.

Maybe I’m off-base, but I’ll just throw this out there:

If what you’re looking for is something like this (sorry, the file uses “```” b/c it understands Markdown format), then isn’t mdx what you want? It supports expect-like testing for both OCaml toplevel and sh code. I use it for both all the time.

```sh
$ ../../src/LAUNCH echo foo
Failure("LAUNCH: environment variable TOP *must* be set to use this wrapper")
[1]
$ env TOP=../.. ../../src/LAUNCH echo foo
foo
$ env TOP=../.. ../../src/LAUNCH -- ocamlfind camlp5-buildscripts/LAUNCH -- echo bar
bar
```

Yet other communities call it diff testing or golden testing. +1 to @dbuenzli’s comment that this is very useful and common for testing libraries, too. If your library ever returns error messages, I’d highly recommend diff/golden testing to ensure your error messages are good.

I forgot to mention XMake, which is a build system for C++ (and other languages) that supports comparing the output of tests with its own test runner: XMake: Matching output results.

I have never tried this snapshot testing stuff, but XMake itself is IMHO a better alternative to CMake (which isn’t that hard, to be fair).

No. Dune implements its own version of it. Cram in my mind is more a specification for a type of testing, a bit like unit-testing is the concept and JUnit/XUnit are some implementations of it.

Dune also supports using MDX, which among other formats (e.g. OCaml toplevel) supports cram-style tests using a shell. The syntax differs slightly; I believe this is mostly due to the cram format being not particularly well specified (something it shares with JSON as originally published).