Build systems and snapshot testing support

I recently tried to figure out how to implement dune-like cram testing for a C++ project, and it looks like CMake is not mature enough to do this.

I’m curious: should I give Bazel a chance for this? Because if it can’t do this either, I will probably never seriously consider it…

What would make cram testing “dune-like”? In general Bazel can handle just about anything. Do you have a particular cram tool in mind?

It probably would not be terribly difficult to write a cram_test rule that runs cram and handles dependencies, paths etc. Is that what you mean by “dune-like”? It’s on my radar but I haven’t prioritized it.

I spent a few hours this morning looking into this. The good news is that yes, Bazel can do that. The bad news is that cram is outdated and possibly unmaintained - the GitHub repo for it seems to have disappeared. That doesn’t mean it’s broken, it just means that I’m not at the moment capable of integrating it. Bazel has excellent (and official) support for Python (see rules_python), but I’ve never used it and I’m not a Python guy. (If you think opam is tough, just take a look - “wheels”, “pip”, “pypi”, … wtf?) An experienced Bazel+Python dev could probably make it work in about two minutes. Anyway, I’ll get to it eventually; it does seem like an important use case (albeit completely unrelated to OCaml).
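
In the meantime, here is a minimal sketch of how this could be wired up with a plain sh_test, without cram at all (the target names and the runner script are hypothetical, not an existing rule):

```python
# BUILD.bazel - hypothetical layout; //src:mine is the binary under test
sh_test(
    name = "cli_snapshot_test",
    srcs = ["run_snapshots.sh"],  # small script that diffs actual vs. expected output
    data = [
        "//src:mine",             # Bazel builds this before the test and exposes it via runfiles
        "expected_output.txt",    # the recorded output to compare against
    ],
)
```

A proper cram_test rule would mostly wrap this pattern: declare the binary and the test file as data, run the session, diff, and fail on mismatch.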

HTH,

Gregg

The project description for the Python cram on PyPI links to a GitHub repo that redirects to aiiie/cram (“Functional tests for command line applications”), which has been archived but is still there.

Meta have a tool called scrut, a cram-alike that supports the cram file format; it may be a better choice in circumstances where you might have used Python cram.


By a “dune-like cram” tool I meant exactly what dune does. We write:

  $ mine.exe -args1
  expected output 1
  $ mine.exe -args2
  expected output 2

We don’t write any glue code to execute mine.exe with the right arguments. (There are CMake-based attempts with this glue; I don’t like them. See Running Golden tests with CMake – musteresel's blog.) Also, I don’t want to manually compare the generated output with the expected, so-called “golden”, one.
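
For concreteness, this is roughly how it is wired up in dune (file names are illustrative). Cram tests live in .t files; on dune 3.x they are enabled by default, while on 2.x you need an explicit opt-in in dune-project:

```
;; dune-project
(lang dune 3.0)
(cram enable)  ;; redundant on 3.x, required on 2.7+
```

With the session above saved as, say, tests/mine.t, dune runtest executes each command and diffs its output against the recorded lines.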

Sorry, I don’t see what $ mine.exe -args1 etc. has to do with dune. Can you explain exactly what it is you want, without referring to dune?

“Archived but still there” does not inspire confidence, for me at least. But indeed there are many similar tools out there, like expect.

1 Like

Basically, the build system runs mine.exe -args1 in a shell, and if the output differs from expected output 1, then the build system shows a diff. After that everything repeats for mine.exe -args2, etc.
There is also a command, test --auto-promote, which overwrites the expected output with the actual one.
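
In dune terms the loop looks like this:

```sh
$ dune runtest                 # runs the .t files; prints a diff on any mismatch
$ dune promote                 # accepts the actual output, rewriting the test file
$ dune runtest --auto-promote  # or do both in one step
```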

Other communities call this “snapshot testing”, which is more precise and easier to understand than “cram-like”, as indeed that tool has disappeared. The idea is to write tests as observed interactions from a command line, storing the expected output below each command. (Note: it would be nice to also record the exit code and possibly distinguish stdout from stderr; some formats let users do that.) To run the test means to run each command in a test environment (more on that below) and check that the actual output matches the recorded output. There is a different run mode (called “promotion”) where, instead of checking and failing in case of a mismatch, the tool edits the original test file to change the recorded output to match the observed output.
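
For example, the cram family of formats records a non-zero exit status in brackets on its own line after the output (the command and message below are made up):

```sh
$ mine.exe -bad-flag
mine.exe: unknown option '-bad-flag'
[1]
```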

The test runner (the program running the test) needs to run the tested commands “in a test environment”, typically a sandbox directory created by the test runner. The build-system description of the test must specify which executables of the project must be built before running the tests, and the test runner needs to know where to find them in the build-system directories. The configuration of the test will also typically indicate data files (test input data) that have to be imported into the test environment.

In summary, the cooperation between the build system and the test runner must ensure the following (see the dune-flavoured sketch after the list for one concrete reading):

  • the necessary binaries have been built and imported into the test environment
  • the input test data has been built and imported into the test environment
  • these form the two dependencies of the test results, so the test must be re-run if they change (and can be skipped if they have not changed since the last run)
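
Here is how those three points can be expressed in dune (the names are hypothetical, and %{bin:mine} assumes mine is a public executable):

```
;; dune file next to the .t test
(cram
 (applies_to mine)          ; only the mine.t test
 (deps
  %{bin:mine}               ; the binary is built before the test runs
  (glob_files data/*.txt))) ; input data imported into the sandbox
```

Because the binary and the data files are declared as dependencies, dune re-runs the test when they change and skips it otherwise.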

Personally I wouldn’t tie the idea of snapshot testing to observing the behaviour of running command line programs. In fact what you describe would be known to me as cram testing :-)

I would rather characterize snapshot testing as:

  1. Rendering the result of a computation to a human-readable value,
  2. Comparing the rendering to a previously stored reference rendering,
  3. Offering the ability to automatically update the stored reference with a newly computed value.
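
A minimal OCaml sketch of those three steps, assuming the reference rendering lives in a file next to the test (all names here are made up, not from any particular library; needs OCaml >= 4.14 for In_channel/Out_channel):

```ocaml
(* Check a rendered value against a stored reference; with [promote],
   rewrite the reference instead of failing. A toy version of steps 1-3. *)
let check_snapshot ~promote ~ref_file rendered =
  let read f = In_channel.with_open_text f In_channel.input_all in
  let write f s =
    Out_channel.with_open_text f (fun oc -> Out_channel.output_string oc s)
  in
  match read ref_file with
  | expected when String.equal expected rendered -> Ok ()
  | expected ->
      if promote then (write ref_file rendered; Ok ())
      else
        Error (Printf.sprintf "mismatch:\n--- expected\n%s\n+++ actual\n%s"
                 expected rendered)
  | exception Sys_error _ ->
      (* No reference yet: record the first rendering. *)
      write ref_file rendered; Ok ()
```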

Since I wanted to snapshot test cmdliner’s outputs, I recently got into that, and I ended up snapshotting arbitrary OCaml values (as long as you can print them as an OCaml literal value), with the reference values stored in the source itself. In fact, in the end it totally blends (compare the Test and Snap invocations) with simple assertion checking: it’s like an assertion test that you can generate (given an initial dummy OCaml value of the right type) and update automatically.

The build system requirement in such a system is only to be able to build the executable.


Good point, I described a subset of snapshot tests that could be called “command-line snapshot tests”.

Someone has to handle promotion as well. When build systems take responsibility for running the executables they built (typically outside the source directory), this may require some specialized machinery.

Aside: I like command-line snapshot testing for some projects because it nudges me into developing a pleasant command-line user interface (nice printers, etc.). But I would also like to be able to use snapshot testing of internal values without having to write pretty-printers for them, as supported by the expect-tests implementation in the compiler or mdx. (Although in my experience the type-aware printers of the toplevel quickly become unpleasant to use in practice for semi-large values.) We wouldn’t necessarily need specialized machinery if one of the command-line testing tools also directly supported read-eval-print-loop programs.

(This may be an opportunity to provide a useful tool, implemented in OCaml, that would also be appreciated outside the OCaml community. My other idea in this area is a better version of hyperfine.)

In my case the test executable itself does that. So fundamentally nothing more is needed from the build system.

Now I do have some support in my build system to operate on test executables in bulk, but it’s generally agnostic to the testing framework.

Maybe we should be looking at Scrut (thanks @henrytill).

I note that cram is GPL-licensed. Does dune use it? Heheh. Scrut is MIT-licensed.

(And as an aside, Bazel supports a bunch of licensing stuff so you can make sure you haven’t stepped on the wrong toes. I’ve never needed it, but there it is.)

This has nothing to do with the build system, neither CMake nor Bazel. What needs to be done is to get your test runner to compare the output of a program to a reference file (or better, two files: one for stdout and one for stderr) and display the resulting diff on error. The other possibility is to store the results in the source itself, which Python and Haskell call “doctest”, Go “examples”, …
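
Stripped of any framework, the core of the first option is just this (a sketch; mine.exe and the file names are placeholders):

```sh
# What the runner does for each test case, in essence:
mine.exe -args1 > actual.out 2> actual.err
diff -u expected.out actual.out  # non-zero exit (a mismatch) fails the test
diff -u expected.err actual.err
```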

You need a snapshot testing library which is compatible with your testing framework, so you should first tell us which of the C++ ones you are using with CMake (or Bazel).

One that works with many of them is ApprovalTests.cpp (GitHub: approvals/ApprovalTests.cpp - Native ApprovalTests for C++ on Linux, Mac and Windows); see the tutorial at doc/Tutorial.md in that repo for an example.

Maybe I’m off-base, but I’ll just throw this out there:

If what you’re looking for is something like this (sorry, the file uses “```” b/c it understands Markdown format), then isn’t mdx what you want? It supports expect-like testing for both OCaml toplevel and sh code. I use it for both all the time.

```sh
$ ../../src/LAUNCH echo foo
Failure("LAUNCH: environment variable TOP *must* be set to use this wrapper")
[1]
$ env TOP=../.. ../../src/LAUNCH echo foo
foo
$ env TOP=../.. ../../src/LAUNCH -- ocamlfind camlp5-buildscripts/LAUNCH -- echo bar
bar
```

Yet other communities call it diff testing or golden testing. +1 to @dbuenzli’s comment that this is very useful and common for testing libraries, too. If your library ever returns error messages, I’d highly recommend diff/golden testing to ensure your error messages are good.

I forgot to mention XMake, which is a build system for C++ (and other languages) that supports comparing the output of tests with its own test runner: XMake: Matching output results.

I have never tried this snapshot testing stuff, but XMake itself is IMHO a better alternative to CMake (which isn’t that hard, to be fair).

No. Dune implements its own version of it. Cram in my mind is more a specification for a type of testing, a bit like unit-testing is the concept and JUnit/XUnit are some implementations of it.

Dune also supports using MDX, which among other formats (e.g. OCaml toplevel) supports cram-style tests using a shell. The syntax differs slightly; I believe this is mostly due to the cram format being not particularly well specified (something it shares with JSON as originally published).