Dune Cram tests with several file outputs

amaro · August 9, 2024, 4:53pm

I want to test a script that produces 10 different files, and I wonder if Cram tests would be feasible, or if I should use something else.

My Cram test is basically $ run_script.exe <options>. It produces 10 different textual outputs, in predefined files. In theory I could add lines to my Cram test such as $ cat output1.txt, $ cat output2.json, etc, but I think it’s not a proper usage of the tool. Also, I’d like to avoid mixing all of the file contents in a single .t file.

I suppose I could copy the outputs produced in the sandboxed directory to the test directory and add them as oracles. So, for instance, manually do touch oracles/output1.txt, touch oracles/output2.json, etc, before running the test, and then add commands such as $ diff oracles/output1.txt output1.txt, etc, to the .t file. Is this proper usage of Cram tests? And is there a better way to do it?

sim642 · August 10, 2024, 8:33am

You don’t have to do the diffing against oracles (expected outputs) inside a Cram test; you can also do it directly in Dune, as described in “Writing and Running Tests” in the Dune documentation.
To run the script, you’d want a rule with the 10 different files as targets; see the “rule” page of the Dune documentation.

This way maybe has a bit more boilerplate than cram tests though.

mooreryan · August 14, 2024, 1:50pm

Sometimes if the output files are small I do as you say and simply write the output directly in the cram (.t) file (example). In simple cases, I have found it to be fine.

Another way you could use oracles is to use the so-called directory tests. You put anything you need for the test in the directory and then use it in the run.t file as you need to (e.g., you could use diff to compare outputs, like this).

I like the diffing method in dune rules as mentioned above as well, but generally I use the cram style as mentioned. (Probably because you only need to add (cram enable) to the dune-project file and then it’s ready to go.)

amaro · August 14, 2024, 6:27pm

Thanks for your insight and advice.

After reading your comments, it seems my ideal setup would be something intermediate between Cram tests and Dune rules. Indeed, I like the (cram enable) approach to avoid having to write any Dune rules at all. I guess it also makes it easier to manually re-run tests when things go wrong: I can just copy-paste the command directly from the run.t file.

However, my oracles are too big to be contained in the run.t file itself, so I’d like them to be in separate files. But without using custom Dune rules, my original approach wouldn’t work very well: even if I use a directory test, and add explicit calls to diff in run.t, it is not promote-compatible: any changes would result in a diff in the run.t file, so dune promote would modify that file instead of updating the output oracle file itself.

So, I guess I’ll have to keep using Dune rules for the tests. At least your comments confirm that I didn’t miss any obvious solutions.

amaro · August 23, 2024, 2:37pm

Well, I started writing my dune files for these test cases, and… it feels a bit like a sadder version of Makefile, in fact.

For each command, I have to write a rule with (action (with-stdout-to test1.log (run ./script.sh test1.txt))), plus (deps ./script.sh tool.exe test1.txt), then (targets test1.log test1.out1 test1.out2 test1.out3 ...). Then another rule for the test with a different script, and then the diff rules: (rule (alias runtest) (action (diff test1.log.expected test1.log))), (rule (alias runtest) (action (diff test1.out1.expected test1.out1))), etc…

Then, when adding a second test test2.log, I have to copy everything, renaming test1.* to test2.* everywhere. So I get hundreds of lines of code, with lots of copy-paste.

I’d honestly prefer to use a Makefile at this point, since at least I’d have access to lots of macros and special variables and incantations. They would make it harder to read for neophytes, but minimizing the amount of repetition would actually improve readability and the ease of adding a new test case.

It seems I’m doing something wrong, and that I’m missing a cleverer way to do this testing…
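
One way to cut down on the copy-paste described above is to generate the stanzas from a small script instead of writing them by hand. The sketch below is only an illustration, not something proposed in this thread: the function name dune_rules and its parameters are hypothetical, and it assumes each test follows the naming pattern from the post (testN.txt as input, testN.log for stdout, one .expected oracle per output).

```python
def dune_rules(test: str, script: str, extra_deps: list[str], outputs: list[str]) -> str:
    """Return dune stanzas that run one test and diff every output against its oracle.

    Hypothetical helper: the file-naming scheme (<test>.txt, <test>.log,
    <output>.expected) is an assumption taken from the example in the post.
    """
    log = f"{test}.log"
    produced = [log, *outputs]
    deps = [script, *extra_deps, f"{test}.txt"]
    # One rule that runs the script and declares everything it writes.
    stanzas = [
        "(rule\n"
        f" (targets {' '.join(produced)})\n"
        f" (deps {' '.join(deps)})\n"
        f" (action (with-stdout-to {log} (run {script} {test}.txt))))"
    ]
    # One diff rule per produced file, attached to the runtest alias.
    for out in produced:
        stanzas.append(
            "(rule\n"
            " (alias runtest)\n"
            f" (action (diff {out}.expected {out})))"
        )
    return "\n\n".join(stanzas) + "\n"


if __name__ == "__main__":
    for name in ["test1", "test2"]:
        outs = [f"{name}.out{i}" for i in (1, 2, 3)]
        print(dune_rules(name, "./script.sh", ["tool.exe"], outs))
```

The printed text could be saved to a file and pulled into the dune file with an (include ...) stanza; that wiring is not shown here. Adding a test case then means adding one name to the list rather than duplicating a block of rules.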

By the way, I know I can save some deps lines by using %{dep:./script.sh} in the action field, but I want to avoid them for a specific reason: whenever something goes wrong, and I have to re-run the test manually, one of the simplest ways to do so is to open the dune file, copy-paste the command from it, and run it in the terminal. Adding the %{dep:} syntax in the middle of the command line makes it impossible to do so. And because Dune outputs the diff command that failed, but not the command actually running the script, I cannot simply look at the terminal, see the original command, and copy-paste it to manually test it. Maybe there’s also a better way to do this…

amaro · September 9, 2024, 7:24am

Well, in the end I wrote a Python (but could be OCaml) script to help me keep using Cram tests with external outputs. If there is a more intelligent way to do this, please tell me.

The idea is:

1. I add diff commands to run.t; after dune runtest && dune promote, the diffs will be added directly to run.t itself as large blocks of >-prefixed lines;
2. I then run this re-promote script to extract these diffs to the oracle files themselves;
3. Finally, the re-promote script re-runs dune runtest && dune promote to get run.t to become pristine once again.

This requires running the tests at least twice (currently they are run three times, because re-promote does some extra checking), but saves writing any dune rules at all. Also, diff -N produces an output that is usable even if the oracle files do not exist yet, so I don’t have to manually touch them. In case someone might be interested…

#!/usr/bin/env python3

from pathlib import Path
import re
import subprocess
import sys
import tempfile

if len(sys.argv) < 2:
    sys.exit(f"usage: {sys.argv[0]} testdir.t")

testdir = Path(sys.argv[1])
if not testdir.exists():
    sys.exit(f"error: test directory not found: {testdir}")
if not testdir.is_dir():
    sys.exit(f"error: not a test directory: {testdir}")
testfile = testdir / "run.t"
if not testfile.exists():
    sys.exit(f"error: test file not found: {testfile}")
build_target = testdir.parent / testdir.stem  # remove '.t' from directory name

# sanity check: run 'dune build @testdir' to see if oracles had been promoted,
# warn otherwise
proc = subprocess.run(["dune", "build", f"@{build_target}"], check=False, stderr=subprocess.DEVNULL)
if proc.returncode != 0:
    sys.exit(
        f"error: 'dune build @{build_target}' returned non-zero ({proc.returncode}). "
        + "Make sure to run 'dune promote' before running this script."
    )


def is_end_of_previous_oracle(line):
    if not line.startswith("  "):  # comment: previous oracle has finished
        return True
    if line.startswith("  $"):  # new command: previous oracle has finished
        return True
    return False


def is_start_of_new_diff(line):
    return line.startswith("  $ diff")


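# we assume no path/filename contains spaces; options are optional, so the
# pattern also matches a bare 'diff a b' (group 2 is still the oracle path)
re_diff_begin = re.compile(r"  \$ diff +(-[a-zA-Z0-9]+ +)*([^ ]+) +([^ \n]+)")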
patched_files = 0
with open(testfile, "r", encoding="utf-8") as f:
    diff_lines: list[str] = []
    dest = None
    collecting = False
    for line in f.readlines():
        if is_end_of_previous_oracle(line):
            assert not diff_lines, "non-empty diff lines must imply non-zero diff exit code"
            collecting = False
        if is_start_of_new_diff(line):
            collecting = True
            m = re_diff_begin.match(line)
            assert m, f"diff command not matching expected regex: {line.rstrip()}"
            dest = m.group(2)
        elif collecting:
            if line.rstrip() == "  [1]":
                # end of test oracle, diff returned non-zero
                collecting = False
                assert dest, "dest must have been set"
                # write the collected diff to a temporary file, then apply it
                # to the oracle (avoids shadowing the outer 'f' and 'line')
                with tempfile.NamedTemporaryFile(
                    mode="w", encoding="utf-8", prefix="re-promote_", suffix=".diff"
                ) as tmp:
                    tmp.writelines(diff_lines)
                    tmp.flush()
                    subprocess.check_output(["patch", dest, tmp.name], cwd=testdir)
                print(f"applied patches to: {testdir}/{dest}")
                patched_files += 1
                diff_lines = []
                continue
            diff_lines.append(line[2:])  # remove spaces added to Cram test oracle

assert (
    not diff_lines or not collecting
), f"file should have finished with either successful (empty) diff or a non-empty diff exit code. diff_lines: {diff_lines}, collecting: {collecting}"

print(f"re-promoted {patched_files} oracle(s).")
print(f"re-running 'dune build @{build_target} && dune promote'")
subprocess.run(
    ["dune", "build", f"@{build_target}"], check=False, stderr=subprocess.DEVNULL
)  # will fail if an update was expected
subprocess.check_output(["dune", "promote"])
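
To make the parsing step easier to check in isolation, here is a side-effect-free sketch of the same idea: it only maps each oracle path to the diff text recorded for it in a promoted run.t, without calling patch or dune. The name extract_diffs is hypothetical and not part of the script above; it keeps the same assumptions (Cram output indented by two spaces, a failing diff ending with an "  [1]" line, no spaces in paths).

```python
import re

# Same assumptions as the script above: two-space indentation for Cram
# output, "  [1]" after a failing diff, and no path containing spaces.
DIFF_CMD = re.compile(r"  \$ diff +(?:-[a-zA-Z0-9]+ +)*([^ ]+) +([^ \n]+)")


def extract_diffs(run_t: str) -> dict[str, str]:
    """Map each oracle path to the diff text recorded for it in run.t."""
    diffs: dict[str, str] = {}
    dest = None  # oracle path of the diff command currently being read
    pending: list[str] = []  # output lines collected for that command
    for line in run_t.splitlines(keepends=True):
        m = DIFF_CMD.match(line)
        if m:
            # new diff command: start collecting its output
            dest, pending = m.group(1), []
        elif dest is not None and line.rstrip() == "  [1]":
            # diff exited with 1: the collected lines are a patch for dest
            diffs[dest] = "".join(pending)
            dest, pending = None, []
        elif dest is not None and line.startswith("  ") and not line.startswith("  $"):
            pending.append(line[2:])  # strip the Cram indentation
        else:
            # comment line or another command: nothing to record
            dest, pending = None, []
    return diffs


if __name__ == "__main__":
    sample = (
        "Run the tool and compare one output:\n"
        "  $ ./script.sh test1.txt\n"
        "  $ diff -N oracles/out1.txt out1.txt\n"
        "  0a1,2\n"
        "  > hello\n"
        "  > world\n"
        "  [1]\n"
    )
    for path, diff in extract_diffs(sample).items():
        print(path)
        print(diff, end="")
```

Keeping the extraction separate from the calls to patch and dune makes it possible to test the parsing on a small run.t string before touching any oracle file.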
