Word diff in longer test_eq strings

olleharstedt · August 26, 2022, 6:49pm

Is there a way to make it easier to see the exact difference between two longer strings, when using inline tests?

[%test_eq: string] long_string second_long_string

Since the test run just spits out both strings, it’s hard to inspect what’s exactly different.

Maybe some way to run Unix diff on the two strings? Preferably without writing them to files first…
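
One way to get a Unix diff of two in-memory strings is sketched below. It is only a sketch: it does go through temporary files (created and removed automatically), it assumes a `diff` executable is on the PATH, and the names `write_temp` and `diff_strings` are made up for illustration.

```ocaml
(* Write [contents] to a fresh temporary file and return its path. *)
let write_temp prefix contents =
  let path = Filename.temp_file prefix ".txt" in
  let oc = open_out_bin path in
  Fun.protect
    ~finally:(fun () -> close_out oc)
    (fun () -> output_string oc contents);
  path

(* Run `diff -u` on two strings. The diff is printed by the external
   command; the result is its exit code (0 = identical, 1 = different). *)
let diff_strings expected actual =
  let a = write_temp "expected" expected in
  let b = write_temp "actual" actual in
  flush stdout;
  Fun.protect
    ~finally:(fun () -> Sys.remove a; Sys.remove b)
    (fun () ->
      Sys.command
        (Printf.sprintf "diff -u %s %s" (Filename.quote a) (Filename.quote b)))
```

In a test this could be used as `if diff_strings long_string second_long_string <> 0 then failwith "strings differ"`.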

bcc32 · August 29, 2022, 2:42am

The patdiff package includes a library called Expect_test_patdiff which we use at Jane Street to print diffs in tests. This is usually used in conjunction with let%expect_test-style tests, hence the name.

yawaramin · August 29, 2022, 3:03am

It should actually be pretty easy to implement a test assertion function that prints the result to a snapshot file and outputs the git diff if the result differs from the existing snapshot. It’s hard to beat the readability of a git diff especially if you take some care to preserve colours when working in the terminal and discard them when working in CI (and assuming the output is rendered with appropriate line breaks).

I’ve actually implemented this in a (Scala) project at work and it works really nicely. Of course the caveat is that you have to require git to be available on the machine. But that shouldn’t be too difficult nowadays.

EDIT: I just ported over my implementation from Scala to OCaml pretty much unchanged except generalizing it to use OCaml value formatting capabilities:

exception Snapshot_error of string

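(* Read the whole file into a string. Binary mode keeps in_channel_length
   in step with the bytes actually read, on every platform. *)
let gulp path =
  let inc = open_in_bin path in
  let finally () = close_in inc in
  Fun.protect ~finally begin fun () ->
    let len = in_channel_length inc in
    really_input_string inc len
  end

(* Binary mode here too, so that what gulp reads back is byte-for-byte
   what was written. *)
let write_snapshot path contents =
  let outc = open_out_bin path in
  let finally () = close_out outc in
  Fun.protect ~finally (fun () -> output_string outc contents)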

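(* Overwrite the snapshot so git can show the diff, then restore the
   committed version. Needs git, the unix library, and a snapshot file
   that is tracked by git. *)
let fail_snapshot path contents =
  write_snapshot path contents;
  print_endline "\nSnapshot test failed:";
  (* Quote the path so spaces and shell metacharacters are safe. *)
  let quoted = Filename.quote path in
  ignore (Unix.system ("git diff --color " ^ quoted));
  ignore (Unix.system ("git checkout -- " ^ quoted));
  print_endline "\nTo update, delete the snapshot file and rerun the test.";
  raise (Snapshot_error path)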

let assert_snapshot path pp new_val =
  let snapshot_exists = Sys.file_exists path in
  let is_ci = Sys.getenv_opt "CI" in
  let new_str = Format.asprintf "%a\n" pp new_val in
  match snapshot_exists, is_ci with
  | false, Some "true" -> raise (Snapshot_error path)
  | false, _ -> write_snapshot path new_str
  | true, _ -> if gulp path <> new_str then fail_snapshot path new_str

(* Usage: assert_snapshot "path/to/file.snapshot" Format.pp_print_int (1 + 1) *)

olleharstedt · August 29, 2022, 9:16am

Wow, thanks guys! Will look at this later.

olleharstedt · January 17, 2023, 6:33pm

Hm how do I install and include this in opam/dune and use it…?

opam install patdiff

Meh, lib does not include any simple usage example.

(libraries base stdio str re dolog expect_test_helpers_kernel oseq patdiff)

Hm this does not work

Error: Library "patdiff" not found.

But patience_diff does work. Weird.

Still, without any examples to look at… Hm, can’t find any link to relevant github repo either. Would be nice to look at test files for usage examples.

Error: Unbound module Patdiff_kernel

I wonder if this is what hell feels like… Well I guess I can just iter the strings and die at any diff to make debugging easier.
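
The "iterate over the strings and stop at the first difference" idea can be sketched with the standard library alone. The names `first_diff`, `excerpt` and `explain_diff` and the 20-character context radius are hypothetical choices, not part of any library; `Char.equal` is used so the code should also compile with Base opened.

```ocaml
(* Index of the first position where [a] and [b] differ, or [None] if
   the two strings are identical. *)
let first_diff a b =
  let la = String.length a and lb = String.length b in
  let n = min la lb in
  let rec go i =
    if i < n && Char.equal a.[i] b.[i] then go (i + 1) else i
  in
  let i = go 0 in
  if i = n && la = lb then None else Some i

(* Up to [radius] characters on either side of index [i] in [s]. *)
let excerpt s i ~radius =
  let start = max 0 (i - radius) in
  let stop = min (String.length s) (i + radius) in
  String.sub s start (stop - start)

(* Human-readable description of where two strings diverge. *)
let explain_diff a b =
  match first_diff a b with
  | None -> "strings are equal"
  | Some i ->
    Printf.sprintf "first difference at index %d:\n  left:  %S\n  right: %S"
      i (excerpt a i ~radius:20) (excerpt b i ~radius:20)
```

Printing `explain_diff long_string second_long_string` just before the `[%test_eq: string]` line shows only the neighbourhood of the mismatch instead of both full strings.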

Ehm, did string access change syntax recently-ish? Two Rosetta code examples are not compiling anymore.

bcc32 · January 17, 2023, 7:40pm

Sorry you’re having trouble: it’s pretty surprising the patdiff library wasn’t found (the dune file in that repo looks correct to me). What version of the patdiff package do you have installed, and have you double checked that you’re in the same opam switch where you installed the package?

(FYI, patience_diff is a dependency of patdiff_kernel (i.e., patdiff_kernel depends on it) so using it wouldn’t give you access to patdiff_kernel.)

I think the String.set syntax was removed recently: https://github.com/ocaml/ocaml/pull/11345
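
For code that did rely on in-place string mutation (`s.[i] <- c`), the usual replacement is to go through `Bytes`. A small sketch, where `capitalize_first` is just an illustrative name:

```ocaml
(* Strings are immutable; copy into a Bytes buffer, mutate it, and
   convert back. *)
let capitalize_first s =
  if String.length s = 0 then s
  else begin
    let b = Bytes.of_string s in
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    Bytes.to_string b
  end
```

Reading a character with `s.[i]` is unaffected; only the assignment form went away.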

olleharstedt · January 17, 2023, 7:44pm

opam list:

patdiff                     v0.13.0     File Diff using the Patience Diff algorithm
patience_diff               v0.13.0     Diff library using Bram Cohen's patience diff algorithm

bcc32 · January 17, 2023, 8:01pm

Hmm, v0.13.0 is from quite a while ago. Unfortunately the library wasn’t really exposed correctly in that version (public_name was set to patdiff.lib, so the library is just unhelpfully called Lib).

Are you able to upgrade to a newer version? (Currently at v0.15.0).

hyphenrf · January 19, 2023, 12:58pm

could you link the page? I’ll fix it

olleharstedt · January 20, 2023, 10:12am

The hero we don’t deserve. https://rosettacode.org/wiki/Levenshtein_distance#OCaml

hyphenrf · January 21, 2023, 2:38pm

There’s no use of the now-defunct .[]<- in this code, it works as expected for OCaml 5.0
What’s the error you’re seeing?

olleharstedt · January 21, 2023, 11:28pm

let levenshtein s t =
   let rec dist i j = match (i,j) with
      | (i,0) -> i
      | (0,j) -> j
      | (i,j) ->
         if s.[i-1] = t.[j-1] then dist (i-1) (j-1)
         else let d1, d2, d3 = dist (i-1) j, dist i (j-1), dist (i-1) (j-1) in
         1 + min d1 (min d2 d3)
   in
   dist (String.length s) (String.length t)

11 |          if s.[i-1] = t.[j-1] then dist (i-1) (j-1)
                 ^^^^^^^
Error: This expression has type char but an expression was expected of type
     int

OK, so not a syntax error, then. Using OCaml 4.08.1 here.

hyphenrf · January 22, 2023, 2:41am

I diagnose you with a JSL infection (janestreet standard library). Fear not, the most common remedy is simply removing an open from your nearest .ocamlinit or init.ml

bcc32 · January 22, 2023, 3:03am

There is a good blog post (linked) explaining why Jane Street code conventions strongly avoid polymorphic comparison functions. Use Char.equal instead of =, and that would fix the type error.
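
Applied to the function posted above, the change is a single token. This is a sketch rather than a tested patch for the Rosetta Code page; `Char.equal` exists in both the standard library and Base, so it should type-check with or without Base opened.

```ocaml
(* Naive recursive Levenshtein distance, comparing characters with
   Char.equal instead of the polymorphic (=). *)
let levenshtein s t =
  let rec dist i j =
    match (i, j) with
    | (i, 0) -> i
    | (0, j) -> j
    | (i, j) ->
      if Char.equal s.[i - 1] t.[j - 1] then dist (i - 1) (j - 1)
      else
        let d1, d2, d3 = dist (i - 1) j, dist i (j - 1), dist (i - 1) (j - 1) in
        1 + min d1 (min d2 d3)
  in
  dist (String.length s) (String.length t)
```

Note that this version is exponential in the length of the inputs, so it is only suitable for short strings.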

hyphenrf · January 22, 2023, 4:00am

Although I recognize the magic nature of poly compare, and recognize the need to make users of the language aware of the general complexity of doing equality right sooner than later, and although I know for sure it will come and bite me one day, when I have some custom block with no compare callback, or a function, or Set.Make(M).t, nested deeply in two almost-identical complex data structures… I think it’s a little annoying having to pay for those gotchas with my everyday code. I bet the crushing majority of structural equality tests out there are done on basic types anyway. And what of the so-called perils above apply to char equality, really?

It really is just a matter of whether you think it’s better to pay for these problems upfront or not, and I’m firmly in the camp of not paying for something until I actually need it.
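
For readers who have not hit these gotchas yet, two of them can be reproduced with the standard library alone (no Base opened). This is a small illustration, and the set result depends on how the stdlib currently lays out its balanced trees, so treat it as an observation rather than a guarantee.

```ocaml
module IntSet = Set.Make (Int)

(* The same two elements, inserted in a different order. *)
let a = IntSet.add 2 (IntSet.singleton 1)
let b = IntSet.add 1 (IntSet.singleton 2)

(* The set's own equality compares elements: true. *)
let same_elements = IntSet.equal a b

(* Polymorphic equality compares the underlying trees, whose shapes
   differ here: false. *)
let same_structure = (a = b)

(* Polymorphic equality on functions raises at run time. *)
let functions_raise =
  try
    ignore ((fun x -> x + 1) = (fun x -> x + 2));
    false
  with Invalid_argument _ -> true
```

None of this applies to `char` or `int`, which is the point being made above: on basic types polymorphic equality behaves as expected.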

hyphenrf · January 22, 2023, 4:19pm

Also check out Equality Is Hard, and the following discussions on Reddit, Lobsters, HN… I’ve found them useful.

olleharstedt · January 22, 2023, 9:08pm

I could upgrade to 0.14 and build my project, but still not sure how to use it. Looked at the examples in the test folders, but yeah. Maybe too complex for my need.

Chet_Murthy · January 22, 2023, 9:23pm

I’m assuming you’re using dune? I use Makefiles, and reverse-engineered what the ppx_inline_test PPX rewriter and test-runner stuff did, then wrote my own that uses “git diff” for differences. Seemed to work fine. You can see it here: pa_ppx/Makefile at 8d05d50f0d5aa489d4990b6bb8a898e663d59858 · camlp5/pa_ppx · GitHub
I fear it’s all Greek, b/c unless you actually run ppx_inline_test from Makefiles, all of this complexity is hidden away by Dune. But I figured I’d note it just in case.
