Deriving.eq and Float.nan

I’m writing some code that parses floats, and one of the things it can parse is nan. To unit-test this code, I need to write a value that contains Float.nan and compare it to the result of the parser. Since Float.nan is not equal to itself, the built-in equality won’t work. I have the equivalent of ppx_deriving.eq [which today just uses the built-in equality for floats, and hence also returns false], and I can modify it so that it checks whether the first argument satisfies Float.is_nan and, if so, checks whether the second argument also satisfies Float.is_nan. Here’s my question:

  1. Should I make this modification to the semantics of equality in pa_ppx.deriving_plugins.eq? It seems like the purpose of these deriving plugins is debugging, so that might be a good thing.

  2. Maybe this special case should be driven by an option? But lots of options are confusing, etc., etc.

So I’m looking for “advice about what the proper requirements for deriving.eq should be” really.

P.S. The code I’m writing is a YAML parser, and YAML can parse nan, hence the need to write unit-tests to test this capability.

Seems to work:

utop # -.nan = nan;;
- : bool = false
utop # nan = nan;;
- : bool = false

Yes, the behaviour you cite is what I mean by “doesn’t work”. That is to say, if you’re debugging and want an equality function that works on ASTs containing floating-point values, you would find the cited behaviour to be … not the right behaviour, right?

Like I said, this is a “requirements” question: what should deriving.eq do for equality on floats? Should it hew to the built-in equality? Should there be an option to escape it?

Maybe even a more-general question: should the various built-in types and their derived code in the various type-derivers, be overridable via options in a systematic way? So that I could do the same with “int” – suppose I wanted to generate a type-deriver for a type with int in it, but I wanted the equality on ints to be ‘modulo 2’ or something like that. Maybe it might be nice to have that sort of customizability. [that’s probably going too far … but maybe this will tickle someone’s fancy and they’ll come up with a good reason for doing it.]
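To make the “systematic override” idea concrete, here is a hand-written sketch (not something any existing deriver generates) of an equality over a small, hypothetical AST where the equality used for each built-in leaf type is a parameter. The defaults mirror the built-in behaviour; the type `expr` and the function `equal_expr` are illustrative names only.

```ocaml
(* Hypothetical AST with int and float leaves. *)
type expr =
  | Int of int
  | Float of float
  | Add of expr * expr

(* Equality parameterized by the equality used for each built-in leaf type.
   The defaults reproduce the built-in behaviour, so nan <> nan. *)
let rec equal_expr ?(eq_int = Int.equal) ?(eq_float = ( = )) a b =
  let eq = equal_expr ~eq_int ~eq_float in
  match a, b with
  | Int x, Int y -> eq_int x y
  | Float x, Float y -> eq_float x y
  | Add (a1, a2), Add (b1, b2) -> eq a1 b1 && eq a2 b2
  | _ -> false
```

Passing `~eq_float:Float.equal` makes nan equal to itself, and something like `~eq_int:(fun a b -> (a - b) mod 2 = 0)` gives the “ints modulo 2” quotient from the question, without touching the traversal code.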

For values containing nan, a possible workaround for the equality test is to marshal them to strings and compare the contents.
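A minimal sketch of that workaround, using the standard `Marshal` module. The `No_sharing` flag is passed so that two structurally identical values encode identically even if one of them happens to share sub-values physically; `marshal_equal` is an illustrative name.

```ocaml
(* Compare two values by their marshalled bytes instead of with (=).
   No_sharing makes the encoding depend only on structure and bits,
   not on physical sharing inside the values. *)
let marshal_equal a b =
  String.equal
    (Marshal.to_string a [ Marshal.No_sharing ])
    (Marshal.to_string b [ Marshal.No_sharing ])

let () =
  let v1 = [ ("x", 1.5); ("not a number", Float.nan) ] in
  let v2 = [ ("x", 1.5); ("not a number", Float.nan) ] in
  (* Built-in structural equality fails because nan <> nan. *)
  assert (not (v1 = v2));
  (* The marshalled bytes are identical, so this succeeds. *)
  assert (marshal_equal v1 v2)
```

Note that this compares bit patterns: it distinguishes 0.0 from -0.0, and a NaN with a different sign or payload (for example `-.nan`) will not match `Float.nan`. Values containing closures cannot be marshalled without the `Closures` flag.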

What I mean is: stick to the built-in equality as long as it’s a float.

But I do agree there’s great value in being able to specify a “quotient type” in a systematic way.


Sorry, I misunderstood the requirement. But yeah, I would do what LdBeth suggested for the test: string_of_float actual = "nan".

[OK, I swear, I’m not being obtuse here, not purposely uncooperative]

So, maybe some more details. My goal is to write a YAML parser. Before I do so, I’m faithfully transcribing every example (every last one) from the YAML 1.2 spec into unit tests. After that, I’ll be getting every unit test in the YAML test suite working, then the JSON examples and test suite (because YAML and JSON are close, and … the rest is details of the project).

Now, one of the examples (Example 2.20) is

canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: 1230.15
negative infinity: -.inf
not a number: .NaN

and this is parsed into a YAML dictionary in which one of the values is nan. Now, sure, I can re-serialize the YAML object and do a string compare, but I’d prefer not to rely on the correctness of a YAML pretty-printer (which, BTW, I will also be writing) in order to test the YAML parser.

Hence, my question about a custom deriving.eq.

P.S. Maybe a little more context. The plan isn’t to write a YAML parser, but rather a parser for a data language that is very close to YAML. Specifically, unlike YAML, this language will allow its specification to be divided into a lexical part and a grammatical part, so that it can be implemented with standard tools (flex/bison, ocamllex/menhir) and without the lexer and parser having unholy relations under cover of darkness. So it would be good to be able to test the parser (demarshaller) and the pretty-printer (marshaller) completely separately.

P.P.S. It wasn’t much code to extend pa_ppx.deriving_plugins.eq with a new option nan_self_equal; my goal with this conversation is to figure out whether there is something … “nicer” one might do.

Perhaps I’m misunderstanding the question, but let me try:

compare x y = 0 is the standard way in OCaml to compare floats while considering NaNs equal. From float.mli:
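A quick check of the difference between the polymorphic `(=)` and the total order in the `Float` module (`Float.equal` is defined in terms of `Float.compare`, so it agrees with it):

```ocaml
let () =
  (* Polymorphic (=) follows IEEE 754: nan is not equal to itself. *)
  assert (not (Float.nan = Float.nan));
  (* Float.compare is a total order in which nan equals itself. *)
  assert (Float.compare Float.nan Float.nan = 0);
  (* Float.equal x y is Float.compare x y = 0, so it agrees. *)
  assert (Float.equal Float.nan Float.nan);
  (* nan sorts below every other float, including neg_infinity. *)
  assert (Float.compare Float.nan Float.neg_infinity < 0)
```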

val compare: t -> t -> int
(** [compare x y] returns [0] if [x] is equal to [y], a negative integer if [x]
    is less than [y], and a positive integer if [x] is greater than
    [y]. [compare] treats [nan] as equal to itself and less than any other float
    value.  This treatment of [nan] ensures that [compare] defines a total
    ordering relation.  *)

Further, ppx_deriving lets you specify custom equality functions on a per-field basis. From the ppx_deriving README (GitHub: ocaml-ppx/ppx_deriving, “Type-driven code generation for OCaml”).

Putting the two together should result in a nice solution. The question is perhaps if ppx_deriving’s default equality should be changed, or its documentation should mention this edge-case.
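Putting the two together might look roughly like the sketch below. With ppx_deriving’s eq plugin, the override is written as an attribute on the type, along the lines of `Float of (float [@equal Float.equal])` with `[@@deriving eq]` (check the README for the exact forms). So that the example runs without a ppx, the block instead hand-writes approximately the function such a derivation would produce, for a cut-down, hypothetical YAML value type (not the author’s actual AST).

```ocaml
(* A cut-down, hypothetical YAML value type. *)
type value =
  | Null
  | Bool of bool
  | Int of int
  | Float of float
  | String of string
  | Seq of value list
  | Map of (string * value) list

(* Structural equality, except that floats are compared with Float.equal,
   which treats nan as equal to itself. *)
let rec equal_value a b =
  match a, b with
  | Null, Null -> true
  | Bool x, Bool y -> Bool.equal x y
  | Int x, Int y -> Int.equal x y
  | Float x, Float y -> Float.equal x y
  | String x, String y -> String.equal x y
  | Seq xs, Seq ys -> List.equal equal_value xs ys
  | Map xs, Map ys ->
      List.equal
        (fun (k1, v1) (k2, v2) -> String.equal k1 k2 && equal_value v1 v2)
        xs ys
  | _ -> false
```

A test for Example 2.20 can then compare the parser’s output against a hand-built `Map` containing `Float Float.nan` with `equal_value`, with no pretty-printer involved.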


To compare floats this way, I’d start with Stdlib.classify_float, then add a comparison of the sign (for FP_zero/FP_infinite) or a true comparison (for FP_normal/FP_subnormal). For FP_nan, no further checks are needed.
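A minimal sketch of that classification-based comparison. Following the description, zeros are compared by sign, so this version treats 0.0 and -0.0 as different; `float_eq` is an illustrative name.

```ocaml
(* Float equality driven by Stdlib.classify_float:
   - two NaNs are equal, with no further checks;
   - zeros and infinities are equal when their signs match;
   - normal/subnormal numbers use ordinary (=);
   - different classes are never equal. *)
let float_eq x y =
  match classify_float x, classify_float y with
  | FP_nan, FP_nan -> true
  | FP_zero, FP_zero | FP_infinite, FP_infinite ->
      Bool.equal (Float.sign_bit x) (Float.sign_bit y)
  | (FP_normal | FP_subnormal), (FP_normal | FP_subnormal) -> x = y
  | _ -> false
```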


Oh, ha! I forgot about this! Thank you! Heh, I implemented deriving.eq (as part of pa_ppx, but forgot that this was possible!)

Of course, this solves the problem!

ETA: To follow up on @copy's solution, I could also have used @nobuiltin so I could specify my own equality function for the primitive type. I just … forgot these capabilities existed. Sigh. As a bard once sang: “what a drag it is, getting old.”


There’s also Float.is_nan.
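With Float.is_nan, the semantics described at the start of the thread (the nan_self_equal idea) fit in one line. The name is illustrative:

```ocaml
(* Two NaNs are equal; everything else falls back to IEEE (=). *)
let nan_self_equal (x : float) (y : float) =
  (Float.is_nan x && Float.is_nan y) || x = y
```

On every input this agrees with `Float.equal`: both treat any two NaNs as equal, a NaN as unequal to any non-NaN, and 0.0 as equal to -0.0. So `Float.equal` can be used directly as the per-field equality.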


It’s also common to account for representation error when comparing floats (using a tolerance called epsilon). I needed something similar when running some unit tests and ended up with:

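let eps = 1e-9 (* assumed tolerance; pick one that suits the data *)

let isnan f = classify_float f = FP_nan

let same x y =
  (isnan x && isnan y)
  (* exact match; also covers equal infinities, where x -. y is nan *)
  || x = y
  || abs_float (x -. y) <= eps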

That is the proper way to compare floats on computers: define an epsilon distance within which floats are considered equal, as you did. That is why comparing floats directly for equality is considered bad practice.
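A small illustration of why a tolerance helps with computed values, plus a relative variant. The default tolerances below are arbitrary choices for the example, and the function names are illustrative.

```ocaml
(* Absolute tolerance: |x - y| <= eps. *)
let approx_equal ?(eps = 1e-9) x y = Float.abs (x -. y) <= eps

(* Relative tolerance: scale by the larger magnitude, so the check
   behaves similarly for values near 1e-12 and near 1e12. *)
let approx_equal_rel ?(rel = 1e-9) x y =
  Float.abs (x -. y) <= rel *. Float.max (Float.abs x) (Float.abs y)

let () =
  (* 0.1 +. 0.2 is 0.30000000000000004, so exact equality fails. *)
  assert (not (0.1 +. 0.2 = 0.3));
  assert (approx_equal (0.1 +. 0.2) 0.3);
  assert (approx_equal_rel (0.1 +. 0.2) 0.3)
```

On their own, both checks reject nan and equal infinities (since `infinity -. infinity` is nan), which is why the `same` function above tests NaN and exact equality first.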