Deriving.eq and Float.nan

Chet_Murthy · March 15, 2021, 12:17am

I’m writing some code that parses floats, and one of the things it can parse is nan. In writing unit tests for this code, I need to write a value that contains Float.nan and compare it to the result of this parser. Since Float.nan is not equal to itself, the built-in equality won’t work. I have the equivalent of ppx_deriving.eq [which today just uses the built-in equality for floats, and hence also returns false] and can modify it so that it checks whether the first argument satisfies Float.is_nan and, if so, whether the second argument also satisfies Float.is_nan. Here’s my question:

Should I make this modification to the semantics of equality in pa_ppx.deriving_plugins.eq? It seems like the purpose of these deriving plugins is debugging, so that might be a good thing.
Maybe this special case should be driven by an option? But lots of options are confusing, etc., etc.

So what I’m really looking for is advice about what the proper requirements for deriving.eq should be.

P.S. The code I’m writing is a YAML parser, and YAML can parse nan, hence the need to write unit-tests to test this capability.
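A minimal sketch of the modified float equality described above, assuming it is applied only at type float: if the first argument is NaN, the result is whether the second is also NaN; otherwise the built-in equality is used. The name `equal_float_nan_aware` is hypothetical, not something pa_ppx or ppx_deriving actually generates.

```ocaml
(* Hypothetical helper: NaN compares equal to NaN; every other case
   falls back to the built-in float equality. *)
let equal_float_nan_aware (a : float) (b : float) : bool =
  if Float.is_nan a then Float.is_nan b else a = b

let () =
  assert (not (Float.nan = Float.nan));   (* built-in equality: false *)
  assert (equal_float_nan_aware Float.nan Float.nan);
  assert (not (equal_float_nan_aware Float.nan 1.0));
  assert (equal_float_nan_aware 0.0 (-0.0))   (* signed zeros still equal *)
```

Because the non-NaN case still uses the built-in equality, 0.0 and -0.0 remain equal under this sketch.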

yawaramin · March 15, 2021, 1:27am

Seems to work:

utop # -.nan = nan;;
- : bool = false
utop # nan = nan;;
- : bool = false

Chet_Murthy · March 15, 2021, 1:41am

Yes, the behaviour you cite is what I mean by “doesn’t work”. That is to say, if you’re debugging and want an equality function that works on ASTs containing floating-point values, you would find the cited behaviour to be … not the right behaviour, right?

Like I said, this is a “requirements” question: what should deriving.eq do for equality on floats? Should it hew to the built-in equality? Should there be an option to escape it?

Maybe even a more general question: should the various built-in types, and the code derived for them by the various type-derivers, be overridable via options in a systematic way? Then I could do the same with “int”: suppose I wanted to derive equality for a type with int in it, but I wanted the equality on ints to be ‘modulo 2’ or something like that. It might be nice to have that sort of customizability. [That’s probably going too far … but maybe this will tickle someone’s fancy and they’ll come up with a good reason for doing it.]

LdBeth · March 15, 2021, 2:04am

For values containing nan, a possible workaround for the equality test is to marshal them to strings and compare the contents.

What I mean is: stick to the built-in equality as long as the value is a member of float.

But I do agree there’s great value to be able to specify “quotient type” in a systematic way.
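A sketch of the marshal-and-compare workaround, using only the Stdlib `Marshal` module; the helper `marshal_equal` and the `entry` type are hypothetical. Because it compares byte representations, it is stricter than value equality: NaNs with different bit patterns compare unequal, and so do 0.0 and -0.0.

```ocaml
(* Hypothetical helper: two values are "equal" if their marshaled
   byte strings are identical. This compares raw float bits, so NaN
   payload/sign differences and signed zeros are all significant. *)
let marshal_equal (a : 'a) (b : 'a) : bool =
  Marshal.to_string a [] = Marshal.to_string b []

(* Hypothetical stand-in for one parsed key/value pair. *)
type entry = { key : string; value : float }

let () =
  let expected = [ { key = "not a number"; value = Float.nan } ] in
  let actual = [ { key = "not a number"; value = Float.nan } ] in
  assert (not (expected = actual));        (* structural = fails on NaN *)
  assert (marshal_equal expected actual);  (* identical bytes *)
  assert (not (marshal_equal 0.0 (-0.0)))  (* sign bit differs *)
```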

yawaramin · March 15, 2021, 2:16am

Sorry, I misunderstood the requirement. But yeah, I would do what LdBeth suggested for the test: string_of_float actual = "nan".

Chet_Murthy · March 15, 2021, 2:50am

[OK, I swear, I’m not being obtuse here, not purposely uncooperative]

So, maybe some more details. My goal is to write a YAML parser. Before I do so, I’m faithfully transcribing every example (every last one) from the YAML 1.2 spec into unit tests. After that, I’ll be getting every unit test in the YAML test suite working, then the JSON examples and test suite (because YAML and JSON are close, and … the rest is details of the project).

Now, one of the examples (#2.20) is

canonical: 1.23015e+3
exponential: 12.3015e+02
fixed: 1230.15
negative infinity: -.inf
not a number: .NaN

and this is parsed into a YAML dictionary, where one of the values is nan. Sure, I can re-serialize the YAML object and do a string comparison, but I’d prefer not to rely on the correctness of a YAML pretty-printer (which, BTW, I will also be writing) in order to test the YAML parser.

Hence, my question about a custom deriving.eq.

P.S. Maybe a little more context. The plan isn’t to write a YAML parser, but rather a parser for a data language that is very close to YAML. Specifically, unlike YAML, this language will allow its specification to be divided into a lexical and a grammatical part, so that it can be implemented using standard tools (flex/bison, ocamllex/menhir), without the lexer and parser having unholy relations under cover of darkness. So it would be good to be able to test the parser (demarshaller) and the pretty-printer (marshaller) completely separately.

P.P.S. It wasn’t much code to extend pa_ppx.deriving_plugins.eq with a new option, nan_self_equal; my goal with this conversation is to figure out whether there is something … “nicer” one might do.

copy · March 15, 2021, 3:14am

Perhaps I’m misunderstanding the question, but let me try:

compare x y = 0 is the standard way in OCaml to compare floats while considering NaNs equal. From float.mli:

val compare: t -> t -> int
(** [compare x y] returns [0] if [x] is equal to [y], a negative integer if [x]
    is less than [y], and a positive integer if [x] is greater than
    [y]. [compare] treats [nan] as equal to itself and less than any other float
    value.  This treatment of [nan] ensures that [compare] defines a total
    ordering relation.  *)
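The documented behaviour is easy to check. This sketch contrasts the built-in `=` with `Float.compare`, and with `Float.equal`, which the stdlib documents as comparing via `Float.compare`. Polymorphic `compare` treats nan the same way inside larger values, so `compare x y = 0` works on whole structures, not just on floats.

```ocaml
(* Built-in (=) follows IEEE semantics; compare-based equality treats
   nan as equal to itself and less than every other float. *)
let () =
  assert (not (Float.nan = Float.nan));              (* IEEE: nan <> nan *)
  assert (Float.compare Float.nan Float.nan = 0);
  assert (Float.equal Float.nan Float.nan);          (* via Float.compare *)
  assert (Float.compare Float.nan neg_infinity < 0); (* nan is least *)
  assert (compare [ Float.nan ] [ Float.nan ] = 0)   (* polymorphic compare *)
```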

Further, ppx_deriving lets you specify custom equality functions on a per-field basis. From the ppx_deriving README (ocaml-ppx/ppx_deriving on GitHub):

eq and ord allow to specify custom comparison functions for types to override default behavior. A comparator for type t has a type t → t → bool for eq or t → t → int for ord. If an ord comparator returns a value outside -1…1 range, the behavior is unspecified.
# type file = {
  name : string [@equal fun a b -> String.(lowercase_ascii a = lowercase_ascii b)];
  perm : int    [@compare fun a b -> compare b a]
} [@@deriving eq, ord];;

Putting the two together should give a nice solution. The remaining question is perhaps whether ppx_deriving’s default equality should be changed, or whether its documentation should mention this edge case.
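A sketch of putting the two together for YAML example 2.20, using a hypothetical record type `example_2_20`. With ppx_deriving, each float field would carry an attribute in the README's style, such as `[@equal fun a b -> Float.compare a b = 0]`; to keep the sketch runnable without a ppx, the equality those attributes would amount to is written out by hand (this is not ppx_deriving's actual generated code).

```ocaml
(* Hypothetical record for YAML example 2.20. With ppx_deriving one
   might annotate each field, e.g.
     canonical : float [@equal fun a b -> Float.compare a b = 0];
   Below, the resulting equality is hand-written instead. *)
type example_2_20 = {
  canonical : float;
  exponential : float;
  fixed : float;
  negative_infinity : float;
  not_a_number : float;
}

(* Compare-based float equality: nan = nan, and 0.0 = -0.0. *)
let float_eq a b = Float.compare a b = 0

let equal_example_2_20 x y =
  float_eq x.canonical y.canonical
  && float_eq x.exponential y.exponential
  && float_eq x.fixed y.fixed
  && float_eq x.negative_infinity y.negative_infinity
  && float_eq x.not_a_number y.not_a_number

(* 1.23015e+3, 12.3015e+02 and 1230.15 all denote the same value. *)
let expected = {
  canonical = 1230.15;
  exponential = 1230.15;
  fixed = 1230.15;
  negative_infinity = neg_infinity;
  not_a_number = Float.nan;
}

let () =
  (* Stand-in for parser output: a NaN produced by arithmetic. *)
  let parsed = { expected with not_a_number = 0.0 /. 0.0 } in
  assert (not (expected = parsed));        (* built-in = fails on nan *)
  assert (equal_example_2_20 expected parsed)
```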

c-cube · March 15, 2021, 3:26am

To compare floats this way, I’d start with Stdlib.classify_float, then add a comparison of the sign (for FP_zero/FP_infinite) or an ordinary value comparison (for FP_normal/FP_subnormal). For FP_nan no further checks are needed.
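A sketch of the classify_float approach as described, with a hypothetical function name `float_equal_classified`; `Float.sign_bit` is used for the sign comparison.

```ocaml
(* NaN equals NaN; zeros and infinities must also agree in sign;
   normal/subnormal numbers use ordinary float equality; mixed
   classes are never equal. *)
let float_equal_classified (x : float) (y : float) : bool =
  match classify_float x, classify_float y with
  | FP_nan, FP_nan -> true
  | FP_zero, FP_zero | FP_infinite, FP_infinite ->
      Float.sign_bit x = Float.sign_bit y
  | (FP_normal | FP_subnormal), (FP_normal | FP_subnormal) -> x = y
  | _ -> false

let () =
  assert (float_equal_classified Float.nan Float.nan);
  assert (not (float_equal_classified 0.0 (-0.0)));
  assert (float_equal_classified infinity infinity);
  assert (not (float_equal_classified infinity neg_infinity));
  assert (float_equal_classified 1.5 1.5);
  assert (not (float_equal_classified Float.nan 1.0))
```

Unlike `Float.compare x y = 0`, this distinguishes 0.0 from -0.0, which may or may not be what a parser test wants.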

Chet_Murthy · March 15, 2021, 3:27am

Oh, ha! I forgot about this! Thank you! Heh, I implemented deriving.eq myself (as part of pa_ppx), but forgot that this was possible!

Of course, this solves the problem!

ETA: To follow up on @copy’s solution, I could also have used @nobuiltin so I could specify my own equality function for the primitive type. I just … forgot these capabilities existed. Sigh. As a bard once sang: “what a drag it is, getting old.”

yawaramin · March 15, 2021, 3:39am

There’s also Float.is_nan.

psafont · March 15, 2021, 9:10am

It’s also common to allow for representation error when comparing floats (a tolerance usually called epsilon). I needed something similar when running some unit tests and ended up with:

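let isnan f = FP_nan = classify_float f

(* eps is the absolute tolerance; it was left undefined in the
   snippet, so it is taken as a labeled argument here *)
let same ~eps x y =
  (isnan x && isnan y)
  (* exact equality handles infinities, where x -. y would be nan *)
  || x = y
  || abs_float (x -. y) <= eps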

jaxon · March 21, 2021, 5:03pm

That is the proper way to compare floats on computers: define an epsilon distance within which floats are considered equal, as you did. That is why comparing floats directly for equality is considered bad practice.
