I have just read this
I am curious what the behavior of the Ocaml Map
(and also Hashtbl
) module is, in this particular respect. (I know it would be relatively easy to try, so forgive my laziness here.)
–
Ian
I have just read this
I am curious what the behavior of the Ocaml Map
(and also Hashtbl
) module is, in this particular respect. (I know it would be relatively easy to try, so forgive my laziness here.)
–
Ian
Here is a session with OCaml 4.14.0. It’s definitely interesting to think about how to specify a reasonable behavior for nan.
# module FMap = Map.Make(Float);;
# let mymap = FMap.singleton nan "yes";;
val mymap : string FMap.t = <abstr>
# FMap.find nan mymap;;
- : string = "yes"
# nan = nan;;
- : bool = false
# FMap.cardinal (FMap.remove nan mymap);;
- : int = 0
The Map.Make
module uses the compare
function defined in the input argument. From the Float documentation:
compare x y
returns0
ifx
is equal toy
, a negative integer ifx
is less thany
, and a positive integer ifx
is greater thany
.compare
treatsnan
as equal to itself and less than any other float value. This treatment ofnan
ensures thatcompare
defines a total ordering relation.
Hashtbls also treat NaNs as equal:
# let tbl = Hashtbl.create 11;;
val tbl : ('_weak1, '_weak2) Hashtbl.t = <abstr>
# Hashtbl.add tbl nan 0;;
- : unit = ()
# Hashtbl.length tbl;;
- : int = 1
# Hashtbl.remove tbl nan;;
- : unit = ()
# Hashtbl.length tbl;;
- : int = 0
The polymorphic hashtbl uses the polymorphic compare
function to test for equality; you can read the source code.
As is usually the case, whenever I read about a flaw discovered in other languages and ecosystems, I check how OCaml does its equivalent and it already does the correct thing.
To me, nan
looks like an weird and ugly value, something that I wouldn’t want to deal with in my programs (I’d prefer getting exception from unexpected inputs).
For instance, this looks illogical to me.
utop # nan = nan;;
- : bool = false
utop # nan > nan;;
- : bool = false
utop # nan < nan;;
- : bool = false
I mainly program at a high level, and if I understand things correctly CPUs themselves may return nan
values when computing floating point operations (which seems reasonable).
I’m not sure I understand how that’d could be useful in my userland code though. Could anyone explain to me a use case when having a nan
value would in fact be useful?
One use case is representing the absence of data. Basically you can see the float
datatype as being a compact float option
.
Something to keep in mind is that float
does not model the mathematical concept of real number but rather implements IEEE 754, the de-facto standard for numeric computing. Doing something different here would make OCaml pretty much worthless for numeric computing.
Cheers,
Nicolas
NaN can also have multiple representations, e.g. both of these are NaN:
utop # Int64.float_of_bits 0x7ff8000000000002L;;
- : float = nan
utop # Int64.float_of_bits 0x7ff0000000000002L;;
- : float = nan
Different operations also produce different NaNs
utop # Float.nan |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff0000000000001"
utop # sqrt ~-.1. |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "fff8000000000000"
utop # (Float.nan +. 3.) |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff8000000000001"
More information here IEEE 754-1985 - Wikipedia and NaN - Wikipedia
In fact I’ve seen some C code that attempts to encode some debug information into the NaN to help tracing its origin. Just discovered that you can actually input such NaN with payloads in OCaml (they just don’t get printed):
top # Float.of_string "nan(42)";;
- : float = nan
top # Float.of_string "nan(42)" |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff800000000002a"
utop # Float.of_string "-nan" |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "fff8000000000000"
The rabbit hole goes much deeper, as NaN-boxing is a technique that could completely replace the way OCaml does pointer tagging and allow us to represent both integers and floats as direct values. See the discussion here for example.
Although I knew about float’s lack of precision, I somehow failed to understand that.
Now that I understand that 0.0
doesn’t actually mean zero everything makes sens.
Thanks for insisting on this and the useful code snippets
Sorry for polluting your thread @nobrowser
No, it’s fascinating! TBH starting an interesting thread like this is one of my goals when I post. If it ranges wide so much the better!
–
Ian