About maps and NaNs

I have just read this

blogpost about Golang

I am curious what the behavior of the Ocaml Map (and also Hashtbl) module is, in this particular respect. (I know it would be relatively easy to try, so forgive my laziness here.)


Ian

1 Like

Here is a session with OCaml 4.14.0. It’s definitely interesting to think about how to specify a reasonable behavior for nan.

# module FMap = Map.Make(Float);;
# let mymap = FMap.singleton nan "yes";;
val mymap : string FMap.t = <abstr>
# FMap.find nan mymap;;
- : string = "yes"
# nan = nan;;
- : bool = false
# FMap.cardinal (FMap.remove nan mymap);;
- : int = 0
2 Likes

The Map.Make module uses the compare function defined in the input argument. From the Float documentation:

compare x y returns 0 if x is equal to y , a negative integer if x is less than y , and a positive integer if x is greater than y . compare treats nan as equal to itself and less than any other float value. This treatment of nan ensures that compare defines a total ordering relation.

Hashtbls also treat NaNs as equal:

# let tbl = Hashtbl.create 11;;
val tbl : ('_weak1, '_weak2) Hashtbl.t = <abstr>
# Hashtbl.add tbl nan 0;;
- : unit = ()
# Hashtbl.length tbl;;
- : int = 1
# Hashtbl.remove tbl nan;;
- : unit = ()
# Hashtbl.length tbl;;
- : int = 0

The polymorphic hashtbl uses the polymorphic compare function to test for equality; you can read the source code.

2 Likes

As is usually the case, whenever I read about a flaw discovered in other languages and ecosystems, I check how OCaml does its equivalent and it already does the correct thing.

5 Likes

To me, nan looks like an weird and ugly value, something that I wouldn’t want to deal with in my programs (I’d prefer getting exception from unexpected inputs).

For instance, this looks illogical to me.

utop # nan = nan;;
- : bool = false
utop # nan > nan;;
- : bool = false
utop # nan < nan;;
- : bool = false

I mainly program at a high level, and if I understand things correctly CPUs themselves may return nan values when computing floating point operations (which seems reasonable).

I’m not sure I understand how that’d could be useful in my userland code though. Could anyone explain to me a use case when having a nan value would in fact be useful?

One use case is representing the absence of data. Basically you can see the float datatype as being a compact float option.

Something to keep in mind is that float does not model the mathematical concept of real number but rather implements IEEE 754, the de-facto standard for numeric computing. Doing something different here would make OCaml pretty much worthless for numeric computing.

Cheers,
Nicolas

2 Likes

NaN can also have multiple representations, e.g. both of these are NaN:

utop # Int64.float_of_bits 0x7ff8000000000002L;;
- : float = nan
utop # Int64.float_of_bits 0x7ff0000000000002L;;
- : float = nan

Different operations also produce different NaNs

utop # Float.nan |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff0000000000001"
utop # sqrt ~-.1. |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "fff8000000000000"
utop # (Float.nan +. 3.) |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff8000000000001"

More information here IEEE 754-1985 - Wikipedia and NaN - Wikipedia

In fact I’ve seen some C code that attempts to encode some debug information into the NaN to help tracing its origin. Just discovered that you can actually input such NaN with payloads in OCaml (they just don’t get printed):

top # Float.of_string "nan(42)";;
- : float = nan
top # Float.of_string "nan(42)" |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "7ff800000000002a"
utop # Float.of_string "-nan" |> Int64.bits_of_float |> Printf.sprintf "%Lx";;
- : string = "fff8000000000000"

2 Likes

The rabbit hole goes much deeper, as NaN-boxing is a technique that could completely replace the way OCaml does pointer tagging and allow us to represent both integers and floats as direct values. See the discussion here for example.

2 Likes

Although I knew about float’s lack of precision, I somehow failed to understand that.

Now that I understand that 0.0 doesn’t actually mean zero everything makes sens.

Thanks for insisting on this and the useful code snippets :slight_smile:

Sorry for polluting your thread @nobrowser

No, it’s fascinating! TBH starting an interesting thread like this is one of my goals when I post. If it ranges wide so much the better!


Ian