Working on a small standard-library overlay focused on total functions and I made integer division total. Is this too crazy?

Pretty much what the title says. Working on a standard library overlay, mainly for personal use, that tries to eliminate failure cases as much as possible. In addition to preferring options or results over exceptions, it also tries to define types that make failure cases impossible (e.g. non-empty lists), so it pushes the responsibility for error handling more in the direction of the input than the output (where results and options push it to the output).

One of the things that has always driven me crazy, not only about OCaml, but about programming in general is that integer division is not total. This shall not stand! We have the power! So I made a non-zero integer type and shadowed the division and modulo operators to only accept non-zero as a divisor. Is this nuts? Am I going to far? I just really like total functions.

type non_zero = private int
(** [non_zero] exists to make integer division total. You must convert
    [int] to [non_zero] before you divide with [/] or [mod]. In this way, we
    ensure division can never fail. *)

val nz : int -> non_zero option
(** ensure an integer is non-zero. total. *)

val nz_exn : int -> non_zero
(** ensure an integer is non-zero or raise an exception. *)

external unsafe_nz : int -> non_zero = "%identity"
(** "trust me that this integer cannot be zero" handle with care. *)

external nz_of_pos : Pos.t -> non_zero = "%identity"
(** If you're already dealing with a positive integer, we know it is not zero.
    total. *)

external int_of_nz : non_zero -> int = "%identity"
(** when you need the underlying [int] from [nz]. total. *)

external ( / ) : int -> non_zero -> int = "%divint"
(** shadows [Stdlib.( / )]. total. *)

external ( mod ) : int -> non_zero -> int = "%modint"
(** shadows [Stdlib.( mod )]. total. *)
2 Likes

It depends. It’s a balancing act and extremely contextual.

Sometimes insisting on making a function total and forcing yourself to return results on any argument can hide subtle bugs in user code. But sometimes they just do what you want.

For example I’m an extremely happy user of this function (which was sadly rejected by upstream). It has drastically cut down my off-by-one errors while extracting substrings (good defaults, both arguments in the indexing space and inclusive range). But I could see people objecting that not erroring on negative indices could hide a larger bug lurking in the code.

Perhaps a simpler example is what do to on List.take 3 l when the length of l smaller than 3. In some context you may not mind if l has less than three elements in others you’d rather error (though in this particular case it’s easy to match on the result to do this). And what should we do when n < 0 ?

Perhaps not in a given, restricted context where you need to be extra careful about that for some reason. But note that:

You just pushed the problem somewhere else… I’d say that in general it would be rather painful to work with what you propose here.

2 Likes

Another issue is overflow when doing division at the limits of a signed integer (due to two’s complement). The only way you can address that (without just silently rolling over or throwing) is to arbitrarily limit the range you allow when converting to non-zero. Otherwise you’re just giving a false sense of security. (On that note, I want to check now exactly what OCaml does when you divide int-min by (-1).)

(While I do agree that you’re just pushing the precondition checks to a different level, this type at lease forces it to be explicit, so I do think it’s an interesting exercise.)

1 Like

Yes, it pushes the problem somewhere else for sure, just like a non-empty list or a justified container would. The central idea I’m aiming at (and not necessarily hitting) is that all inputs should be validated before being passed around the program, and if a function takes an input which will be used as a divisor, the caller should ensure that it is non-zero further up the stack.

However, divsion is such a general operation that I fear this is an unrealistic goal, since divisors are likely to be generated much further down the stack in any case, so it wouldn’t make much difference most of the time.

I’ll try it out for a while and see how it goes, but I’m also not sure it’s a useful approach.

This is a very good point. If nothing else, I should at least document that it’s susceptible to overflow. I didn’t think about this because integer division normally brings the value closer to zero, but you’re right about Int.min_int / (-1). (the output is equal to min_int)

1 Like

What I suspect is that you are going to have say x - y |> nz_exn or unsafe_nz (x - y) all over you code base which is not going to be different from the occasional Division_by_zero throwing in your face :–)

This is my fear as well.

edit: on the other hand, I’m envisioning this being used with my non-empty list a lot (total/length, perhpas), and the length of the non-empty list is always Pos.t, so conversion has no error case in this scenario.

In fact for code that needs extra care about overflows and division by zero, I’m more convinced by providing additional “checked” operations which explicitely error on overflow or division by zero. Something like:

module Int.Checked : sig 
  val ( + ) : int -> int -> int option
  val ( - ) : int -> int -> int option
  val ( / ) : int -> int -> int option
  val ( * ) int -> int -> int option
end
4 Likes

I may try this if I don’t end up liking the current approach of pushing error handling up rather than down.

It isn’t insane to want a small library over the ints, that enforces some sort of invariant. I had a friend who was coding in C; he used C++ structs (wrappering ints) in order to enforce invariants. For instance, in his little type system, you couldn’t take two indices and add them (which operation is part of erroneously taking the midpoint of an array for binary search, viz. mid = (lo+hi)/2 that can produce overflow). Instead, his type system required that you -subtract- the two offsets, then divide that by 2, then add back. So mid = lo + (hi-lo)/2.

I’m forgetting all the details, but I do trust that he was doing this for good reasons, and that it actually did help him to ensure that his code was correct.

This was a C implementation of an IP-based SAN, so correctness mattered a lot, but so did performance (25yr ago, so every cycle matters, holey moley).

ETA: I should have added that of course, for production he would #ifdef away the wrapper, so that just in case the C++ compiler was being finicky, it was presented with -just- ints.

2 Likes

Some programming languages define n/0 ==> 0

Mathematicians will faint seeing this, but it may work in practice
(personally I have no experience working with any such languages)

You could also explore this possibility.

P.

1 Like

Pony is an example: Divide by Zero - Pony Tutorial

I actually contemplated this (I read discussions about division in Pony several years ago), but my instinct was that it was better to be correct than practical. :rofl: It could be that my instincts are bad. I’m also curious how n/0 = 0 would work in practice, though. It might be perfectly fine. :person_shrugging:

For everyone else: I added some equality witnesses as a compromise when you want to avoid the conversion to non-zero. My library has a couple other bounded integer types, so I actually defined one big equality witness for all of them so all my integer types be used interchangeably for cases where totality is less important.

Again I suspect it may end up hidding difficult to track bugs.

Note that you can still adopt yet another technique which is to augment the type with special values produced on errors and define every operation on these values (cue Float.nan).

Now this is not to say that sometimes it is not difficult to track the source of an unwanted NaN but at least you know something went wrong – theoretically signalling NaNs could help but I don’t think I ever heard of a system that uses them for bug tracking.

Not huge fan of NaN, though I understand why it’s there. You can do a lot of float operations with NaN and get quite far away from the source of the NaN before realizing there’s a problem. It somehow reminds of trying to debug JavaScript. While arguably better than n/0 = 0, an error type or even an exception is still more sane from my perspective. While my goal is to avoid error cases, it’s definitely not to allow errors to pass silently.

Which may not be a problem :–) In fact NaN is pretty useful to say None in your datasets.

Just to be clear. n /. 0. is not Float.nan it’s Float.infinity which has its own algebra in operations on float values.

1 Like

Just giving my 2 cents, I think this touches on the limit of a type system.

For instance, let’s say you’ve got a type representing only positive integers. You could then be tempted at some point (for whatever reason) to define in your program a “subtract” function that would take 2 positive integers and return a final positive integers (let’s imagine that this would make sens in your application domain → cannot/should never return a negative number).

Anyhow, this rule is not representable in the type system. Of course, you may implement type system tricks here and there if you wish but my point is that this is really a runtime concern, not a type system concern.

So for the same reason that sub : pos_int -> pos_int -> pos_int doesn’t make sens at the type level, dividing by zero doesn’t make sens a the type level either.

So personally, I prefer to have the program crash when encountering such unexpected conditions.

Also, I believe this is where the concept of “dependent types” comes into play. But I haven’t played with them yet, so my understanding is very limited. And I don’t think they either solve all of those problems or have a proven track record (still experimental).

1 Like

Yes. I have a positive integer module, and I define subtraction like this

val sub : Pos. t -> Pos.t -> int

—for exacly the reason you describe. If you want a positive integer at the end of it all, you have to convert again.

In a dependently typed language, you still can’t define a correct version of the the function

val sub : pos -> pos -> pos

Because that’s just how math works, but you could, probably define something like:

(forall a b: pos) where a > b -> (sub a b : pos)

To get the positive result, you would probably have to provide a proof that a > b.

We actually can simulate this in OCaml using phantom types and existentials (though we can’t prove a > b -> a - b > 0, like we can in Coq).

module M : sig
  type ('id, 'role) pos
  type greater
  type less

  (* we can use a GADT to force our type variable 'id to become an
     existential type *)
  type comparison =
    | Lt : (('id, less) pos * ('id, greater) pos) -> comparison
    | Gt : (('id, greater) pos * ('id, less) pos) -> comparison
    | Eq : comparison

  val compare : Pos.t -> Pos.t -> comparison

  (* we encode that subtraction only takes inputs with the same id--
     meaning they have been compared--where the first operand is
     greater than the second *)
  val sub : ('id, greater) pos -> ('id, less) pos -> Pos.t

end = struct
  type ('id, 'role) pos = Pos.t
  type greater
  type less

  type comparison =
    | Lt : (('id, less) pos * ('id, greater) pos) -> comparison
    | Gt : (('id, greater) pos * ('id, less) pos) -> comparison
    | Eq : comparison

  let compare a b =
    let n = Pos.compare a b in
    if n < 0 then Lt (a, b)
    else if n > 0 then Gt (a, b)
    else Eq

  let sub a b =
    Pos.(unsafe_coerce (a - b))
end

let () =
  (* bringing definitions of `one`, `+` and `show` into scope
     for Pos.t. Pos is part of my library *)
  let open Pos in
  match M.compare (one+one) one with
  | Gt (a, b) ->
    (* in the first case, we have the proof that a is greater
       than b, and subtraction is legal. *)
    print_endline (show (M.sub a b))
  | Lt (a, b) ->
    (* if a is not greater than b the same operation is
       a type error *)
    print_endline (show (M.sub a b));
    (* however, b - a is legal *)
    print_endline (show (M.sub b a));
  | Eq -> ()

The point of this exercise is that there are indeed ways to encode these kinds of constraints into our OCaml programs, but the process is kind of tautological. What we can’t do is prove that our implementations of compare and sub are correct, which we likely could with dependent types.

We could use the same strategy to create operations which make integer overflow a type error, or enforce many (any?) other arbitrary type constraints based on values. This would likely be much cleaner in a dependently typed language—I can’t speak much to that, since I am at the beginning of my dependent typing journey—but in any case, we can still express some pretty interesting kinds of constraints in OCaml with the type system as it is.

4 Likes

So in this post I’d like to clearly, formally lay out why it’s consistent to say that 1/0 = 0, why some of the common objections don’t apply, and what the real mathematicians say. Fair warning, this post is going to be a little more mathematically dense than my usual stuff. I’ve tried to make it clear but, well, math.

1 Like

Thanks for posting this. It’s very interesting!