I’ve recently been working with vectors (using immutable arrays). At several places in my code, I would like to ensure that the dimensions of my vectors match.
To me, Invalid_argument seems to be the right exception to throw, because this exception (ideally) should never be raised when all code is written correctly.
However, as (1) I’m advised to ensure that this exception cannot be raised (because it shouldn’t be caught) and (2) the error text includes the function name, it feels like I end up with a lot of redundant checks in my code. Consider:
let add_vectors vec1 vec2 =
  if Iarray.length vec1 <> Iarray.length vec2 then
    invalid_arg "add_vectors: Tried to add vectors of different length";
  Iarray.map2 Float.add vec1 vec2

let add_three_vectors vec1 vec2 vec3 =
  let n = Iarray.length vec1 in
  (* Should I do this check? *)
  if Iarray.length vec2 <> n || Iarray.length vec3 <> n then
    invalid_arg
      "add_three_vectors: Tried to add three vectors of different \
       length";
  vec1 |> add_vectors vec2 |> add_vectors vec3
The length check in add_three_vectors is somewhat redundant, because nobody is supposed to add three vectors of different length and the lengths get checked in add_vectors anyway. But if I don’t perform the check in add_three_vectors, then a caller of add_three_vectors who gives improper input would see an error message from the inner function, which is somewhat confusing.
Is there some generic advice on what to do?
Use my own exceptions and make each exposed function wrap these to their own Invalid_argument error message? This seems to be so much work.
Not care that much about passing (i.e. not catching) Invalid_argument exceptions from inner (but exposed) functions, even if this results in the most inner function being named in the error message?
I thought that’s rather an indication for Assert_failure? I thought Invalid_argument is for cases where application code might be supplying nonsensical but type-compatible input arguments. I may be wrong.
(* setter and getter for a nonnegative integer *)
let set, get =
  (* invariant: !internal >= 0 *)
  let internal = ref 0 in
  (fun x ->
    if x < 0 then invalid_arg "negative";
    internal := x),
  (fun () ->
    let x = !internal in
    assert (x >= 0);
    x)
The idea is that you should raise Invalid_argument only if you are about to break some invariant of your code or if you are about to cause some undefined behavior. (And Assert_failure is raised when the internal state of your program has become inconsistent.)
In your add_three_vectors function, there is no reason to check anything, because there is nothing to break at that point. Sure, it means that the message of Invalid_argument will not be that useful to debug the issue, but you would have looked at the whole backtrace anyway, so who cares.
Assert_failure is for cases in which I as a programmer can ensure that the invariant is never violated. So, ideally, any function that I expose in an API should never raise an assertion failure (for whichever input). Exceptions of this type should generally not be caught.
Invalid_argument is for cases where the caller of my function should ensure that this exception is never raised. Also these exceptions should generally not be caught.
Failure can be used in cases that may happen under normal operating conditions (string to float conversion, for example). The exception should always be caught but the error message is only for diagnostic purposes and should not be required to match a specific string when catching the exception.
So when I have two functions add_vectors and add_three_vectors that take immutable arrays as arguments (with an expected equal length, which I can’t ensure though), and if those functions are exposed in an API, then they should not raise an assertion failure on wrong input.
As it still makes no sense to ever input vectors of different size, I believe the right exception is Invalid_argument rather than Failure. Also compare List.iter2, for example, which also raises Invalid_argument when the lengths of the two lists given as arguments do not match.
So basically: Raise Invalid_argument before things go badly wrong, while we still know whose fault it is (the caller of the function that raised Invalid_argument). And raise Assert_failure after things went wrong.
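The three-way split described above could be sketched roughly as follows. This is only an illustration under assumptions: it uses plain float arrays instead of Iarray so that it runs on older compilers, and the names dot, squared_norm and parse_vector are hypothetical helpers, not something from the thread.

```ocaml
(* Invalid_argument: the caller violated a precondition (a caller bug). *)
let dot v w =
  if Array.length v <> Array.length w then
    invalid_arg "dot: vectors of different length";
  let acc = ref 0.0 in
  Array.iteri (fun i x -> acc := !acc +. (x *. w.(i))) v;
  !acc

(* Assert_failure: an internal invariant that no input should be able to
   break. A sum of squares is never negative (it may be nan). *)
let squared_norm v =
  let n = dot v v in
  assert (not (n < 0.0));
  n

(* Failure: bad input that can occur under normal operating conditions. *)
let parse_vector s =
  String.split_on_char ',' s
  |> List.map (fun tok ->
         match float_of_string_opt (String.trim tok) with
         | Some x -> x
         | None -> failwith ("parse_vector: not a number: " ^ tok))
  |> Array.of_list
```

Here only the Failure from parse_vector is something a caller would be expected to catch; the other two signal bugs on one side of the API or the other.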
Following that advice would mean that I should not always ensure that Invalid_argument isn’t raised when calling a function? Does this hold generally? Isn’t this check superfluous too, then?
After all, Iarray.map2 doesn’t cause any undefined behavior on non-matching input dimensions but simply raises Invalid_argument. Does that mean I could or should simplify my code as follows:
What is the idiomatic or usual approach? I feel like the last version (without any checks done by me) makes error search more difficult (but it is fastest and most concise).
Is it perhaps a matter of a case-by-case decision when and where to do additional checks? Or should I always do as few checks as possible when I know that inner functions perform their own checks and live with inner error messages bubbling up?
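For illustration, a check-free variant might look like the sketch below. It is an assumption-laden stand-in, not the code from the post: it uses ordinary float arrays and Array.map2 (which, like the immutable-array version discussed here, raises Invalid_argument on mismatched lengths) so that it runs on older compilers.

```ocaml
(* No checks of our own: Array.map2 raises Invalid_argument itself
   when the two arrays differ in length. *)
let add_vectors vec1 vec2 = Array.map2 Float.add vec1 vec2

let add_three_vectors vec1 vec2 vec3 =
  vec1 |> add_vectors vec2 |> add_vectors vec3

let () =
  match add_three_vectors [| 1.0 |] [| 1.0; 2.0 |] [| 3.0 |] with
  | _ -> print_endline "no error"
  | exception Invalid_argument msg ->
      (* The message comes from the innermost library function,
         not from add_three_vectors. *)
      Printf.printf "Invalid_argument: %s\n" msg
```

The trade-off is the one described above: the code is shortest, but the message names the library function rather than the entry point the caller actually used.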
If the function is part of the user-facing interface, then it might be worth producing proper error messages to inform the user of a misuse of the interface. If it is an internal function that is not meant to be called directly by the user, then there is not much point in making the code more complicated.
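One way to keep user-facing messages accurate without repeating the check by hand is to factor it into a small helper. The following is a sketch under assumptions: check_same_length and add_internal are hypothetical names, and plain float arrays stand in for Iarray.

```ocaml
(* Hypothetical helper: one length check, parameterised by the name of
   the user-facing function so the message names the right entry point. *)
let check_same_length ~fn vecs =
  match vecs with
  | [] -> ()
  | first :: rest ->
      let n = Array.length first in
      if List.exists (fun v -> Array.length v <> n) rest then
        invalid_arg (fn ^ ": vectors of different length")

(* Internal function: no message of its own, not meant to be exposed. *)
let add_internal vec1 vec2 = Array.map2 Float.add vec1 vec2

(* User-facing functions: check once at the boundary, then delegate. *)
let add_vectors vec1 vec2 =
  check_same_length ~fn:"add_vectors" [ vec1; vec2 ];
  add_internal vec1 vec2

let add_three_vectors vec1 vec2 vec3 =
  check_same_length ~fn:"add_three_vectors" [ vec1; vec2; vec3 ];
  add_internal (add_internal vec1 vec2) vec3
```

With this layout a bad call to add_three_vectors reports "add_three_vectors: vectors of different length", while the internal function stays free of its own messages.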
I have heard (or felt) that paradigm change before (in various contexts, not just OCaml), and I am curious if everyone sees it that way or if there is some reading material / references to read about that paradigm shift?
Also mentioned in the other thread:
Is this OCaml specific? Do we see a similar motion in other programming languages? I do know that Rust generally refrains from unwinding the stack (or catching an unwind, even though it’s possible) but rather proposes using a Result type (or also ControlFlow).
Is this related to exceptions not being declared in OCaml, hence being more dangerous than using option or result?
In Python, functions like float result in a ValueError rather than None:
>>> float("many")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: could not convert string to float: 'many'
In OCaml, I have seen both, and sometimes functions come in two flavors (at least in Stdlib).
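As a small illustration of the two flavors, here are standard-library pairs that should exist in any reasonably recent OCaml (float_of_string / float_of_string_opt and List.find / List.find_opt); the surrounding code is just a sketch.

```ocaml
let () =
  (* Exception flavor: raises Failure on malformed input. *)
  (match float_of_string "many" with
   | x -> Printf.printf "parsed %g\n" x
   | exception Failure _ -> print_endline "float_of_string raised Failure");
  (* Option flavor: the same situation is reported as None. *)
  (match float_of_string_opt "many" with
   | Some x -> Printf.printf "parsed %g\n" x
   | None -> print_endline "float_of_string_opt returned None");
  (* List.find raises Not_found; List.find_opt returns an option. *)
  match List.find_opt (fun x -> x > 3) [ 1; 2; 3 ] with
  | Some x -> Printf.printf "found %d\n" x
  | None -> print_endline "List.find_opt returned None"
```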
I think using Failure can make things easier in some contexts when you have several places where things can “fail” and you just want to sum up all of them and report them as a failure in the end (using a single try … with without writing match statements for each function call). But I also felt more drawn to options.
Not only. It generally depends on what you want to see as exceptional or not (e.g. I don’t think of an empty list as being exceptional) and as an error or not (e.g. I don’t see an empty list as being an error in general).
But there’s no one-size-fits-all answer to this; it depends on the structures you operate on and your context (depending on which you may want to convert to one or another idiom, e.g. with Result.error_to_failure, Result.to_option, Option.to_result).
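Converting between these idioms can be sketched like this. The function find_positive and its variants are hypothetical names for illustration; the result-to-exception step is written out by hand so that the sketch does not depend on a particular standard-library version.

```ocaml
(* Hypothetical lookup, used only for illustration. *)
let find_positive xs = List.find_opt (fun x -> x > 0) xs

(* option -> result: attach an error value for the None case. *)
let find_positive_res xs =
  Option.to_result ~none:"no positive element" (find_positive xs)

(* result -> option: drop the error value again. *)
let find_positive_opt xs = Result.to_option (find_positive_res xs)

(* result -> exception: turn the error string into Failure. *)
let find_positive_exn xs =
  match find_positive_res xs with
  | Ok x -> x
  | Error msg -> failwith msg
```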
So I guess using exceptions in APIs is fine when assuming it may make things easier for the caller with regard to control flow (ability to use a single try to catch a Failure for various successive calls, instead of having to handle each potential failure).
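The "single try" pattern could look like the following sketch. The names parse_point and distance_between are hypothetical, and the "x,y" input format is an assumption made only for this example.

```ocaml
(* Hypothetical parser: reads "x,y" into a pair of floats, using the
   raising variants so that no intermediate matching is needed. *)
let parse_point s =
  match String.split_on_char ',' s with
  | [ a; b ] ->
      (float_of_string (String.trim a), float_of_string (String.trim b))
  | _ -> failwith "parse_point: expected two comma-separated numbers"

(* A single try covers every step that may raise Failure; the exception
   is turned into a result at the boundary. *)
let distance_between s1 s2 =
  try
    let x1, y1 = parse_point s1 in
    let x2, y2 = parse_point s2 in
    Ok (Float.hypot (x2 -. x1) (y2 -. y1))
  with Failure msg -> Error msg
```

With options or results, each of the four conversions would need its own match or bind; here the control flow stays linear and the failure is handled once.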