Consider the following two match … with statements (in f1 and f2) respectively:
let g = function x when x < 0 -> -1 | _ -> 1
let f1 x =
let s = g x in
string_of_int (s * x)
^
match[@warning "-partial-match"] s with
| -1 -> " (negative)"
| 1 -> " (positive)"
let f2 x =
let s = g x in
string_of_int (s * x)
^
match s with
| -1 -> " (negative)"
| 1 -> " (positive)"
| _ -> assert false
let f3 x =
let s = g x in
string_of_int (s * x)
^
match s with
| -1 -> " (negative)"
| 1 -> " (positive)"
| _ -> failwith "f3: Internal match failure"
let () =
print_endline (f1 4);
print_endline (f2 4);
print_endline (f3 4);
print_endline (f1 (-13));
print_endline (f2 (-13));
print_endline (f3 (-13))
I know that s will either be -1 or 1, so I don’t need an exhaustive match. But I would like to avoid a warning in those cases without globally disabling the partial-match warning. I wonder: what’s the better style to achieve this?
f1 uses an attribute. It looks somwhat ugly to me. But it has the advantage that I get a proper Match_failure exception when my assumption was violated.
f2 feels much easier to read, but the error triggered will be Assert_failure, which seems semantically fitting, but not sure. I dislike that my “warning disabling” changes the actual error from Match_failure to Assert_failure.
f3 I like least, because the Failure doesn’t contain the file name and line and column number in the raised exception.
What is your favorite style to deal with this, or the idiomatic/recommended style? f1, f2, f3, or yet another approach I’m not aware of?
type sign = Negative | Nonnegative
let g x = if x < 0 then Negative else Nonnegative
let f4 x =
...
match s with
| Negative -> "(negative)"
| Nonnegative -> "(positive)"
if that’s really not an option for you then
let f5 x =
...
match s with
| -1 -> "(negative)"
| 1 -> "(positive)"
| s -> failwith ("f5: g returned non-sign result: " ^ Int.to_string s)
In my use case I actually deal with a function that returns -1, 0, or 1 that resembles the sign of a number, which seems to be quite idiomatic (compare Q.sign) for example. In some cases, I know the value can’t be 0 (e.g. because I previously have to divide by that number). Also, I multiply several values of that kind, so that comes in handy too: sign x * sign y * sign z. Maybe I should use an enum and a custom operator instead, but not sure.
That is a bit of extra work having to provide explanations for every case where I would prefer to actually save some effort by not catching all cases.
I wished there was an easier way to sort-of “assert” that a partial-pattern match is exhaustive.
I normally use assert false. The precise exception is is not very important to me, since it is not supposed to ever happen. If I made a mistake and it triggers, all I am interested in is a proper backtrace, which any exception will provide.
Re-thinking about it, maybe assert false is the right thing to use, because I would generally use that when I run into a branch / piece of code that should never be reached. Also, the manual explicitly mentions using assert false in match branches:
As a special case, assert false is reduced to raise (Assert_failure ...), which gives it a polymorphic type. This means that it can be used in place of any expression (for example as a branch of any pattern-matching).
I found another source comparing assert false and failwith …: The docs on error handling, section “Assertions”:
Writing assert false would just stop your program. This idiom is sometimes used to indicate dead code, parts of the program that must be written (often for type checking or pattern matching completeness) but are unreachable at run time.
[…]
When the execution reaches conditions which can’t be handled, the right thing to do is to throw a Failure, by calling failwith "error message". Assertions aren’t meant to handle those cases.
(Side note: I think maybe it should read “unreachable code”, not “dead code” above.)
So to my understanding it’s like this now:
Use assert false for unreachable code, i.e. code that should (if the program is written correctly) never be reached.
Use failwith … for conditions that can’t be handled, e.g. not-yet-implemented requests.
The difference would be: assert false shall never be reached (and in theory the compiler could even use that for optimization when assertion checks are disabled, even though OCaml doesn’t do that). In contrast, failwith …, is more like a generic run-time exception. Something that can actually happen and needs to raise an error.
I would probably consider using failwith … when I depend on behavior of a third-party library. I.e. in the example of my original post, if function g was a library function, then this seems to be a nice choice:
But if g and f are part of the same code section, maybe even in the same function, then assert false seems to be the better choice to me.
That’s what I often did in other programming languages; I guess it’s a valid option. But the intent to the reader is perhaps less clear. Let me share the real-world code I’m having:
match min_or_max with
| -1 -> if not Ops.(bound' >= !bound) then bound := bound'
| 1 -> if not Ops.(bound' <= !bound) then bound := bound'
| _ -> assert false)
I could make this:
if min_or_max < 0 (
if not Ops.(bound' >= !bound) then bound := bound'
) else (
if not Ops.(bound' <= !bound) then bound := bound'
)
But I feel like the intent is less clear in the second case.
AFAIU, If using something with the semantics of compare to get min_or_max, then the correct way to process the result is to use an inequality because
compare x y returns 0 if x is equal to y, a negative integer if x is less than y, and a positive integer if x is greater than y
I.e., checking only for -1 or 1 does not cover the specified range of possible values, but using inequalities to check for positive or negative values does.
However for the function g given as an example here, or for Q.sign, the range only includes 3 elements: 0, -1, or 1, and so the explicit check is arguably correct. If you are going to handle these kinds of results a lot, I then would agree with Partial match: Style advice - #2 by welltypedwitch that reflecting this in the type is preferable. (E.g., via a converter with a type like int -> [Pos | Neg | Zero ]`).
If it’s a one-off, then the explicit explanation in the failwith is not onerous, and it works to document why/how the contract is expected to hold for readers, so well worth the “extra work”, IMO
It seems like a fourth solution would be to combine f3 with a location. That way, you can emit a meaningful error-message (for the non-code-cogniscent user), while also providing code location. There’s a PPX rewriter, “ppx_here” that provides an extension (IIRC) [%here] that expands to the location, which can then be used in formatting the string for the exception. With the aid of a little helper code, one could write something not much more complex than the code of f3, that used that location.
Thanks again for all your responses. I really appreciate.
There are actually two different questions discussed:
How do I write down a partial match?
How do I handle cases where an int is (ab)used to denote some “cases”? And is it okay to use an int for that?
Regarding first question:
IMO the first question has no clear answer. I see the following ideas so far:
disable the warning for partial matches
always use _ -> failwith … (possibly with a macro to include the source code location)
always use _ -> assert false
use failwith … if the failure can happen and assert false if it’s impossible that the failure happens (though what is “impossible” could be argued about)
avoid the partial match
Currently, my favorite is to use either failwith … or assert false depending on how “impossible” the condition is (as I suggested above).
Maybe the question also depends on what I actually match, which is why discussing the (ab)use of int as an enum is interesting here.
Regarding second question:
I wonder if the return type of Repr.compare is a good choice from a semantic point of view. It feels more like a pragmatic hack. It allows to catch certain cases syntactically easily with if compare x y <= 0, for example. But note it doesn’t allow things like compare a b * compare c d, because compare could return min_int and min_int * min_int = 0.
Nonetheless, using int as a comparison result seems to be idiomatic in OCaml, and I don’t believe this would change soon?
Indeed, Q.sign (unlike compare) guarantees that only one of three values is returned. I think also here it’s a pragmatic choice. I can (theoretically) do something like:
#require "zarith";;
let abs_q x = Q.mul (Q.of_int (Q.sign x)) x;;
(Even though this might not be most efficient.)
So in my case, I use the -1, 0, 1 representation because I follow what the zarith library is doing. And it allows me to write min_or_max * Q.sign x. Still, it might not be the most clean choice.
There is a contradiction here. The precise exception is very important because your context may catch an exception because other functions are using them to report non fatal errors (e.g. Failure). So using, e.g. failwith you may never get that proper backtrace as the context may swallow it.
If you stick to the predefined exceptions, you are basically left with Assert_failure and Invalid_argument which are designed not to be catched by user code.
(Personally I just use assert false, it’s a strong signifier that this should not have happened)
Indeed, good catch! This is why one should use assert false (or Invalid_argument). When I wrote my reply I was thinking about backtraces and I forgot about this very important point.
Is this mentioned somewhere in the docs or do they behave differently? I assume Match_failure falls into this category too? Why exactly are those meant to not be caught? Is this common practice, a consequence following by design, or is there some other reason?
Is this because Assert_failure and Invalid_argument indicate an errorneous program where catching the exception may be dangerous, e.g. because some global state might be poisoned?
I mean: Why would a server process, for example, not catch when these are raised in a handler or worker if high-availability is demanded?
An Assert_failure _ means that an unreachable code path has been reached, thus the state of your program is compromised. Catching specifically this exception reflects a contradiction in your mind map of your code base: if you write an exception handler for Assert_failure _ because you believe that the assertion may be false sometimes, you should use another error handling mechanism. (It still might makes sense to log the assertion failure properly in an user-facing service.)
The Invalid_argument exception means that the user of a function failed to satisfy some easily-checkable preconditions of the function. Rather than catching the exception, you should check the preconditions. In other words, you shouldn’t write
try x.(n) with Invalid_argument _ -> ...
but
if n < 0 || n >= Array.length x then ...
Finally, many people consider than the non-exhautiveness pattern warning is better promoted to a compiler error (with -w @8). Moreover, the issue with disabling the warning locally is that you can no longer narrows the case that are impossible. Compare for instance
type t = A | B | C
let remove_A = function A -> B | x -> x
let f x = match[@warning "-partial-match"] remove_A x with
| B -> ()
(* someone added the constructor C after that the function was written,
and no one saw that the `C` case was missing *)
with
let f x = match remove_A x with
| A ->
(* `remove_A makes this case impossible *)
assert false
| B -> ()
Warning 8 [partial-match]: this pattern-matching is not exhaustive.
Here is an example of a case that is not matched: C
In the second case, the assertion is both narrow and the assert false branch is a good place to comment why the assertion is always true.
I mean: Why would a server process, for example, not catch when these are raised in a handler or worker if high-availability is demanded
If you can tear down part of the program and restores the state of the service in case of exceptions, you should catch all exceptions (but can you really recover from an Out_of_memory or Stack_overflow exception?) rather than single out “someone has made a programming error” exceptions.
The advice about not catching Assert_failure, Match_failure, Invalid_argument is that those exceptions denote programming errors that should be statically avoidable. And thus to preserve the status of those exceptions as panic exceptions, you should not trigger knowingly those exceptions while simultaneously correcting them at runtime with an exception handler.
It’s mentioned in the docs I linked to. There’s nothing special about them, it’s a convention. Both Assert_failure and Invalid_argument indicate a programming error (except in the odd duck bool_of_string) from which you are not supposed to recover, except perhaps at the toplevel of your program in order to log the backtrace for persistent processes.
Now contrast this with Failure which is used e.g. in the stdlib to denote parsing errors or that you may raise with Result.error_to_failure locally for error control flow.
It would, but as mentioned above the behaviour of these handlers would be to catch any exception (except the asynchronous ones) and print the backtrace. The chances of success of continuing after such a condition depends on the actual exception and the libraries your are using.
Ah right, it is indeed mentioned for Invalid_argument, but not for Assert_failure (where it may be sort-of obvious, one could argue, but not sure, see below).
I would agree that under most conditions, continuing work after an assertion failure is a bad idea. But on the other hand, wouldn’t logging an assertion failure in the same program/process that just had the assertion failure be (in theory) exactly that sort of bad idea?
I try to compare with other languages, e.g. Rust where you can catch panics which can leave the program in a bad state.
What feels difficult to phrase is: “Don’t catch these exceptions (but it can still make sense to catch them, e.g. to log them).” So when exactly should I catch them? Maybe I should understand it as follows: Some exceptions really should not happen and if I decide to catch them, it should (usually?) result in soon(ish) program termination. The exceptions falling under this category would be:
Match_failure
Assert_failure
Invalid_argument
Out_of_memory
Stack_overflow
Undefined_recursive_module
Is that the exhaustive list of those kind of exceptions? Perhaps it would be nice to make this distinction (exceptions meant to be caught and exceptions not meant to be caught) clearer in the manual. Currently, I only found a respective notice for Invalid_argument.
This is reflected also in the manual, and I generally understand. But one thing confuses me in the standard library. Why is:
# List.hd [];;
Exception: Failure "hd".
While:
# Array.get [||] 0;;
Exception: Invalid_argument "index out of bounds".
# Option.get None;;
Exception: Invalid_argument "option is None".
I would argue that checking that a list has one element is exactly that sort of “easily-checkable precondition” of the function. Yet we get a Failure. Specifically compare with arrays and options where we get an Invalid_argument error.
Perhaps this is because we should think of “lists” as “user input”, while array lengths are usually not depending on a user’s input? But still, this distinction is a bit difficult to grasp.
So using Failure for control flow is idiomatic, even if I am not supposed to match on the kind of failure (see manual)? So something like the following is alright?
let my_is_empty x =
try ignore (List.hd x); false with
Failure _ -> true
To me, it feels a bit risky to match on Failure _ when the involved routines are more complex. I think while I understand the semantics of Invalid_argument, Assert_failure, etc., I don’t quite understand Failure.
Does this perhaps mean that in good coding style Failure’s must be caught?
Yes, that makes sense to me, and it is a good reason against [@warning "-partial-match"] (unless when you want to be lazy and really don’t care, e.g. during prototyping, I guess).
Getting back to my original question, I feel like the answer to my question is:
f2 (i.e. using assert false) is the idiomatic way to go when I am sure that the respective branch won’t be reached.
Why is it the way to go? Because:
Disabling the warning generally doesn’t allow me to specify which cases cannot happen (as explained above).
Failure is an exception meant to be caught (and may swallow the backtrace, as explained above here), thus running into an (supposedly) unreachable branch shouldn’t raise a Failure.
I would only disable warnings when I’m lazy (e.g. for quick prototyping), and I would use Failure when the condition actually can occur, e.g. depending on wrong user input.
Don’t try to read too much in the stdlib, it’s too old to be fully consistent. But a cue between the difference on array accessors and List.hd can be given by the -unsafe compiler option.
Nowadays List.hd could be List.hd_opt, (cue List.nth vs List.nth_opt). But the same treatment would not happen on Array.get.
A good rule of thumb is to use invalid_arg for erroring on invariants whose duty is for the client to maintain before calling the function. Notably for get_* accessors.
There was a time where result did not exist and Failure msg served the purpose of Error msg – and for some people still does.
Personally I most of the time don’t let exception cross API boundaries, however I still find Failure useful, for example to easily make internal functions that eventually return a result tail recursive by avoiding a bind on each recursive call.
let f x =
let loop … = … in
try Ok (loop …) with Failure msg -> Error msg
Yes Failure _ must be caught if you can trigger it. It’s the equivalent of the Error _ case in result values.
I would argue that it’s good practice in an application to always install a top level exception handler, even if all it does is just print the back trace and a link to the bug tracker