Numerical ranges and pattern matching


#1

What always surprised me, that OCaml allows to put range of characters like ‘0’…‘9’ but not a range of numbers.
Why is that and maybe it is a time to add this feature in the compiler? Seems very inconvenient and inconsistent.


#2

This is a natural idea, but so far nobody has produced a patch to implement the feature, because it is harder to implement than one may expect. If I remember correctly, character ranges '0'..'4' are exploded into or-patterns '0'|'1'|'2'|'3'|'4', and manipulated in that non-compact form in the pattern-matching compiler (which later optimizes continuous ranges to compact/efficient code, so it doesn’t show in the compiled output). If you tried to extend this to integer ranges, you would get compile-time blowups for any large range. Instead you have to change the structure of the pattern-matching compiler to support a first-class notion of range, but that is an ambitious refactoring to a part of the compiler codebase that is unpleasant to work with. Nobody has done it for now.


#3

So it is more a matter of it requiring difficult work in the compiler than that there’s a reason in principle that it would be a bad idea?


#4

Though numbers could be desugared into | x when x > = 0 && x <= 9, no?


#5

Presumably that’s one method, though there’s a wide variety of tricks in the compiler literature for handling large switch or case style statements in languages with such constructs. Depending on the details of the particular multiway conditional, any one of several implementation techniques might be optimal. Sufficiently careful compilers figure out which one to use based on the specific instance.


#6

Remember that the compiler performs an exhaustivity check of pattern matchings. Introducing integer ranges as patterns with when clauses isn’t satisfactory because they render the detection of missing cases non decidable.


#7

Though numbers could be desugared into | x when x > = 0 && x <= 9 , no?

This desugaring is not correct because when can only exist at the top-level of the whole clause, whereas interval patterns can be used in-depth. For example, consider the or-pattern

(Some (0..9, true) | Some (100..109, false))