the OCaml language itself has been relatively easy to understand (so far!).
I’d like to throw down a gauntlet.
Here is four lines of working F# code that I can write off the top of my head that represent a regular expression that makes it easy to implement Antimirov derivatives:
type RegEx<'a when 'a: comparison> =
| Lit of 'a
| Alt of Set<RegEx<'a>>
| Seq of RegEx<'a> list
That code means “I have a type RegEx
that is polymorphic and parameterised over a type variable 'a
with the constraint that 'a
implements equality and comparison. A RegEx
is either a literal 'a
or one of a set of alternative RegEx
s or a sequence of RegEx
s”.
Compiled and ran first time. Succinct. Comprehensible. Nice.
Some notes:
- Being polymorphic over the element type means the same code can be used to match sequences of any type, not just strings of chars or arrays of bytes.
- Using sets to represent alternatives and lists to represent sequences means the regular expression that doesn’t match any input is
bot = Alt Set.empty
and the one matching with no input iseps = Seq []
. This is simpler than having more cases:Bot | Eps | Alt of re * re | Seq of re * re
. - The use of a sorted set of regular expressions eliminates duplicates and improves asymptotic performance which is more efficient than using explicit cases.
- You can print any value of any type including this
RegEx
usingprintfn "%A"
.
I’d love to see a simple port of this to OCaml. I just spent an hour trying and my use of mutually-recursive higher-order modules with PPX deriving show and ord resulted in pages of code that still doesn’t work.