I’ve been playing with my parser and trying to formalize its result better by using types which are not polymorphic variants… though it seems that as hard as I try, it’s impossible (or very ugly) to do it without going through polymorphic variants at one stage of the parsing process or another.
Also, I don’t see to be gaining any performance by using closed variants, which shouldn’t be the case.
For example, let’s see this parser which parses a number:
A version with polymorphic variants:
open Angstrom
let is_digit = function
| '0' .. '9' -> true
| _ -> false
let integer =
take_while1 is_digit
>>= fun s ->
return (`NumericPart s)
let number =
list [
(peek_char >>= function
| Some '-' -> advance 1 >>= fun _ -> return (`Negative)
| Some '+' -> advance 1 >>= fun _ -> return (`Positive)
| _ -> return (`Positive));
integer <?> "Number expected";
peek_char >>= function
| None -> return (`NumericPart "0")
| Some '.' -> (char '.' *> integer)
| Some _ -> return (`NumericPart "0")
]
>>= function
| [sign; `NumericPart whole; `NumericPart part] ->
let signed_whole = match sign with
| `Negative -> "-" ^ whole
| `Positive -> whole
| _ -> assert false
in
let len_part = (String.length part) in
let denom = (Int.pow 10 len_part) in
return (`Fraction (int_of_string (signed_whole ^ part) , denom))
| _ -> fail "Number parser shouldn't happen"
And here is with closed variants:
open Angstrom
type sign =
| Positive
| Negative
type num_parts =
| Sign of sign
| NumericPart of string
let is_digit = function
| '0' .. '9' -> true
| _ -> false
let integer =
take_while1 is_digit
>>= fun s ->
return (NumericPart s)
let number =
list [
(peek_char >>= function
| Some '-' -> advance 1 >>= fun _ -> return (Sign Negative)
| Some '+' -> advance 1 >>= fun _ -> return (Sign Positive)
| _ -> return (Sign Positive));
integer <?> "Number expected";
peek_char >>= function
| None -> return (NumericPart "0")
| Some '.' -> (char '.' *> integer)
| Some _ -> return (NumericPart "0")
]
>>= function
| [sign; NumericPart whole; NumericPart part] ->
let signed_whole = match sign with
| Sign Negative -> "-" ^ whole
| Sign Positive -> whole
| _ -> assert false
in
let len_part = (String.length part) in
let denom = (Int.pow 10 len_part) in
return (`Fraction (int_of_string (signed_whole ^ part) , denom))
| _ -> fail "Number parser shouldn't happen"
So because I have to put heterogenous things in nested lists (the AST), it becomes very tedious and explicit (hence ugly) if I try to use regular variants. And it also doesn’t seem to be able to buy me a lot of type safety, as I still regularly find myself writing catch-all cases which can’t happen in practice.
Also, to my surprise, by using closed variants, my program doesn’t really work faster. I tested on huge files and there isn’t really any difference in speed.
So I’m not sure what my question is here. But I’d be happy to discuss ways of improving this situation if there are any at all.