Declare variable and evaluate expression within pattern

I’ve been learning OCaml for a compiler course, and while parsing s-expression I encountered the following scenario

...
match (sexp_list : sexp list) with
...
| Symbol(id)::rest when (id == keyword_add) -> (parse_binary Add ...)
| Symbol(id)::rest when (id == keyword_sub) -> (parse_binary Sub ...)
| Symbol(id)::rest when (id == keyword_mul) -> (parse_binary Mul ...)
...

Basically, I need to look into the first item of the list to decide what expression to parse it into.
Obviously there is a lot of duplicated code above, so here is what I attempted next.

match (sexp_list : sexp list) with
...
| Symbol(id)::rest when (is_binary_keyword id) -> 
 (parse_binary (binary_op id) ...)
...

Basically I checked whether the id belongs to the set of binary keyword, then abstract out the different constructor logic into binary_op function. But this function could throw an error, we have a soft guarantee from the when guard here, that whatever we passed to binary_op would be a parsable id, and we are doing extra work.

So this is what I would really like to write, but it’s not valid OCaml syntax.

match (sexp_list : sexp list) with
...
| Symbol(id)::rest with (binop = (binary_op id)) 
                   when (is_valid binop) -> (parse_binary binop ...)
...

I had quite limited experiences with pattern matching, and I still think of it as very convenient if checking on runtime type tags. In this situation this is my mental model of what I’m trying to accomplish

...
if sexp_list has form Symbol(id)::rest
    let binop := (binary_op id)
    if (is_valid binop) return (parse_binary binop ...)
    else doNothing, fall through to next pattern match

if ...

I’m curious how would people usually handle such situation in OCaml, i.e. when we want to use computation result from the when guard (or other things) in pattern match.

My advice is to stay away from when clauses, unless you have a very good reason for it. In your case I would simply use a nested pattern matching:

match sexp_list with
| Symbol id :: rest ->
    begin match classify id with
    | Binary_op binop -> parse_binary binop rest
    | ... -> ...
    (* handle other cases with a leading symbol *)
    end
| ... -> ...
    (* handle other cases *)
1 Like

Thank you for your advice, that seems like the way to do it indeed. I’ll have to make up a new variant type for the return value of classify id, but it’s not too much work at all.

I would also like to hear other people’s thoughts on this if possible:).

If your id has type string (or any other non-abstract data type), then you can match on it, e.g.,

match exps with
| Symbol "add"  :: rest -> parse_binary Add ...
| ...

if you think that hoisting binary and unary rules will benefit for you parser you can split them like this

let rec parse_exps = function
  | Symbol id :: rest -> parse_head id rest 
  | Otherform id :: rest -> parse_other_form id rest
and parse_id head rest = match categorize head with
  | Binary op -> parse_binary op rest
  | Unary op -> parse_unar op rest
...

With that said, given that you parse a sexp-list, which is roughly a Lisp syntax, it is already suspicious that you are going to split your forms into binary and unary. Commonly, lisps prefer all forms to be n-ary.

1 Like

Related thread Musings on extended pattern-matching syntaxes

1 Like

Thanks again for such prompt feedback!

The reason I didn’t match on string literal is to avoid hard-coding them into my parser.
Also, the reason I’m making unary/binary expression is that although I’m parsing S-Expression here, my target language requires these primitive arithmetic operations to be unary/binary.

The related thread is exactly what I was looking for. I guess the question isn’t restricted to my proposed scenario, but rather the fact that sometimes we want to do more in pattern matching. (specifically, declaring variable to hold computation result during pattern matching, and use it within pattern matching or output expression)

The proposed solutions here both requires introduction of new types of hold that intermediate computation result. Again this doesn’t seem too difficult, but I feel like it would be nice if OCaml could support the with syntax in pattern match.

Just to be clear, adding computations to patterns is called active patterns. While it is possible to implement them in the language, their compilation will be very different. Right now, patterns are compiled into efficient decision trees, leveraging the fact that the pattern language is fully static. Introducing arbitrary computations into patterns will prevent this, at least for those patterns that use this feature.

Concerning an intermediate data type, in general, it is not required, as any data type could be substituted with a continuation-passing style.

What I’m trying to articulate here, is that active patterns in recursive-descent parsers are not necessary and counter-productive. Not only they impede performance, but they also make it easier to introduce the left recursion into your parser as well as other subtle bugs.

It is not hardcoding. The parser is a specification, so using direct literals is fine there. There is no benefit of pushing the hardcoding any further from the actual parser (you will still hardcode them at someplace, e.g., in the function let is_binary = "+" | "-" ... -> true | _ -> false). We rely on S-expressions a lot in BAP, for example, we have a Common Lisp style language with advanced macro system (which also exemplifies how important it is to have good location information). We use it to write recipes, pass information between programs, or even writing program transformations, as a backing storage for NoSQL databases and even for writing rule engines. All these links contain parsers of various complexities, from which you can get, I hope, some inspiration or learn some tricks :slight_smile:

1 Like

Thank you so much for your explanation:). This is really really helpful. I’m going to spend some time digesting all of this, and I’ll definitely look into BAP!

1 Like