How to use a symbol table with ocamllex+menhir?



Disclaimer: I’m new to the language.

I’m trying to figure out how to implement a sort of symbol-table in menhir. I would like to know whether it’s possible at all (I searched a bit in the manual but couldn’t find anything past parameters which I don’t see useful for this) or maybe it’s not the right design choice and there’s a standard way to handle it.

For illustration purposes, consider the following lexer+parser for a language to set identifiers to integer values and print them:

(* lexer.mll *)
  open Parser
  let get = Lexing.lexeme
rule token = parse
  | "print"    { PRINT }
  | "set"      { SET }
  | ['a'-'z']+ { ID (get lexbuf) }
  | ['0'-'9']+ { INT (int_of_string (get lexbuf)) }
  | eof        { EOF }

(* parser.mly *)
%token <int> INT
%token <string> ID
%start <unit> prog
  | l=list(stmt) EOF { }
  | SET id=ID v=INT { (* TODO: update symbol table with <id,v> *) }
  | PRINT id=ID { (* TODO: check symbol table for value of id } }

How to pass the symbol table information into the actions? Or is the right way to generate a parse tree without considering environment and operate on that afterwards?



I do not know whether it’s possible to update a table on the fly. You could try to insert this code at the beginning of the .mly file :

  exception Parsing_error of string

  let symbol_table = Hashtbl.create n
  (* replace n with the expected number of entries *)

  let update id v = Hashtbl.replace symbol_table id v

  let print id =
    match Hashtbl.find_opt symbol_table id with
    | Some v -> Printf.printf "%d\n" v
    | None -> raise (Parsing_error ("undefined id: " ^ id))

This way, you will be able to build rules looking like this :

  | SET id=ID v=INT { update id v }
  | PRINT id=ID { print id }

As you can see, this parser executes things while parsing. If that works, it’s a quick way to do what you wanted. The common way to do it, though, would rather be building an AST from the statements first, and then interpreting it with your symbol table as a state of the interpreter, so an argument of the main interpreter’s function :

let rec eval symbol_table = function
  | (* all possible cases for the AST, *)
    (* possibly updating the symbol_table present as a parameter *)

I hope I made myself clear enough. Have a nice day !


You don’t implement a symbol table “in Menhir”. You can implement one in the OCaml code you write associated with your actions taken as parsing rules filre, or in code that processes a generated AST after parsing is done. In other words, Menhir doesn’t have anything to do with creating a symbol table on its own, you write the code to create and maintain a symbol table if you want one. (Note that this is an entirely standard design pattern in parsers; I’m unaware of a parser generator that helps with the creation of symbol tables.)


Maybe I didn’t explain myself correctly. I do understand the role of menhir as a parser generator and by no means I imply it should provide the means to implement a symbol table. I’m merely asking if there’s a way to pass some sort of state to the semantic actions without having to create a global state as in @cloudyhug’s solution above with the hash table, or at least a way to create independent instances of the parser in order to isolate this global state into each instance.


Ah. In that case: I think the cleanest way to implement a symbol table using a pure functional approach is to generate an explicit AST and then walk it, rather than attempting to do all the work during the course of parsing.

At one time (the era where people did things like implementing compilers for big languages that fit in 64k of RAM and the like), explicit AST construction was unfashionable. Now, I think it simplifies your work.

Indeed, you can think of the compilation process as one where you start with the source code and implement a series of stepwise transformations on it, each of which can be implemented as a function from the previous representation to the next. Parsing is then a function from textual source code to AST, generating symbol tables and other such analysis is a series of functions from ASTs to modified or enriched ASTs, etc. Every pass through the code is then a function over the last representation yielding the next representation, and the composition of all these functions takes textual source code to a final object code representation.


Notice you may create a functor out of a Menhir parser specification, which gives you an easy way to pass arguments to the parser and to ensure that your global state is isolated into different instances.
Search for the %parameter clause in the documentation.