I’m writing this note to ask for suggestions from readers. First, some setup: we all know that LR grammars are famously fickle when it comes to adding rules post hoc; you really do need to understand the entire grammar in order to do it safely. It’s widely accepted (and I certainly believe it) that LL(1) grammars are much more amenable to post-hoc extension: the acceptability of a new production can typically be decided and explained based only on the other productions of that nonterminal, and not the entire grammar (OK, that’s not completely true, but pretty much). And so, in Camlp4/5, there’s an LL(1) grammar interpreter that can be dynamically extended at runtime. The grammar rules are compiled separately, in the modules where they are defined, and are combined by the interpreter when those modules are dynamically (or statically) linked.
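To make the "local check" idea concrete, here is a minimal sketch under a strong simplifying assumption: every production starts with a terminal, so its FIRST set is just that one token. The names `production`, `grammar`, and `extend` are hypothetical and are not Camlp4/5's API; real LL(1) checking also has to deal with nullable rules and FOLLOW sets, which is exactly the "not completely true" part.

```ocaml
(* Hypothetical, simplified model: each production begins with a terminal,
   so its FIRST set is that single token. *)
type token = string
type production = { first : token; rest : string list }

module SMap = Map.Make (String)

(* A grammar maps each nonterminal name to its list of productions. *)
type grammar = production list SMap.t

(* Try to add production [p] to nonterminal [nt]. Only the existing
   productions of [nt] are inspected; the rest of the grammar is ignored. *)
let extend (g : grammar) (nt : string) (p : production)
    : (grammar, string) result =
  let existing = Option.value (SMap.find_opt nt g) ~default:[] in
  if List.exists (fun q -> q.first = p.first) existing then
    Error (Printf.sprintf "%s: token %S already starts a production" nt p.first)
  else
    Ok (SMap.add nt (p :: existing) g)
```

In this model, adding a second production to `expr` that starts with a fresh token succeeds, while one that starts with an already-used token is rejected with a message that mentions only `expr`.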
So … I’m writing an LL(k) parser generator, based on reading the ANTLR papers, and it will be intrinsically compiled (no interpreter). Partly, that’s because ANTLR supports parsing rules that take arguments (so you can express “accumulating parameters” in the grammar directly), and partly it’s because to do otherwise would require a ton of `Obj.magic`, and I’d like to avoid that. This means that things won’t be dynamically extensible at runtime, but I would like to preserve the ability to write a grammar in parts, combining them only when all the modules with grammar rules have been installed.
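For readers unfamiliar with rules that take arguments, here is a hand-written sketch of roughly what a compiled rule with an accumulating parameter might look like. This is neither ANTLR syntax nor my generator's output; the `token` type and the rule names `expr` and `expr_rest` are made up for illustration.

```ocaml
(* Hypothetical token type for a tiny "sum of integers" language. *)
type token = INT of int | PLUS | EOF

(* expr_rest[acc] : PLUS INT expr_rest[acc + n]
                  | (empty)                      -> acc
   The running total is threaded through the rule as an argument. *)
let rec expr_rest (acc : int) (toks : token list) : int * token list =
  match toks with
  | PLUS :: INT n :: rest -> expr_rest (acc + n) rest
  | _ -> (acc, toks)

(* expr : INT expr_rest[n]
   Returns the computed sum and the unconsumed tokens. *)
let expr (toks : token list) : int * token list =
  match toks with
  | INT n :: rest -> expr_rest n rest
  | _ -> failwith "expr: expected INT"
```

Because each rule becomes an ordinary typed function, the argument and result types are checked by the compiler, whereas an interpreter over heterogeneous rules is where the `Obj.magic` tends to creep in.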
So here’s my question: does anybody have suggestions for how one might put bits of grammar rules in many different modules, and then combine them later for grammar compilation? I’m literally asking for suggestions on how the packaging might be effected.
Sorry if this is far too open-ended: I’ve been focusing on getting the thing working, and not so much on how I might preserve extensibility.