Splitting OCamllex specification into multiple files

When using Menhir it is possible to split the grammar specification into multiple files, as described
here.

Is it possible to do the same thing with ocamllex definitions?

I’m currently working on a small DSL which is parameterized on the kind of embedded (arithmetic) expressions. This parameterization is nicely supported by the Menhir “split” facility (one .mly file for the DSL grammar, and another for the expression grammar).
But currently I still have to put all the lexer definitions in the same .mll file, which obviously breaks modularity.

A solution could be to use stacked parsers/lexers, as described here, for example
in this post, but in my case there are no specific delimiters for “enclosed” expressions, and therefore switching between lexers and parsers is problematic.

Another solution could be to textually merge the .mll specs before submitting the result to ocamllex, but this is really ugly :frowning:
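For concreteness, the textual merge could look something like the sketch below. The fragment names (`dsl_rules.mll.in`, `expr_rules.mll.in`), the token names and the rule contents are all made up for illustration; the only real constraint is that the concatenation forms one valid ocamllex file (header first, then a `rule ... = parse` entry followed by all of its cases).

```shell
#!/bin/sh
# Sketch only: file names, tokens and rule contents are hypothetical.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Fragment 1: header, the single entry point, and the DSL-specific cases.
cat > dsl_rules.mll.in <<'EOF'
{ open Parser }
rule token = parse
  | [' ' '\t']+ { token lexbuf }
  | "let"       { LET }
EOF

# Fragment 2: extra cases for the embedded expressions, continuing the same rule.
cat > expr_rules.mll.in <<'EOF'
  | ['0'-'9']+ as n { INT (int_of_string n) }
  | '+'             { PLUS }
  | eof             { EOF }
EOF

# Order matters: the fragment that opens the rule must come first.
cat dsl_rules.mll.in expr_rules.mll.in > lexer.mll
# The merged file would then be passed to: ocamllex lexer.mll
```

The obvious weakness is that nothing checks the fragments against each other; the merge is purely positional.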

Any ideas?

Maybe use cppo? That would allow you to split the lexer into multiple files.

Thanks for the suggestion, Chet.

But it seems that ocamllex does not accept cppo directives in the rule section :frowning:

https://github.com/ocaml-community/cppo/blob/master/examples/lexer.mll

Anyway, this is really a workaround. Having to put all lexer definitions in the same file, either by hand or using some pre-processing tool, significantly limits the modularity provided by Menhir, doesn’t it?

You could use sedlex instead of ocamllex?

I don’t see any support for multi-file input in sedlex. Am I missing something?

Oh, I didn’t mean that ocamllex would accept them, but rather, that you could just use cppo to perform the inclusion. I tried it just now on the OCaml compiler’s lexer:

--- lexer.mll   2023-04-14 10:40:08.087837591 -0700
+++ lexer.cppo.mll      2023-04-14 10:38:47.353235091 -0700
@@ -381,8 +381,7 @@
   | newline
       { update_loc lexbuf None 1 false 0;
         EOL }
-  | blank +
-      { token lexbuf }
+#include "rule1.inc"
   | "_"
       { UNDERSCORE }
   | "~"

and after that, I can run cppo and ocamllex just fine:

$ cppo lexer.cppo.mll  > lexer.mll
$ ocamllex lexer.mll
253 states, 7726 transitions, table size 32422 bytes
6821 additional bytes used for bindings

As to the question of why ocamllex doesn’t provide some form of modularity or inclusion, what could it do that would be better than cppo?

FWIW, I’ve been using cppo more and more often to solve little problems like this. With a couple of lines of Makefile rule, it’s quick work.

ETA: of course, the file “rule1.inc” is

$ cat rule1.inc
  | blank +
      { token lexbuf }

OK, thanks for the idea and the example!
I didn’t know about cppo, and it seems to open up many opportunities.

Probably little. But it would make integration with Menhir simpler. It could also allow some semantic checking (e.g. that the same rule is not defined twice) that a purely textual inclusion mechanism cannot handle (just as the Menhir merging mechanism can detect token redefinition).
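With textual inclusion, a rough approximation of that check can be bolted on outside the tool. The sketch below is only a crude text scan, not a real ocamllex parser: it assumes every entry point starts at column 0 with `rule NAME` or `and NAME`, and the fragment names `a.inc` and `b.inc` are hypothetical.

```shell
#!/bin/sh
# Sketch only: a crude textual check, not a real ocamllex parser.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Two hypothetical fragments that both define an entry point named "comment".
cat > a.inc <<'EOF'
and comment = parse
  | "*)" { () }
  | _    { comment lexbuf }
EOF
cat > b.inc <<'EOF'
and comment = parse
  | eof { () }
  | _   { comment lexbuf }
EOF

# Collect the names introduced by "rule NAME" / "and NAME" at the start of a
# line, then keep only the names that occur more than once.
dups=$(awk '/^(rule|and) / { print $2 }' ./*.inc | sort | uniq -d)
if [ -n "$dups" ]; then
  echo "duplicate lexer entry points: $dups" >&2
fi
```

This only sees entry-point names; it says nothing about two fragments contributing the same pattern to one rule, which is the kind of thing built-in support could diagnose properly.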

BTW, for dealing with version-compatibility issues, this is just slicker than snot:

$ cppo -V OCAML:`ocamlc -version`
#if OCAML_VERSION >= (4, 0, 0)
(* All is well. *)
#else
  #error "This version of OCaml is not supported."
#endif

The cppo solution works quite well when invoked from the shell :slight_smile:

But I can’t make it work with dune. If I add this to the dune file:

(rule
 (target lexer.mll)
 (deps   lexer.cppo.mll)
 (action (run cppo %{deps} -o %{target})))

I get:

Error: File "lexer.cppo.mll", line 39, characters 0-23
Error: Cannot find included file "expr_kw.inc"

I guess it’s because the included file expr_kw.inc is only present in the source tree (src/bin/) and not in the build directory (_build/default/src/bin/). But I cannot get dune to copy it automatically :frowning:

I only use Makefiles, so don’t know the answer, but I remember that somebody else had a similar problem (getting some file to be copied-over to the build directory) and there was a way to specify it in dune. Wish I could remember what it was, or what the title of the post was. Maybe someone here will remember …

Wise option :wink:

Surely, but the documentation section on user actions is especially obscure :confused:

Anyway, it works with a shell script and the file lexer.mll doesn’t have to be regenerated often…

Finally found the solution, after reading this post.

Here’s the rule to add to the dune file

(rule
 (target lexer.mll)
 (deps
   (:src lexer.cppo.mll)
   (glob_files ./inc/*.inc))
 (action (run cppo -I ./inc %{src} -o %{target})))

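where inc is the subdirectory of the source directory that holds the files to be included by cppo. Listing those files under deps is what makes dune copy them into the build directory before cppo runs.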