Can we simplify collection of Menhir $startpos $endpos?

oxcrow · November 10, 2025, 12:36pm

Hi everyone,

This is mostly a question to satisfy my curiosity, and not to fix any bugs.

I’m writing a compiler in OCaml and Menhir parser and am pleasantly happy with OCaml.

This is my parser file and I’m referring to an issue in this file: [nxn / lib / nxnParser.mly]

Using Menhir’s $startpos and $endpos values I was able to tag my AST nodes with location information, then use them to report errors in my code.

To help with this, I created these helper functions and they work well,

 let location (x: Lexing.position) =
    let lnum = x.pos_lnum in
    let cnum = x.pos_cnum - x.pos_bol in
    let loc = Ast.Loc{lnum=lnum; cnum=cnum} in
    loc
  ;;

  let position (startpos: Lexing.position) (endpos: Lexing.position) =
    let start = location startpos in
    let end' = location endpos in
    let pos = Ast.Pos{start=start; end'=end'} in
    pos
  ;;

My issue is, to use this position function, I need to explicitly pass $startpos and $endpos everywhere in my parser grammar, while calling the position function.

id:
  | i=IDVAL; { Ast.Id {value=i; pos=(position $startpos $endpos)} }

I don’t mind doing this, since it’s stable, and works well for my task.

But it would be nice if there was some way to simplify the pos=(position $startpos $endpos) into something simple like pos=p where p would contain the value returned from the position function.

My intention is to reduce the amount of boiler plate code required to write my parser.

Thanks

nojb · November 10, 2025, 1:13pm

A typical approach is to change the type definition from

type expr =
  | Id of {value: string; pos: position}
  | ... of { ... ; pos: position }
  | ... of { ... ; pos: position }

into:

type expr = { desc: expr_desc; pos: position }
and expr_desc =
  | Id of {value: string}
  | ...
  | ...

One can then mirror this factoring in the parser with:

expr: expr_desc { {desc = $1; pos = position $startpos $endpos} }
;;
expr_desc:
| i=IDVAL { Ast.Id {value = i} }
| ...
;;

Makes sense?

Cheers,
Nicolas

JohnJ · November 10, 2025, 2:15pm

Another way that I would tackle this problem is by simply using Menhir’s $loc (a shorthand for ($startpos, $endpos)), and defining your AST similar to:

type position = Lexing.position * Lexing.position

type expr =
  | Id of {value: string; pos: position}
  | ... of { ... ; pos: position }
  | ... of { ... ; pos: position }

id:
  | i=IDVAL; { Ast.Id {value=i; pos=$loc} }

If you only need the position information for reporting errors, then you can save calling your position function for when you construct an error message. Another reason to save calling position for when you construct an error message is that you’ll need to do this anyway for errors raised by the parser itself.

More generally, beyond locations and error messages, I think it’s easiest when your parser mostly just constructs data with minimal extra logic, and any enhancements to it are done by the compiler’s later steps.

oxcrow · November 11, 2025, 12:45pm

I really like your idea!

Thanks

 let location (x: Lexing.position) =
    let lnum = x.pos_lnum in
    let cnum = x.pos_cnum - x.pos_bol in
    let loc = Ast.Loc{lnum=lnum; cnum=cnum} in
    loc
  ;;

  let position (startpos: Lexing.position) (endpos: Lexing.position) =
    let start = location startpos in
    let end' = location endpos in
    let pos = Ast.Pos{start=start; end'=end'} in
    pos
  ;;

  let pos (loc: Lexing.position * Lexing.position) =
    let startpos, endpos = loc in
    position startpos endpos
  ;;

id:
    | i=IDVAL; { Ast.Id {value=i; pos=(pos $loc)} }

Topic		Replies	Views
Menhir and preserving comments from source Ecosystem menhir	6	2543	April 20, 2019
Menhir:How to use inline to add locations? Learning menhir	2	1099	December 22, 2022
Your favorite Menhir tricks and fanciness Ecosystem menhir	15	5580	December 3, 2021
Ppx_parser OCaml5: how to track token position? Learning ppx	6	293	March 30, 2025
Implementing an include statement in menhir > 20211123 Learning menhir	6	1182	March 10, 2022

Can we simplify collection of Menhir $startpos $endpos?

Related topics