How does lexbuf position work?

Hi there,

I am writing an L1 compiler and trying to report accurate error messages with position info during lexing. However, I cannot figure out how the lexbuf position works.

My lexer file looks like

rule token = parse
  | ws+  { token lexbuf }
  | '\n' { Lexing.new_line lexbuf;
           token lexbuf
         }
  | '+' {assert false}
  | '=' {assert false}

And my parser file looks like

program :
  | Int;
    Main;
    L_paren R_paren;
    L_brace;
    body = stms;
    R_brace;
    Eof;
      { body }
  ;

So once I read a newline, Lexing.new_line should update pos_lnum in lexbuf, according to the documentation.
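To make that concrete, here is a small standalone sketch of what Lexing.new_line does to lex_curr_p. It only uses the standard library; the helper name position_after_newline and the input string are made up for illustration, and the pos_cnum update is done by hand to simulate a lexer that has just consumed "ab\n".

```ocaml
(* Returns lex_curr_p after simulating "consume 3 chars, then new_line". *)
let position_after_newline () : Lexing.position =
  let lexbuf = Lexing.from_string "ab\ncd" in
  (* Pretend the lexer has just consumed "ab\n" (3 characters). *)
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with Lexing.pos_cnum = 3 };
  (* new_line increments pos_lnum and sets pos_bol to the current pos_cnum. *)
  Lexing.new_line lexbuf;
  lexbuf.Lexing.lex_curr_p

let () =
  let p = position_after_newline () in
  Printf.printf "line=%d bol=%d cnum=%d\n"
    p.Lexing.pos_lnum p.Lexing.pos_bol p.Lexing.pos_cnum
```

The point is that pos_lnum and pos_bol only change when Lexing.new_line is actually called; the lexer engine itself only advances pos_cnum.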

My parser function is

let parse (filename : string) : program =
    let ast =
      In_channel.with_file filename ~f:(fun chan ->
          let lexbuf = Lexing.from_channel chan in
          init_lexbuf filename lexbuf;
          try My_parser.program My_lexer.token lexbuf with
          | _ ->
            (* Any exception lands here, including the Assert_failure raised by
               the lexer's [assert false]; attempt to print a helpful error message. *)
            let src_span =
              of_positions Lexing.(lexbuf.lex_start_p) Lexing.(lexbuf.lex_curr_p)
            in
            Error_msg.error My_lexer.errors (Some src_span) ~msg:"Parse error.";
            raise Error_msg.Error)
    in
    ast

I initialized my lexbuf as below

let init_lexbuf (filename : string) : Lexing.lexbuf -> unit =
  let open Lexing in
  let pos = { pos_fname = filename; pos_lnum = 1; pos_bol = 0; pos_cnum = 0 } in
  fun lexbuf ->
    lexbuf.lex_start_p <- pos;
    lexbuf.lex_curr_p <- pos
;;

Now I am trying to compile an L1 file:

int main()
{
    int a;
    a = a + 1;
    return 0;
}

I got an error, which is expected because I assert false when the lexer reads an equals sign. However, the reported location seems inaccurate: the line number should be 4, but I got 1. The column number is misleading as well.

The error message is shown below

test.l1:1.31-1.32:error:Parse error.
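For what it's worth, the numbers in this span are consistent with pos_lnum and pos_bol never changing from their initial values. The sketch below assumes that Mark.show prints a 1-based column computed as pos_cnum - pos_bol + 1 (Mark is not shown here, so that is a guess) and that the test file uses four-space indentation with '\n' line endings. The names column, stale and tracked are illustrative only.

```ocaml
(* Assumed column convention: 1-based offset from the start of the line. *)
let column (p : Lexing.position) : int =
  p.Lexing.pos_cnum - p.Lexing.pos_bol + 1

(* The test program, assuming 4-space indentation and '\n' line endings. *)
let source =
  "int main()\n{\n    int a;\n    a = a + 1;\n    return 0;\n}\n"

(* Absolute offset of the first '=' and of the start of its line. *)
let eq_offset = String.index source '='
let line_start = String.rindex_from source eq_offset '\n' + 1

(* Position as seen when Lexing.new_line is never called. *)
let stale =
  { Lexing.pos_fname = "test.l1"; pos_lnum = 1; pos_bol = 0; pos_cnum = eq_offset }

(* Position as it would look with line tracking working. *)
let tracked = { stale with Lexing.pos_lnum = 4; pos_bol = line_start }

let () =
  Printf.printf "stale:   %d.%d\n" stale.Lexing.pos_lnum (column stale);
  Printf.printf "tracked: %d.%d\n" tracked.Lexing.pos_lnum (column tracked)
```

Under these assumptions the first line prints 1.31, which matches the start of the reported span, and the second prints 4.7, which is where the '=' actually sits.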

FYI my error message function is

let print_msg (span : Mark.src_span option) ~(msg : string) =
  Option.iter span ~f:(fun x -> Out_channel.output_string stderr (Mark.show x));
  (* The first %s is the message kind; "error" matches the output shown above. *)
  Out_channel.fprintf stderr ":%s:%s\n" "error" msg
;;

Thank you in advance, any help would be appreciated!

I have figured it out: the problem was that I defined ws incorrectly.

let ws = [' ' '\t' '\n' '\r' '\011' '\012'] 

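Because ws includes '\n', whenever the lexer meets a newline it matches the ws+ rule (ocamllex picks the longest match, and the earlier rule on ties) and runs the action { token lexbuf } instead of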

| '\n' { Lexing.new_line lexbuf;
           token lexbuf
        }
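As a result, Lexing.new_line was never called, so pos_lnum stayed at 1 and pos_bol stayed at 0. A minimal sketch of the fix, assuming the only change needed is to drop '\n' from ws so that the dedicated newline rule can match. This is an ocamllex fragment rather than a complete file, and the remaining rules stay as they were.

```ocaml
(* '\n' is no longer part of ws, so it falls through to its own rule. *)
let ws = [' ' '\t' '\r' '\011' '\012']

rule token = parse
  | ws+  { token lexbuf }
  | '\n' { Lexing.new_line lexbuf;  (* bump pos_lnum, reset pos_bol *)
           token lexbuf
         }
```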