Lexing from string

Hello,

I would like to use the function Lexing.from_string, but to specify the starting line number and character offset. Is there a good way to do this? I have tried artificially mutating the lexbuf with the expected values, but it seems that the lexer considers my string to contain only eof.

Maybe post some example code that shows the problem?

Lexer (lex.mll):

rule token = parse
  | "test1" { (lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum, "test1") }
  | "test2" { (lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum, "test2") }
  | " "     { (lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum, " ") }
  | eof     { (lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum, "eof") }

Usage attempts:

let () =
  Printf.printf "case 1\n";
  let string = "test1 test2" in
  let lexbuf = Lexing.from_string string in
  let tok = Lex.token lexbuf in
  Printf.printf "%d %s\n" (fst tok) (snd tok)

let () =
  Printf.printf "case 2\n";
  let string = "test1 test2" in
  let lexbuf = Lexing.from_string string in
  lexbuf.Lexing.lex_start_pos <- 500;
  lexbuf.Lexing.lex_curr_pos <- 500;
  lexbuf.Lexing.lex_last_pos <- 500;
  lexbuf.Lexing.lex_eof_reached <- false;
  let pos = { Lexing.pos_fname = "file";
              Lexing.pos_lnum = 500;
              Lexing.pos_bol = 500;
              Lexing.pos_cnum = 500; } in
  lexbuf.Lexing.lex_start_p <- pos;
  lexbuf.Lexing.lex_curr_p <- pos;
  let tok = Lex.token lexbuf in
  Printf.printf "%d %s\n" (fst tok) (snd tok)

In case 1 I leave the lexbuf as is. I get the right token (test1), but the resulting character offset is counted from 0.

In case 2 I try to update the lexbuf to the position that I want. Now I get a character offset that is based on what I specified, but the token is eof.
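Why case 2 ends up at eof can be seen from the lexbuf record itself. Judging from the standard library's Lexing implementation (an assumption drawn from the stdlib, not something stated in this thread), lex_start_pos, lex_curr_pos and lex_last_pos are indices into the in-memory buffer, and Lexing.from_string creates a buffer whose lex_buffer_len is just the string length, 11 here. Setting lex_curr_pos to 500 therefore points past the end of the data; the lexer asks for a refill, and a string lexbuf's refill function simply marks end of input. The sketch below only inspects those fields; the helper buffer_state is hypothetical.

```ocaml
(* Hypothetical helper: the buffer length and the read index that case 2
   overwrote. Both are indices into the in-memory copy of the string. *)
let buffer_state (lexbuf : Lexing.lexbuf) =
  (lexbuf.Lexing.lex_buffer_len, lexbuf.Lexing.lex_curr_pos)

let () =
  let lexbuf = Lexing.from_string "test1 test2" in
  let len, pos = buffer_state lexbuf in
  (* Fresh buffer: 11 bytes of input, reading starts at index 0. *)
  Printf.printf "before: len=%d curr_pos=%d\n" len pos;
  (* What case 2 does: move the read index far past the data. *)
  lexbuf.Lexing.lex_curr_pos <- 500;
  let len, pos = buffer_state lexbuf in
  (* 500 >= 11, so nothing is left to read except end of input. *)
  Printf.printf "after: len=%d curr_pos=%d past_end=%b\n" len pos (pos >= len)
```

This also explains why the eof token in case 2 reports offset 500: the reported offset follows the (now out-of-range) read index.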

How can I artificially specify a starting position but still get Lex.token to return the actual tokens?

thanks,
julia

Hi Julia,

I think you need to modify the lex_abs_pos field. In fact, OCaml 4.11 will include a function that does this for you; see https://github.com/ocaml/ocaml/pull/8771 and in particular the implementation of the Lexing.set_position function: https://github.com/ocaml/ocaml/pull/8771/files#diff-90edd3d767ad442e41af6e9e21d9302d
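For compilers older than 4.11, a minimal sketch of the same idea, based on how the stdlib's Lexing.engine updates positions (an assumption drawn from the stdlib source, not from this thread): leave the buffer indices alone and set only lex_abs_pos and lex_curr_p. After each match the generated lexer sets pos_cnum to lex_abs_pos + lex_curr_pos, so tokens are still read from index 0 of the string but report offsets starting at 500. set_start_position and simulate_match are hypothetical names; simulate_match only mimics that arithmetic so the sketch runs without an ocamllex-generated lexer.

```ocaml
(* Hypothetical pre-4.11 stand-in for Lexing.set_position: only the
   absolute bookkeeping changes; lex_start_pos, lex_curr_pos and
   lex_last_pos keep indexing into the string buffer from 0. *)
let set_start_position (lexbuf : Lexing.lexbuf) (p : Lexing.position) =
  lexbuf.Lexing.lex_curr_p <- p;
  lexbuf.Lexing.lex_abs_pos <- p.Lexing.pos_cnum

(* Hypothetical helper mimicking the position update Lexing.engine does
   after a match of [len] characters (no ocamllex needed here). *)
let simulate_match (lexbuf : Lexing.lexbuf) (len : int) =
  lexbuf.Lexing.lex_start_pos <- lexbuf.Lexing.lex_curr_pos;
  lexbuf.Lexing.lex_curr_pos <- lexbuf.Lexing.lex_curr_pos + len;
  lexbuf.Lexing.lex_start_p <- lexbuf.Lexing.lex_curr_p;
  lexbuf.Lexing.lex_curr_p <-
    { lexbuf.Lexing.lex_curr_p with
      Lexing.pos_cnum = lexbuf.Lexing.lex_abs_pos + lexbuf.Lexing.lex_curr_pos }

let () =
  let lexbuf = Lexing.from_string "test1 test2" in
  set_start_position lexbuf
    { Lexing.pos_fname = "file"; pos_lnum = 500; pos_bol = 500; pos_cnum = 500 };
  (* Pretend the lexer just matched "test1" (5 characters). *)
  simulate_match lexbuf 5;
  (* Prints "505 test1": the real token text with a shifted offset. *)
  Printf.printf "%d %s\n"
    lexbuf.Lexing.lex_curr_p.Lexing.pos_cnum (Lexing.lexeme lexbuf)
```

With the real lexer, case 2 would call set_start_position lexbuf pos instead of the field assignments and should then get (505, "test1"). On 4.11 or later, Lexing.set_position lexbuf pos covers this; it ignores pos_fname, which Lexing.set_filename sets separately.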

Cheers,
Nicolás

That seems to work. Thanks!