Could you please advise how to track token positions with the ppx_parser library on OCaml 5?
For example, given this parser:
let rec parse_number buf = function%parser
  | [ '0' .. '9' as c; [%stream s] ] -> parse_number (store buf c) s
  | [] ->
    (* how to get start and end position of parsed token here? *)
    Buffer.contents buf
Thanks to @Chet_Murthy. According to the "Stream parsers" page of the Camlp5 documentation, it might be done as follows:
let rec parse_number buf =
  function%parser
  | [ '0' .. '9' as c; [%stream s] ] ->
    parse_number (store buf c) s
  | [[%stream s]] ->
    (* how to get start and end position of parsed token here? *)
    let ep = Stream.count s in
    Printf.printf "parse_number: %d\n" ep;
    Buffer.contents buf
Still, I don’t understand how to stop the parser properly so that it does not end up in an infinite loop here.
Also, how do I get the start position of a parsed token here?
In addition, I’m porting Heritage_Platform to OCaml 5, in particular min_lexer and transduction. It uses Camlp4, so I decided to switch to ppx_parser as the closest replacement. But if there is another mature, long-term-supported parsing library that is close to ppx_parser, I could port to that instead.
Re: how to get start pos of a parsed token, you would do that -before- parsing. That is, something like (I’m not going to test this code, just write down a sketch):
let parse_something strm =
  let spos = Stream.count strm in
  (function%parser blablabl ... blabalba ... ) strm
So you see, you have the position in the stream -before- parsing.
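To make the idea concrete, here is a minimal sketch that records the count before parsing (start) and after parsing (end). It assumes a hand-written stand-in, `Cstream`, in place of `Stream`, so that it runs without ppx_parser or camlp-streams; `Cstream`, `skip_blanks`, `parse_digits` and `parse_number_loc` are hypothetical names, not part of any library. With real streams the same pattern would presumably call `Stream.count` before and after the `function%parser` call.

```ocaml
(* Hypothetical stand-in for a char stream with a consumption counter,
   mirroring Stream.peek / Stream.junk / Stream.count. *)
module Cstream = struct
  type t = { text : string; mutable pos : int }
  let of_string text = { text; pos = 0 }
  let peek s =
    if s.pos < String.length s.text then Some s.text.[s.pos] else None
  let junk s = if s.pos < String.length s.text then s.pos <- s.pos + 1
  let count s = s.pos
end

(* Skip blanks first, so the start position points at the token itself. *)
let rec skip_blanks s =
  match Cstream.peek s with
  | Some (' ' | '\n' | '\r' | '\t') -> Cstream.junk s; skip_blanks s
  | _ -> ()

(* Consume a run of digits into buf and return the collected text. *)
let rec parse_digits buf s =
  match Cstream.peek s with
  | Some ('0' .. '9' as c) ->
    Buffer.add_char buf c; Cstream.junk s; parse_digits buf s
  | _ -> Buffer.contents buf

(* Returns (lexeme, start_pos, end_pos); end_pos is exclusive. *)
let parse_number_loc s =
  skip_blanks s;
  let start_pos = Cstream.count s in   (* position before parsing *)
  let lexeme = parse_digits (Buffer.create 16) s in
  let end_pos = Cstream.count s in     (* position after parsing *)
  (lexeme, start_pos, end_pos)

let () =
  let s = Cstream.of_string " 12333 456" in
  let (a, a0, a1) = parse_number_loc s in
  let (b, b0, b1) = parse_number_loc s in
  Printf.printf "%s [%d, %d]\n%s [%d, %d]\n" a a0 a1 b b0 b1
```

On the input " 12333 456" this prints `12333 [1, 6]` and `456 [7, 10]`. Taking the start position only after the blanks are skipped is a design choice; taking it earlier would make the location include the leading whitespace.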
Re: “properly stop parsing from an infinite loop” I don’t understand what you mean. In Camlp5 stream-parsers, the way to do that is with the stream-pattern [< >], but this is equivalent to [< strm >], which is equivalent to [< _ >]. So that might be the same as (with ppx_parser) [ [%stream s] ] or [ [%stream _] ] or [ ].
Let me explain where it hangs:
module Lexer = struct
  let rec next_token_loc init_pos =
    function%parser
    | [ (' ' | '\n' | '\r' | '\t' | '\026' | '\012'); [%stream s] ] ->
      let end_pos = Stream.count s in
      next_token_loc end_pos s
    | [ [%let tok = next_token]; [%stream s] ] ->
      let end_pos = Stream.count s in
      let loc_new = Loc.{ start_pos = init_pos ; end_pos = end_pos } in
      Printf.printf ">> init_pos %d | next_token_loc: %d %d\n" init_pos loc_new.Loc.start_pos loc_new.Loc.end_pos;
      Some (tok, loc_new)
    | [ [%stream s] ] ->
      Printf.printf ">> EOL %d \n" init_pos;
      let pos = Stream.count s in
      Some (EOI, Loc.{ start_pos = init_pos; end_pos = pos + 1 })

  let mk () = fun stream -> Stream.from (fun _ -> next_token_loc 0 stream)
end
let () =
  let input = Stream.of_string " 12333 456" in
  let lexer = Lexer.mk () in
  Stream.iter
    (fun (tok, loc) ->
      Printf.printf "Token: %s, Location: [%d, %d]\n" (Token.to_string tok)
        loc.Loc.start_pos loc.Loc.end_pos)
    (lexer input)
So the stream built by mk () is infinite: it keeps yielding Some (EOI, Loc(0,12)). I don’t understand how to make it finite.
I’m not sure what you’re trying to do, but if what you mean is to ask how to stop consuming an infinite stream, then the answer is just to stop. Here is an example:
value list_of_stream_eof eoftok strm =
  let rec lrec acc = parser [
    [: `e when e = eoftok :] -> List.rev [ e::acc ]
  | [: `e ; strm :] -> lrec [ e::acc ] strm
  | [: :] -> List.rev acc
  ] in
  lrec [] strm
;
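For readers not using Camlp5's revised syntax, the same idea can be sketched in plain OCaml. This is only an illustration under stated assumptions: the token source is modelled as a function of type `unit -> token option` rather than a real `Stream.t`, and the names `token`, `endless_tokens`, `list_of_gen_eof` and `finite_of` are made up for the example. The first function stops on the consumer side, like the snippet above. The second shows the producer-side alternative: yield the end marker once and then `None`, which is the value `Stream.from` treats as end of stream.

```ocaml
type token = Int of int | EOI

(* Hypothetical token source that never ends: once the input is
   exhausted it keeps answering Some EOI, like the lexer in question. *)
let endless_tokens toks =
  let rest = ref toks in
  fun () ->
    match !rest with
    | [] -> Some EOI
    | t :: tl -> rest := tl; Some t

(* Consumer side: pull tokens until the end marker shows up, then stop.
   As in the snippet above, the end marker is kept in the result. *)
let list_of_gen_eof eoftok next =
  let rec loop acc =
    match next () with
    | Some t when t = eoftok -> List.rev (t :: acc)
    | Some t -> loop (t :: acc)
    | None -> List.rev acc
  in
  loop []

(* Producer side: wrap the source so that it yields the end marker
   once and None on every later call. *)
let finite_of eoftok next =
  let finished = ref false in
  fun () ->
    if !finished then None
    else begin
      let r = next () in
      if r = Some eoftok then finished := true;
      r
    end

let () =
  let toks = list_of_gen_eof EOI (endless_tokens [ Int 12333; Int 456 ]) in
  Printf.printf "%d tokens\n" (List.length toks)
```

With a real lexer, a wrapper in the spirit of `finite_of` would go inside the function passed to `Stream.from`, so that `Stream.iter` terminates after the first EOI.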