Reuse a Lexing.lexbuf

Hi,

I’m working on a tool for ATD. I want to read the ‘version’ field of a given JSON document and, depending on the version, use the right read_ parser from ATD. To do this, I have parameters of type Yojson.lexer_state and Lexing.lexbuf as input. If I use this lexer_state and lexbuf only to read the version of the JSON file, or only to parse it, it works. If I do both, it fails, I guess because they end up in an invalid state.

For Yojson.lexer_state I can just create a fresh one with Yojson.init_lexer but what about Lexing.lexbuf? Is there a way to reuse it on the same file/channel? Should I duplicate it? How?

Thanks for your help.

The easiest way is simply to recreate the lexbuf. If it is based on a channel, you will need to either open a new channel for the new lexbuf or rewind the existing one.
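A minimal sketch of the rewind option, using only the standard library. The helper name `with_fresh_lexbuf` is made up for illustration; `seek_in` and `Lexing.from_channel` are the stdlib calls doing the work.

```ocaml
(* Run [f] on a brand-new lexbuf that reads [ic] from the beginning.
   [with_fresh_lexbuf] is a hypothetical helper name.
   Assumption: [ic] is a regular file. [seek_in] only works on regular
   files; on pipes or stdin its behaviour is unspecified. *)
let with_fresh_lexbuf (ic : in_channel) (f : Lexing.lexbuf -> 'a) : 'a =
  (* Rewind the channel so the new lexbuf sees the whole input again. *)
  seek_in ic 0;
  (* Never reuse the old lexbuf: its internal buffer and positions
     still reflect the previous pass. *)
  f (Lexing.from_channel ic)
```

Each pass (peeking at the version, then the real parse) would go through its own call to this helper, with a fresh lexer_state alongside.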

Cheers

Yeah, that’s what I thought, but I would like to keep the same signature as the ATD read_ functions. So, given a Yojson.lexer_state and a Lexing.lexbuf, how do I recreate a new lexbuf? If it’s from a file, I guess I can get the file name from fname in Yojson.lexer_state, but otherwise, how can I know which channel it was opened from?
How can I rewind an existing lexbuf?

If you’re willing to do a little surgery, I’ll bet you can

(1) force-read enough into the buffer, to ensure that the first round of reading will not incur a file-read: say, a couple of kilobytes :grin:
(2) deep-copy enough of the lexbuf and state that you can reuse the copy.

That is to say, do the cloning before using either clone.
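A rough sketch of step (2), relying on the fact that Lexing.lexbuf is a public record in the standard library. `clone_lexbuf` is a made-up name, and this covers only the copying step, not the force-read in (1). It is only safe under the assumption that all the input you need is already in lex_buffer (always the case for a lexbuf built with Lexing.from_string): for a channel-backed lexbuf the refill function is shared between the clones, so a refill on one advances the channel under the other.

```ocaml
(* Copy the mutable parts of a lexbuf so the two can be advanced
   independently. [clone_lexbuf] is a hypothetical helper, not a
   stdlib function.
   Assumption: no further refill from a channel will be needed;
   the [refill_buff] closure is shared, not duplicated. *)
let clone_lexbuf (lb : Lexing.lexbuf) : Lexing.lexbuf =
  { lb with
    Lexing.lex_buffer = Bytes.copy lb.Lexing.lex_buffer;
    lex_mem = Array.copy lb.Lexing.lex_mem }
```

The `{ lb with ... }` expression allocates a new record, so the integer position fields are independent too. This leans on implementation details of Lexing, so it may not hold across OCaml versions.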

The generated interface for type foo is

val read_foo :
  Yojson.Safe.lexer_state -> Lexing.lexbuf -> foo

This is rather low-level. Both lexer_state and lexbuf are mutable, and there are no documented guarantees on what’s valid besides starting from a fresh lexer_state and a fresh lexbuf.

Like @nojb suggested, I think I would keep it simple and do a full re-read of the input file or string, and let the library initialize lexer_state and lexbuf. The file should be cached by the OS, so it’s probably not terribly expensive to re-open it. If the input is from stdin, then read it into a string and use that string as input.
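Roughly, and with only the standard library, that could look like the sketch below. `read_all` and `two_passes` are made-up helper names; `first` and `second` stand in for a version peeker and a generated reader, each of which would also create its own fresh lexer_state.

```ocaml
(* Read everything remaining on [ic] into a string.
   Works for stdin and pipes as well as regular files. *)
let read_all (ic : in_channel) : string =
  let buf = Buffer.create 4096 in
  let chunk = Bytes.create 4096 in
  let rec loop () =
    let n = input ic chunk 0 (Bytes.length chunk) in
    if n > 0 then begin
      Buffer.add_subbytes buf chunk 0 n;
      loop ()
    end
  in
  loop ();
  Buffer.contents buf

(* Run two independent passes over the same data, each on its own
   fresh lexbuf. [first] and [second] are placeholders supplied by
   the caller. *)
let two_passes
    ~(first : Lexing.lexbuf -> 'a)
    ~(second : 'a -> Lexing.lexbuf -> 'b)
    (data : string) : 'b =
  let a = first (Lexing.from_string data) in
  second a (Lexing.from_string data)
```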

Reading a foo from a string is done with foo_of_string data, and reading from a file is done with Atdgen_runtime.Util.Json.from_file read_foo path.
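With string-based readers, the version dispatch could be sketched as follows. Everything here is hypothetical: `read_versioned`, `peek_version` and `readers` are illustrative names, the version is assumed to be an int, and the readers are assumed to have been wrapped so that they all return the same type.

```ocaml
(* Choose a string-based reader from the version found in the data.
   [peek_version] is a placeholder for whatever extracts the 'version'
   field; [readers] would list generated functions such as
   foo_of_string, wrapped to a common result type ['a]. *)
let read_versioned
    ~(peek_version : string -> int)
    ~(readers : (int * (string -> 'a)) list)
    (data : string) : ('a, string) result =
  let version = peek_version data in
  match List.assoc_opt version readers with
  | Some of_string -> Ok (of_string data)
  | None -> Error (Printf.sprintf "unsupported version %d" version)
```

Because both steps start from the same immutable string, neither can leave the other with a half-consumed lexbuf.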

OK, so I’d be better off reimplementing the read functions from Atdgen_runtime.Util.Json as functions that re-read the channel, rather than trying at all costs to recreate a Lexing.lexbuf?

Thanks a lot for your help :slight_smile: