Substring of Unicode string including newlines in Windows

dbuenzli · September 26, 2021, 9:27pm

You should make sure you have at least a basic understanding of this material. Otherwise you are just going to bang your head on the keyboard :-)

So I just read the Menhir manual about positions and, as far as I understand it, it defers all definitions to the lexer.

That leaves us with the sedlex documentation. And one question: why don’t you simply extract the Unicode code points from the lexeme and re-encode them via Buffer.add_utf_8_uchar?
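
To make the suggestion concrete, here is a minimal sketch. It assumes you have the lexeme as a `Uchar.t array` (which is what `Sedlexing.lexeme` returns, as far as I can tell); the helper name `utf_8_of_uchars` and its `?first`/`?len` arguments are made up for illustration. The indices count code points, not bytes, so a newline is just one more element of the array.

```ocaml
(* Hypothetical helper: re-encode a slice of code points as UTF-8.
   [first] and [len] are code point indices, not byte offsets.
   Raises Invalid_argument if the slice is out of bounds. *)
let utf_8_of_uchars ?(first = 0) ?len (us : Uchar.t array) : string =
  let len = match len with
    | Some l -> l
    | None -> Array.length us - first
  in
  (* A scalar value takes at most 4 bytes in UTF-8. *)
  let b = Buffer.create (max 1 (len * 4)) in
  for i = first to first + len - 1 do
    Buffer.add_utf_8_uchar b us.(i)
  done;
  Buffer.contents b

(* Stand-in for a lexeme: 'a', U+00E9, a newline, U+1F42B. *)
let lexeme =
  [| Uchar.of_int 0x61; Uchar.of_int 0xE9;
     Uchar.of_int 0x0A; Uchar.of_int 0x1F42B |]

let whole = utf_8_of_uchars lexeme
let middle = utf_8_of_uchars ~first:1 ~len:2 lexeme
```

Here `whole` is the full lexeme encoded as UTF-8 and `middle` is the two code points starting at index 1 (U+00E9 and the newline). Since nothing here depends on byte offsets or on how lines end, it should behave the same on Windows as anywhere else.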
