You should make sure you have at least a basic understanding of this material. Otherwise you are just going to bang your head on the keyboard :-)
So I just read the section of the Menhir manual on positions and, as far as I understand it, Menhir leaves the definition of positions entirely to the lexer.
That leaves us with the sedlex documentation. And one question: why don’t you simply extract the Unicode code points from the lexeme and re-encode them via Buffer.add_utf_8_uchar?
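To make the suggestion concrete, here is a minimal sketch of what I have in mind. The helper name utf8_of_uchars is made up for illustration, and I am assuming the lexeme is available as a Uchar.t array (which, as far as I can tell, is what Sedlexing.lexeme returns in recent sedlex versions). It only needs the standard library (OCaml 4.06 or later for Buffer.add_utf_8_uchar):

```ocaml
(* Hypothetical helper: re-encode an array of Unicode code points
   as a UTF-8 string using only the standard library. *)
let utf8_of_uchars (cps : Uchar.t array) : string =
  (* The initial size is only a hint; the buffer grows as needed
     when code points take more than one byte. *)
  let buf = Buffer.create (Array.length cps) in
  Array.iter (Buffer.add_utf_8_uchar buf) cps;
  Buffer.contents buf

(* Assumed usage inside a sedlex action, not checked against your code:
     let s = utf8_of_uchars (Sedlexing.lexeme lexbuf) *)
```

If this does the job, you keep full control over the encoding step and can, for instance, drop or rewrite individual code points before adding them to the buffer.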