In the documentation of the OCaml library Str, one has
val matched_string : string -> string
matched_string s returns the substring of s that was matched by the last call
to ... provided ...
This looks like a “fragile” function, since it depends on previous calls to other functions, and its behavior seems unspecified: there appears to be no way to know “the substring of s that was matched” without looking at the implementation.
One has string_match : regexp -> string -> int -> bool, so for instance string_match (regexp "a*") "aa" 0 is true, but which substring was matched? It could be "", "a", or "aa", and the documentation does not say which of these matched_string "aa" will return.
Unfortunately, the code (ocaml/str.ml at trunk, ocaml/ocaml on GitHub) uses a lot of external declarations and Domain.DLS; I do not know where to look for their definitions, and I cannot understand it.
I would expect to have a function like
longest_match : ?pos:int -> regexp -> string -> string option (with pos defaulting to 0)
Is it possible to have such a function, one that is “robust” in that it does not depend on previous function calls?
On my computer, string_match appears to do a “longest match”, but maybe this depends on my architecture or something else? Provided that it is always a longest match (is it?), it looks like
let longest_match ?(pos = 0) r s =
  if string_match r s pos then Some (matched_string s) else None
would work, because it ensures that matched_string is called immediately after string_match and with the same s (provided also that string_match r s pos = false whenever String.length s < pos)?
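
To make the idea concrete, here is a minimal, self-contained sketch of such a wrapper. It is an illustration under assumptions, not part of Str: the name longest_match is hypothetical, and the code does not by itself guarantee that the match is the longest one; it only returns whatever string_match matched. Since it is not established above how string_match treats an out-of-range pos, the sketch checks the bounds itself. The optional argument is placed first so that it can be omitted in a full application. The str library must be linked.

```ocaml
(* Sketch only: [longest_match] is a hypothetical helper, not part of Str.
   It returns whatever [Str.string_match] matched at [pos]; whether that is
   always the longest possible match is exactly the open question. *)
let longest_match ?(pos = 0) r s =
  (* Check the offset explicitly instead of relying on how Str treats
     out-of-range positions. *)
  if pos < 0 || pos > String.length s then None
  else if Str.string_match r s pos then
    (* Called immediately after [string_match], on the same string [s]. *)
    Some (Str.matched_string s)
  else None

let () =
  match longest_match (Str.regexp "a*") "aa" with
  | Some m -> Printf.printf "matched %S\n" m
  | None -> print_endline "no match"
```

Keeping both calls inside one function means no other Str call can run between them in the same thread of control, which is the property the question is after.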
Or maybe my wish to have longest_match instead of matched_string reflects bad coding style?
Bonus question: does ocamllex use Str, or does it do its own regexp work?