In the documentation of the OCaml library Str, one has
val matched_string : string -> string
matched_string s returns the substring of s that was matched by the last call
to ... provided ...
This looks like a “fragile” function, since it depends on previous calls to other functions, and its behavior seems unspecified: as far as I can tell, there is no way to know “the substring of s that was matched” without looking at the implementation.
One has string_match : regexp -> string -> int -> bool, so for instance string_match (regexp "a*") "aa" 0 is true, but what was the substring matched? It could be "", "a" or "aa", and it is unspecified which of these matched_string "aa" will return.
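To make the question concrete, here is a minimal sketch that runs the call above and prints whatever matched_string reports on a given installation. It assumes the str library is linked (for example with ocamlfind ocamlopt -package str -linkpkg); the name observed is only illustrative. It shows what one particular build does, not what the documentation guarantees.

```ocaml
(* Needs the str library to be linked. *)
let observed : string option =
  let s = "aa" in
  (* matched_string is called right after string_match, on the same string *)
  if Str.string_match (Str.regexp "a*") s 0
  then Some (Str.matched_string s)
  else None

let () =
  match observed with
  | Some m -> Printf.printf "matched %S\n" m
  | None -> print_endline "no match"
```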
Unfortunately, the code (ocaml/str.ml at trunk · ocaml/ocaml · GitHub) uses a lot of external declarations and Domain.DLS, which I do not know where to look for, so I cannot understand it.
I would expect to have a function like
longest_match : ?pos:int -> regexp -> string -> string option
(with pos defaulting to 0)
Is it possible to have such a function, which is “robust” in that it does not depend on previous function calls?
It looks like on my computer, string_match actually does a “longest match”, but maybe this depends on my architecture or something else? Provided that it is always a longest match (is it?), it looks like
(* optional argument goes first, otherwise it can never be omitted *)
let longest_match ?(pos = 0) r s =
  if string_match r s pos then Some (matched_string s) else None
would work, because it ensures that matched_string is called immediately after string_match and with the same s? (This also assumes that string_match r s pos = false whenever String.length s < pos.)
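A slightly more defensive variant of the same idea can be sketched as follows. It assumes the str library is linked, and longest_match remains a hypothetical helper, not part of Str. Rather than relying on how string_match treats an out-of-range pos, it tests the bounds explicitly. It returns whatever Str matched at pos; whether that is always the longest possible match is exactly the open question.

```ocaml
(* Hypothetical helper, not part of Str. Needs the str library to be linked. *)
let longest_match ?(pos = 0) (r : Str.regexp) (s : string) : string option =
  (* Explicit bounds check, so nothing depends on how string_match
     behaves when pos lies outside the string. *)
  if pos < 0 || pos > String.length s then None
  (* matched_string is read immediately after string_match, on the same s,
     so no other Str call can change the stored match in between. *)
  else if Str.string_match r s pos then Some (Str.matched_string s)
  else None

let () =
  let show = function
    | Some m -> Printf.printf "matched %S\n" m
    | None -> print_endline "no match"
  in
  show (longest_match (Str.regexp "a*") "aa");
  show (longest_match ~pos:5 (Str.regexp "a*") "aa")
```

Since the stored match lives in per-domain state (the Domain.DLS mentioned above), keeping the two calls adjacent inside one function is what makes the wrapper independent of earlier calls made elsewhere in the program.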
Or maybe my wish to have longest_match instead of matched_string reflects bad coding style?
Bonus question: does ocamllex use Str, or does it do its own regexp work?