String manipulation

NavnathKumbhar · December 13, 2019, 10:19am

Hello,

I have the string below:

let str = "MACROBUTTON AbaisserEnCorpsDeTexte \"[Click here and insert a PICTURE (mandatory)]\""

I want to extract “[Click here and insert a PICTURE (mandatory)]” from this string.
Is it possible to do it with only one or two lines of OCaml code?

Thank you in advance.

octachron · December 13, 2019, 10:48am

Certainly. The simplest code that respects your specification:

let str = "[Click here and insert a PICTURE (mandatory)]"

If your specification was, in fact, extract the first string between ":

let extract s = match String.split_on_char '"' s with
  |  _ :: s :: _ -> Some s | _ -> None

Or if you wanted all strings between "

let extract s = snd @@ List.fold_left (fun (p,l) x ->  not p,
  if p then x :: l else l)  (false, [])@@  String.split_on_char '"' s

Note that the unnatural constraint of one or two lines requires unnatural code indentation.

SkySkimmer · December 13, 2019, 11:58am

In OCaml 4.11, all strings between " can be obtained with

let get s = String.split_on_char '"' s |> List.filteri (fun i _ -> i mod 2 = 1)

NavnathKumbhar · December 13, 2019, 12:15pm

Thank you for the feedback.

I think my requirement was not clear.
I need to get the third field from the string.
For example, my string is: "MACROBUTTON AbaisserEnCorpsDeTexte \"[Click here and insert a PICTURE (mandatory)]\""

and the third field from this string is “[Click here and insert a PICTURE (mandatory)]”

I do not want to extract the string between “”. The third field can be anything; this is just an example.

Other examples of strings are:
"MACROBUTTON AbaisserEnCorpsDeTexte ..."
"MACROBUTTON CheckFail PASS" etc.

In the above examples, the third fields are “…” and “PASS”.

octachron · December 13, 2019, 12:32pm

Then you can just split on ' ', discard the first two elements, and concatenate the rest. Or you can scan with scanf,

let extract s = Scanf.sscanf s "%s %s %s" (fun _ _ z -> z)

or write a parser with angstrom, or … . I am not sure what the issue is here.
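Note that "%s" stops at whitespace, so a third field that itself contains spaces would be cut at its first word. A minimal sketch of a variant that keeps the rest of the line instead, assuming the input holds no newline (the name extract_rest is only illustrative):

```ocaml
(* Skip two whitespace-delimited words, then keep everything up to the end
   of the line. A space in the format matches any run of whitespace, so
   several spaces between fields are accepted. *)
let extract_rest s =
  Scanf.sscanf s " %s %s %[^\n]" (fun _ _ rest -> rest)

let () =
  print_endline (extract_rest "MACROBUTTON CheckFail PASS");
  print_endline
    (extract_rest
       "MACROBUTTON AbaisserEnCorpsDeTexte   [Click here and insert a PICTURE (mandatory)]")
```

Any double quotes around the third field are kept as part of the result; stripping them would be a separate step.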

Chet_Murthy · December 14, 2019, 3:37am

(1) If your string is really a record of a sort, with space-separated fields, and each field is either no-white-space, or if it has whitespace, is double-quoted, AND if the number of fields is FIXED, I would suggest you use Pcre (or Str) and write a regexp with capture groups (the parentheses). That’ll get you whichever field you wish.

(2) If the above is true, except that the # of fields is unbounded, then you’ll probably need to write something that iterates down the string to the field of interest. You can do that with Pcre again, since you can do a match from a starting-position.

This will require that you carefully specify the syntax of fields. So for instance, if the double-quoted field can itself contain escaped double-quotes, you’ll need to make sure your regex accounts for that.

If you’re working on a lot of problems like this, I think learning how to use regular expressions is going to be really, really valuable. And this will be true regardless of which language you choose: indeed, in perl/python/ruby/etc, you’ll need them even more so than in OCaml.

Hope this helps.

NavnathKumbhar · January 22, 2020, 11:16am

Hello,
I tried this string manipulation with a PCRE regular expression, but I ended up with an incomplete regular expression.

The regular expression I tried for matching MACROBUTTON fields is:
[^ ]+

But, as you can quickly guess, it breaks when the third field of MACROBUTTON contains a space (e.g. “[Click here and insert a PICTURE (mandatory)]”).

Could you please suggest some improvements on this regular expression?
Sorry for the very late reply.

Thank you in advance.

Chet_Murthy · January 22, 2020, 6:00pm

The regexp you cite is an attempt to match the -separator- between fields. But that separator appears in fields. I’d suggest you first write a regexp to match a field, and then string them together with the regexp to match the separator. What is a regexp that matches the entire line?

At this point, it might be useful to back up and work thru the lexical analysis chapter in a good compilers textbook, and/or the O’Reilly book on regexps.
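As a rough illustration of that advice, here is a sketch using the Str library that ships with OCaml rather than Pcre. It assumes the first two fields contain no spaces and that the third field is simply the rest of the line; the names field, sep, line_re and extract_third are only illustrative, and the str library must be linked.

```ocaml
(* Needs the str library, e.g. ocamlfind ocamlopt -package str -linkpkg *)
let field = "[^ ]+"   (* a field: one or more non-space characters *)
let sep = " +"        (* a separator: one or more spaces *)

(* Whole line: field, separator, field, separator, then group 1 captures
   the remainder of the line as the third field. *)
let line_re = Str.regexp ("^" ^ field ^ sep ^ field ^ sep ^ "\\(.*\\)$")

let extract_third s =
  if Str.string_match line_re s 0 then Some (Str.matched_group 1 s)
  else None
```

With this pattern, a line that has fewer than three fields gives None rather than raising an exception.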

ivg · January 22, 2020, 7:08pm

If your fields are separated by a single space character, then the following simple function could be used:

let extract s = match String.split_on_char ' ' s with
    | _ :: _ :: rest -> String.concat " " rest
    | _ -> failwith "invalid format"

The idea is to split the sentence into words, drop the first two, and build the sentence back from the remaining words.

NavnathKumbhar · January 23, 2020, 5:30am

No, fields are separated by one or more space characters.
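For that case, one possible sketch using only the standard library walks the string by hand and treats any run of spaces as a single separator. The helper name drop_fields is hypothetical, and only the space character is treated as a separator (not tabs).

```ocaml
(* Return everything after the first [n] space-separated fields.
   Runs of spaces count as one separator; leading spaces are ignored. *)
let drop_fields n s =
  let len = String.length s in
  (* Advance [i] while the character at [i] satisfies [p]. *)
  let rec skip p i = if i < len && p s.[i] then skip p (i + 1) else i in
  let is_space c = c = ' ' in
  (* Skip spaces, then one field, [k] times; stop at the next field start. *)
  let rec go k i =
    let i = skip is_space i in
    if k = 0 then i else go (k - 1) (skip (fun c -> not (is_space c)) i)
  in
  let start = go n 0 in
  String.sub s start (len - start)

let extract s = drop_fields 2 s
```

Spaces inside the third field are preserved as written, and a string with fewer than three fields yields the empty string.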
