How to make nested match expressions more pleasent?

I am learning Ocaml and as a simple project I have written a simple http parser.

open Base;;

type t = {
    meth: string;
    url: string;
    headers: (string * string) list;
    body: string;
};;

let get_method s =
    match String.lsplit2 ~on:' ' s with
        | Some (meth,tl) -> Ok (meth, tl)
        | _ -> Error "method cannot be parsed"
;;

let get_url s =
    match String.lsplit2 ~on:'\n' s with
        | Some (meth,tl) -> Ok (meth, tl)
        | _ -> Error "url cannot be parsed"
;;

let single_header header =
    match (String.lsplit2 ~on:':' header) with
    | Some(key, value) -> Ok (key, value)
    | _ -> Error "single header cannot be parsed"
;;

let rec get_headers headers s =
    match String.lsplit2 ~on:'\n' s with
       | Some("", tl) -> Ok (headers, tl)
       | Some(header, tl) -> (match single_header header with
           | Ok hd  -> get_headers (hd :: headers) tl
           | Error msg -> Error msg)
       | None -> (match String.lsplit2 ~on:':' s with
          | Some hd -> Ok (hd :: headers, "")
          | _ -> Error "error")
;;

let get_body s = Ok(s, "");;

let from_string s =
    match get_method s with
    | Ok (meth, s) -> (match get_url s with
        | Ok (url, s) -> (match get_headers [] s with
            | Ok (headers, s) -> (match get_body s with
                | Ok (body, _) -> Ok { meth = meth; url = url; headers = headers; body = body }
                | Error msg -> Error msg)
            | Error msg -> Error msg)
        | Error msg -> Error msg)
    | Error msg -> Error msg

in the last function , i wanted to know if there is any better way to handle nested match expressions? Also I appreciate any comment or review on this code. Thank you in advance.

I think there’s some pattern of using monads, to get what you want. It’s complicated a bit b/c you’re using both the Option and the Result monad, but if you were to wrapper the split2 so that it returns a Result, I suspect you can probably clean things up.

Would you please give me hint on the pattern name or provide an example ? imagine my split function returns results, i can easily wrap it if it helps make the code simpler

You can use let operators to make the code simpler.

For example, you can wrap up the result processing as a let operator

let (let*) = Result.bind

and write

let from_string s =
  let* meth   , s = get_method s in
  let* url    , s = get_url s in
  let* headers, s = get_headers [] s in
  let* body   , _ = get_body s in
  Ok { meth = meth; url = url; headers = headers; body = body }

or wrap up both the result processing and state passing as a let operator

let (let** ) m k s = match m s with Error s -> Error s | Ok (v, s') -> k v s'
let return v = fun s -> Ok (v,s)

and write

let from_string =
  let** meth    = get_method in
  let** url     = get_url in
  let** headers = get_headers [] in
  let** body    = get_body in
  return { meth = meth; url = url; headers = headers; body = body }
3 Likes

Have a look here too:

2 Likes

In your examples, monadic style will work well.

But in general, you can put the happy case as the last one and deindent it. This is similar to the “early return” idiom in imperative languages.

2 Likes

First off I know this is an OCaml site and I do like OCaml but will not be suggesting OCaml as you know it with this comment.

If you spend time parsing real world HTTP and HTML you will soon find that not all tags are open or closed per the standard. For example </b> is often used without a start tag <b>. There are ways to get a parser to parse such but I find that using a non-deterministic parser works better. So don’t think of this as a problem and OCaml as the solution think of this as a problem and what is an effective way to solve the problem.

For non-deterministic coding I prefer Prolog and for parsing with Prolog I prefer Definite Clause Grammars (DCG). I didn’t find an OCaml package for DCG but DCGs are syntactic sugar for difference list for which there are OCaml packages (ref).

To parse HTTP and HTML with difference list using OCaml would be a lot of work but it should be possible and give you another tool for your toolbox of problem solving with programming.

Any way, you asked for comments.

If someone is looking for a masters level thesis project, then creating a DCG package for OCaml might be worth proposing.

1 Like

Purely cosmetic notes:

  1. You can use begin end instead of brackets to enclose complex constructions like pattern matches or imperative sequences (well you can use them all the time but people mostly use them for those constructions). A matter of taste of course, but personally I find it more readable.

  2. There is a lot of rightward drift in OCaml so people often use 2-space indents and avoid indenting when not needed. For instance your way of indenting after match is unusual.

Putting these two things together here is how I would write one of your functions:

let rec get_headers headers s =
  match String.lsplit2 ~on:'\n' s with
  | Some("", tl) -> Ok (headers, tl)
  | Some(header, tl) -> 
    begin match single_header header with
    | Ok hd  -> get_headers (hd :: headers) tl
    | Error msg -> Error msg
    end
  | None ->
    begin match String.lsplit2 ~on:':' s with
    | Some hd -> Ok (hd :: headers, "")
    | _ -> Error "error"
    end

Again a matter of taste, just to give you an idea. Also the second branch is a result map and can be expressed more concisely with Result.map or with a let-operator.

  1. You don’t need the ;;, it’s only used in the interactive interpreter. The parser detects when a new top-level definition starts.
2 Likes

amazing explanation, thank you so much

1 Like

I re write it using your helps and I think it’s much better now

open Base;;

type t = {
    meth: string option;
    url: string option;
    headers: (string * string) list option;
    body: string option;
}

let (let>) res f =
    match res with
    | Ok v -> f v
    | Error err -> Error err
;;
let (let*) op f =
    match op with
    | Some v -> f v
    | None -> Error "none"
;;

let get_method req s =
    let* (meth, tl) =  String.lsplit2 ~on:' ' s in
    Ok ({ req with meth = Some meth }, tl)
;;

let get_url req s =
    let* (url, tl) =  String.lsplit2 ~on:' ' s in
    Ok ({ req with url = Some url }, tl)
;;

let single_header header =
    let* (key, value) = String.lsplit2 ~on:':' header in
    Ok (key, value)
;;

let rec get_headers headers req s =
    match String.lsplit2 ~on:'\n' s with
    | Some("", tl) -> Ok ({ req with headers = Some headers }, tl)
    | Some(header, tl) ->
        begin
            match single_header header with
            | Ok hd  -> get_headers (hd :: headers) req tl
            | Error msg -> Error msg
        end
    | None ->
        begin
            match String.lsplit2 ~on:':' s with
            | Some hd -> Ok ({ req with headers =  Some (hd :: headers) }, "")
            | _ -> Error "error"
        end

;;

let get_body req s = Ok({ req with body = Some s}, "") ;;

let from_string s =
    let req = { meth = None; url = None; headers = None; body = None } in
    let> (req, s) = get_method req s in
    let> (req, s) = get_url req s in
    let> (req, s) = get_headers [] req s in
    let> (req, _) = get_body req s in
    Ok req

I know you’re focusing on nested matches but I wanna point out that the first three functions are Option.to_result, or more generally, Option.value and Option.fold. So,

let get_method =
  String.lsplit2 ~on:' ' >> Option.to_result ~none:"method cannot be parsed"
1 Like

As a completely separate observation, you might think about doing all of this with a single regular expression (ok, 2-3 of them):

  1. first break apart the first line into method, uri, version, the headers, and the body.
  2. then break the headers apart into lines
  3. then break each header apart into key/value

You can make the #1 regexp sufficiently complete that #2, #3 will not fail.

1 Like

Cool, makes them even simpler thanks

I did this program mostly for learning but you’re right using regex will make it more robust