I am learning Ocaml and as a simple project I have written a simple http parser.
open Base;;
type t = {
meth: string;
url: string;
headers: (string * string) list;
body: string;
};;
let get_method s =
match String.lsplit2 ~on:' ' s with
| Some (meth,tl) -> Ok (meth, tl)
| _ -> Error "method cannot be parsed"
;;
let get_url s =
match String.lsplit2 ~on:'\n' s with
| Some (meth,tl) -> Ok (meth, tl)
| _ -> Error "url cannot be parsed"
;;
let single_header header =
match (String.lsplit2 ~on:':' header) with
| Some(key, value) -> Ok (key, value)
| _ -> Error "single header cannot be parsed"
;;
let rec get_headers headers s =
match String.lsplit2 ~on:'\n' s with
| Some("", tl) -> Ok (headers, tl)
| Some(header, tl) -> (match single_header header with
| Ok hd -> get_headers (hd :: headers) tl
| Error msg -> Error msg)
| None -> (match String.lsplit2 ~on:':' s with
| Some hd -> Ok (hd :: headers, "")
| _ -> Error "error")
;;
let get_body s = Ok(s, "");;
let from_string s =
match get_method s with
| Ok (meth, s) -> (match get_url s with
| Ok (url, s) -> (match get_headers [] s with
| Ok (headers, s) -> (match get_body s with
| Ok (body, _) -> Ok { meth = meth; url = url; headers = headers; body = body }
| Error msg -> Error msg)
| Error msg -> Error msg)
| Error msg -> Error msg)
| Error msg -> Error msg
in the last function , i wanted to know if there is any better way to handle nested match expressions? Also I appreciate any comment or review on this code. Thank you in advance.
I think there’s some pattern of using monads, to get what you want. It’s complicated a bit b/c you’re using both the Option and the Result monad, but if you were to wrapper the split2 so that it returns a Result, I suspect you can probably clean things up.
Would you please give me hint on the pattern name or provide an example ? imagine my split function returns results, i can easily wrap it if it helps make the code simpler
For example, you can wrap up the result processing as a let operator
let (let*) = Result.bind
and write
let from_string s =
let* meth , s = get_method s in
let* url , s = get_url s in
let* headers, s = get_headers [] s in
let* body , _ = get_body s in
Ok { meth = meth; url = url; headers = headers; body = body }
or wrap up both the result processing and state passing as a let operator
let (let** ) m k s = match m s with Error s -> Error s | Ok (v, s') -> k v s'
let return v = fun s -> Ok (v,s)
and write
let from_string =
let** meth = get_method in
let** url = get_url in
let** headers = get_headers [] in
let** body = get_body in
return { meth = meth; url = url; headers = headers; body = body }
First off I know this is an OCaml site and I do like OCaml but will not be suggesting OCaml as you know it with this comment.
If you spend time parsing real world HTTP and HTML you will soon find that not all tags are open or closed per the standard. For example </b> is often used without a start tag <b>. There are ways to get a parser to parse such but I find that using a non-deterministic parser works better. So don’t think of this as a problem and OCaml as the solution think of this as a problem and what is an effective way to solve the problem.
For non-deterministic coding I prefer Prolog and for parsing with Prolog I prefer Definite Clause Grammars (DCG). I didn’t find an OCaml package for DCG but DCGs are syntactic sugar for difference list for which there are OCaml packages (ref).
To parse HTTP and HTML with difference list using OCaml would be a lot of work but it should be possible and give you another tool for your toolbox of problem solving with programming.
Any way, you asked for comments.
If someone is looking for a masters level thesis project, then creating a DCG package for OCaml might be worth proposing.
You can use begin end instead of brackets to enclose complex constructions like pattern matches or imperative sequences (well you can use them all the time but people mostly use them for those constructions). A matter of taste of course, but personally I find it more readable.
There is a lot of rightward drift in OCaml so people often use 2-space indents and avoid indenting when not needed. For instance your way of indenting after match is unusual.
Putting these two things together here is how I would write one of your functions:
let rec get_headers headers s =
match String.lsplit2 ~on:'\n' s with
| Some("", tl) -> Ok (headers, tl)
| Some(header, tl) ->
begin match single_header header with
| Ok hd -> get_headers (hd :: headers) tl
| Error msg -> Error msg
end
| None ->
begin match String.lsplit2 ~on:':' s with
| Some hd -> Ok (hd :: headers, "")
| _ -> Error "error"
end
Again a matter of taste, just to give you an idea. Also the second branch is a result map and can be expressed more concisely with Result.map or with a let-operator.
You don’t need the ;;, it’s only used in the interactive interpreter. The parser detects when a new top-level definition starts.
I re write it using your helps and I think it’s much better now
open Base;;
type t = {
meth: string option;
url: string option;
headers: (string * string) list option;
body: string option;
}
let (let>) res f =
match res with
| Ok v -> f v
| Error err -> Error err
;;
let (let*) op f =
match op with
| Some v -> f v
| None -> Error "none"
;;
let get_method req s =
let* (meth, tl) = String.lsplit2 ~on:' ' s in
Ok ({ req with meth = Some meth }, tl)
;;
let get_url req s =
let* (url, tl) = String.lsplit2 ~on:' ' s in
Ok ({ req with url = Some url }, tl)
;;
let single_header header =
let* (key, value) = String.lsplit2 ~on:':' header in
Ok (key, value)
;;
let rec get_headers headers req s =
match String.lsplit2 ~on:'\n' s with
| Some("", tl) -> Ok ({ req with headers = Some headers }, tl)
| Some(header, tl) ->
begin
match single_header header with
| Ok hd -> get_headers (hd :: headers) req tl
| Error msg -> Error msg
end
| None ->
begin
match String.lsplit2 ~on:':' s with
| Some hd -> Ok ({ req with headers = Some (hd :: headers) }, "")
| _ -> Error "error"
end
;;
let get_body req s = Ok({ req with body = Some s}, "") ;;
let from_string s =
let req = { meth = None; url = None; headers = None; body = None } in
let> (req, s) = get_method req s in
let> (req, s) = get_url req s in
let> (req, s) = get_headers [] req s in
let> (req, _) = get_body req s in
Ok req
I know you’re focusing on nested matches but I wanna point out that the first three functions are Option.to_result, or more generally, Option.value and Option.fold. So,
let get_method =
String.lsplit2 ~on:' ' >> Option.to_result ~none:"method cannot be parsed"