Parsing JSON with result type only

I’ve been playing with yojson and noticed that it relies pretty heavily on exceptions, which I would like to avoid.

I was under the impression that it’s pretty much the go-to solution for JSON parsing in OCaml. Is that correct?

Is there a good alternative you’ve had a good experience with?

As a reference point, I’ve played with parsing with Angstrom too, and that’s pretty much the dev experience I want: make the non-error state flow through the types as much as possible.

Here’s a little (messy and not quite correct) snippet of what I’m trying to achieve:

module U = Yojson.Safe.Util

let member str json =
  try
    let x = U.member str json in
    Ok x
  with
  | _ -> Error "Oops"
;;

let conv s : (payload, string) result =
  let ( let* )       = Result.bind in
  let ( >>= )        = Result.bind in
  let ( >|= )        = Result.map |> Fun.flip in
  let  root          = Yojson.Safe.from_string s in
  let* shipment_node = member "shipment" root in
  let* tracking      = shipment_node |> member "idShip" >|= U.to_string in
  let* event_node    = shipment_node |> member "event"  >>= fun x -> Ok (U.to_list x) in
  let* items         = event_node    |> to_items |> Option.to_result ~none:"No events!" in
  let delivered =
    List.fold_left
      (fun acc x ->
        if x.code = delivered_code then
          Some { on = x.date; label = x.label }
        else
          acc)
      None items
  in
  Ok { tracking; delivered; src = items }

Wrapping the raising functions as I did here with member does not feel very practical, so I think I should probably switch libraries.

Could you explain what you want to do exactly? What does the JSON look like and what OCaml structure do you want to extract?

Have you tried using the deriver ppx_deriving_yojson? By default it uses the result type for decoding JSON values into custom types.

I’m not sure what’s wrong with wrapping the calls in something that catches the exception and transforms it into a result.
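For what it’s worth, the wrapping can be written once and reused rather than repeated per function. Here is a minimal sketch using only the standard library. The names `catch` and `parse_int` are made up for illustration, and `int_of_string` merely stands in for any raising call (such as the accessors in Yojson.Safe.Util):

```ocaml
(* [catch] is a hypothetical helper, not part of Yojson or the stdlib.
   It runs a thunk and turns any exception into an [Error] holding the
   exception rendered as a string. Note that it also swallows exceptions
   you may prefer to let through (e.g. Out_of_memory); narrow the handler
   if that matters for your program. *)
let catch (f : unit -> 'a) : ('a, string) result =
  try Ok (f ()) with
  | exn -> Error (Printexc.to_string exn)

(* [int_of_string] stands in for a raising library function. *)
let parse_int (s : string) : (int, string) result =
  catch (fun () -> int_of_string s)
  |> Result.map_error (fun msg -> Printf.sprintf "bad integer %S: %s" s msg)

let () =
  match parse_int "42", parse_int "4x2" with
  | Ok n, Error msg -> Printf.printf "parsed %d; second input failed: %s\n" n msg
  | _ -> print_endline "unexpected outcome"
```

Taking a thunk (`unit -> 'a`) keeps the helper independent of the arity of the function being wrapped, at the cost of a `fun () -> ...` at each call site.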

Yes @yawaramin I did try the ppx extension first.

However, I found that parsing would fail unless I represented the whole JSON object which is not very practical if I only need to extract part of a big tree.

Also, I found the error messages quite poor and this is acknowledged by the maintainers: Improve error messages · Issue #80 · ocaml-ppx/ppx_deriving_yojson · GitHub

So that’s why I then turned to a more ad hoc approach.

@hbr I realize that my initial wording was indeed confusing :)

Here’s a complete example that should make things clearer:

module U = Yojson.Safe.Util

let json_str =
  {|
{
    "lang": "fr_FR",
    "scope": "open",
    "returnCode": 200,
    "shipment": {
        "idShip": "123",
        "product": "whatever",
        "event": [
            {
                "code": "DONE",
                "label": "Signed by: Mr X",
                "date": "2023-06-10T10:23:18+02:00"
            },
            {
                "code": "WIP2",
                "label": "Something...",
                "date": "2023-06-10T07:25:51+02:00"
            },
            {
                "code": "WIP1",
                "label": "Something...",
                "date": "2023-06-10T00:06:40+02:00"
            }

        ]
    }
}
|}
;;

type delivered = { on : string; label : string }

type response =
  { status_code : int; tracking_number : string; delivered : delivered option }

let string_of_delivered opt =
  opt
  |> Option.map (fun x -> Printf.sprintf "Delivered on: %s (%s)" x.on x.label)
  |> Option.value ~default:"NOT YET"
;;

let string_of_response x =
  Printf.sprintf
    {|
  Tracking summary for parcel n°%s:

  API server status code was: %d
  Current delivery status: %s
  |}
    x.tracking_number x.status_code
    (string_of_delivered x.delivered)
;;

let parse_json s : (response, string) result =
  let root = Yojson.Safe.from_string s in
  let status_code = root |> U.member "returnCode" |> U.to_string in
  let shipment = root |> U.member "shipment" in
  let tracking_number = shipment |> U.member "idShip" |> U.to_string in
  let events = shipment |> U.member "event" |> U.to_list in
  let delivered =
    List.fold_left
      (fun acc x ->
        let code = x |> U.member "code" |> U.to_string in
        let label = x |> U.member "label" |> U.to_string in
        let on = x |> U.member "date" |> U.to_string in
        if code = "DONE" then
          Some { on; label }
        else
          acc)
      None events
  in
  Ok { status_code = status_code |> int_of_string; tracking_number; delivered }
;;

let () =
  match parse_json json_str with
  | Error x -> print_endline @@ "Parsing error: " ^ x
  | Ok v -> print_endline @@ string_of_response v
;;

(* Fatal error: exception Yojson.Safe.Util.Type_error("Expected string, got int", _) *)

There are many things I don’t like about this program. The main one is that it can fail in unpredictable ways. As a general rule, I’m trying to pay more and more attention to where I allow my programs to fail.

It currently fails with this (not very useful) error:

Fatal error: exception Yojson.Safe.Util.Type_error("Expected string, got int", _)

But the fact that it currently fails is not what’s actually bothering me.

Ideally, I would like to express failure in the types, like this:

  • step a returns a result
  • step b returns a result
  • step c returns a result
  • at the end of the parsing, I return a result

Along each parsing step, I could “override” the error with Result.map_error for instance (if I wanted to).
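To make that shape concrete, here is a rough sketch of the kind of pipeline I have in mind. It is self-contained: the `json` type is a tiny stand-in that mirrors a few constructors of Yojson.Safe.t, and `member`, `to_int`, `to_str`, `summary` and `parse` are hypothetical names of my own, not functions from any library.

```ocaml
(* Stand-in JSON tree so the sketch needs no library; the constructors
   mirror a subset of Yojson.Safe.t. *)
type json =
  [ `Assoc of (string * json) list
  | `Int of int
  | `String of string
  | `List of json list
  | `Null ]

let ( let* ) = Result.bind

(* Hypothetical accessors: each step returns a result instead of raising. *)
let member key (j : json) : (json, string) result =
  match j with
  | `Assoc fields ->
    (match List.assoc_opt key fields with
     | Some v -> Ok v
     | None -> Error (Printf.sprintf "missing key %S" key))
  | _ -> Error (Printf.sprintf "expected an object when looking up %S" key)

let to_int : json -> (int, string) result = function
  | `Int n -> Ok n
  | _ -> Error "expected an int"

let to_str : json -> (string, string) result = function
  | `String s -> Ok s
  | _ -> Error "expected a string"

type summary = { status_code : int; tracking_number : string }

(* Every step is a result; [Result.map_error] overrides the message
   where more context is useful. *)
let parse (root : json) : (summary, string) result =
  let* code_node = member "returnCode" root in
  let* status_code =
    to_int code_node |> Result.map_error (fun e -> "returnCode: " ^ e)
  in
  let* shipment = member "shipment" root in
  let* id_node = member "idShip" shipment in
  let* tracking_number =
    to_str id_node |> Result.map_error (fun e -> "shipment.idShip: " ^ e)
  in
  Ok { status_code; tracking_number }

let () =
  let doc : json =
    `Assoc
      [ "returnCode", `Int 200
      ; "shipment", `Assoc [ "idShip", `String "123" ]
      ]
  in
  match parse doc with
  | Ok s -> Printf.printf "parcel %s, status %d\n" s.tracking_number s.status_code
  | Error e -> print_endline ("Parsing error: " ^ e)
```

The first failing step short-circuits the rest, so the caller only ever sees one `Error` and never an exception from the accessors.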

I think I would enjoy this approach and feel better about it overall.

It’s probably just a matter of preference, nothing wrong with it in my mind. As I said, I like to see the errors flow through the types as much as possible.

Still learning though.

Hey, have you checked out Atdgen? I’ve been using it and it does what you’re asking about: being able to extract only a portion of the JSON object. Basically, you define the JSON schema you’re interested in using their DSL and it generates the parsing logic to extract that portion for you.

With regard to the error handling, the generated parsers also throw exceptions for malformed JSON and I’m not so sure their error messages are much better. That said, atdgen also lets you write data validators and specify optional types, which might help mitigate the problem!

By default yes, but you can turn off that behaviour:

By default, objects are deserialized strictly; that is, all keys in the object have to correspond to fields of the record. Passing strict = false as an option to the deriver (i.e. [@@deriving yojson { strict = false }] ) changes the behavior to ignore any unknown fields.

Set strict to false: [@@deriving yojson {strict = false}]

Damn, I missed that! Adding strict = false works nicely.

On my first try, before posting, I saw that calling ty_of_yojson returned a type named Ppx_deriving_yojson_runtime.error_or.

Which my brain interpreted as “runtime”…“error” → exception!

But this is just an alias for the result type (why use such an alias?)

I also have another source of confusion, the README says:

When the deserializing function returns Error loc , loc points to the point in the JSON hierarchy where the error has occurred.

However, loc is just a string. “Loc”, to me, suggests that I could extract data such as line number, etc. Have I got that wrong? What can I do with loc apart from printing it, and how could I debug a potentially unclear parsing failure?

I agree with you that the fact that Yojson raises a lot of exceptions is unfortunate. In its defence, it was written way before there was a result type in the standard library.

The new functions like path use option types, and if someone would like to contribute result/option type variants of the others, you’re more than welcome to. However, I don’t find using the Util module all that nice: it leads to fragile code that oftentimes mixes accessing values with computation. It is better to use ppx_deriving (or the like) and parse, don’t validate.

There’s not really much you can do with a parsing failure, is there? It means the input is malformed, and you need to stop the process and fix the input first. Or are you saying that you want to recover and return some results from any input, no matter how malformed?

Thanks for the feedback, that makes sense. I didn’t mean to sound critical if that’s what you felt!

Well, I don’t know. I guess I’m sensitive to receiving bad input (which I can’t fix because I have no control over it), but still have to manage things somehow.

I like the idea of the result type. I want to see how far I can push things with it; I find it easier and easier to use.

If you’re curious, I’ve been thinking about exceptions a lot: what are they good for and when to use them. My short answer is (I think) that they are good to use when the thing you’re trying to express does not make sense within the type system.

Here’s an example I’ve come across lately, playing with Angstrom: digits is a parsing function that returns an int, wrapped in the Angstrom type:

module A = Angstrom

let ( >>| ) = A.( >>| )

(* chomp one or more digits *)
let digits : int A.t =
  A.take_while1 (function
    | '0' .. '9' -> true
    | _ -> false)
  >>| int_of_string

I call it like this:

let%test_unit "parsing digits" =
  let ( => ) = [%test_eq: (B.int, B.string) B.Result.t] in
  let parse_digits = A.parse_string ~consume:All digits in
  let parse_digits2 s = parse_digits s |> Result.map_error (fun _ -> "Not a digit: " ^ s) in
  ()
  ; parse_digits  "1"   => Ok 1
  ; parse_digits  "2"   => Ok 2
  ; parse_digits  "12"  => Ok 12
  ; parse_digits  "12x" => Error ": end_of_input"
  ; parse_digits2 "12x" => Error "Not a digit: 12x"

As you can see, I’m calling int_of_string, which can raise. But within this context, it makes absolutely no sense to use int_of_string_opt instead. I would gain nothing using it.

Also, I can override a subpar error message easily.

Anyhow, that’s enough of me rambling. Using the ppx seems like a good option for now :)
