Advice for combining multiple monads

In a recent project, I was needing to “mix” a couple of different monads: concurrency monad (Async.Deferred), error monad (Core.Or_error.t), and option monad.

I’m using ppx_let for monadic let bindings, so it isn’t too bad.

However, I haven’t really written too much code using async (or lwt), so I’m not sure whether I could be doing things in a nicer way.

(Note: this post is similar to this one: How to combine 3 monads: Async/Lwt, Error and State? .)

tl; dr

I’m looking for tips/advice/strategies/articles/info on “mixing” monads.

Example

Here is a toy example to show a pattern that sort of popped up again and again in my code. I annotated types and added comments to help make clarify what’s going on.

(* Move from the option monad to the Or_error monad. *)
let or_error_of_option : 'a option -> 'a Or_error.t = function
  | None -> Or_error.error_string "None"
  | Some v -> Or_error.return v

(** [or'] returns [v] if [t] is [Ok v], otherwise, when [t] is [Error e],
    returns the result of [default e].  

    I.e., for logging errors, then providing default values.

    {[
      let a = Or_error.error_string "Something bad!"

      let b =
        or_default a ~default:(fun e ->
            (* Log the error *)
            prerr_endline @@ Error.to_string_hum e;
            (* Provide default *)
            "a default string")
    ]} *)
let or_default : 'a Or_error.t -> default:(Error.t -> 'a) -> 'a =
 fun t ~default -> match t with Ok v -> v | Error e -> default e

(* Some silly functions to stand in for actual that return different monadic values.  

    Note how some functions return options, some Or_errors, some Deferreds, etc. *)
let f0 () : int Deferred.t = Deferred.return 1
let f1 () : int Deferred.Or_error.t = Deferred.Or_error.return 1
let f2 () : int option = Some 1
let f3 () : int Or_error.t = Or_error.return 1
let f4 () : int Deferred.t = Deferred.return 1
let f5 a b c d e : int Deferred.Or_error.t = Deferred.Or_error.return (a + b + c + d + e)

(* An example of the sort of "pipeline" style function. *)
let f () : int Deferred.Or_error.t =
  let%bind.Deferred (a : int) = f0 () in
  let%bind.Deferred.Or_error (b : int) = f1 () in
  let%bind.Deferred.Or_error (c : int) =
    Deferred.return @@ or_error_of_option @@ f2 ()
  in
  let%bind.Deferred (d : int) =
    Deferred.return
    @@ or_default ~default:(fun err ->
           (* Stand in for logging *)
           prerr_endline ("there was an error: " ^ Error.to_string_hum err);
           (* After logging the error, provide a default and keep going. *)
           100)
    @@ f3 ()
  in
  let%bind.Deferred (e : int) = f4 () in
  f5 a b c d e

Above, f is an example of the sort of “pipeline” style function. In my project, it might be, make an http request, convert to lambdasoup, use a selector to select a tags, get another link, make another request, that sort of thing. (In reality, this function probably would be broken up into smaller ones…) The point is, at different “steps” of the pipeline, different monads are used.

Improving it

Now, it seems okay the way it is above. Since, Deferred.Or_error exists, using ppx_let along with whatever return function is appropriate makes things not too bad. Especially if you have a little helper to convert options to or_errors, or utility functions like or_default to help with logging errors and that sort of thing.

I was wondering though if it could be better. Here are a couple of things I thought of doing.

Lift all helper functions into the Deferred.Or_error monad

For example, if I know the little functions (in this example, f0 through f5, are only really ever used in the above context, I could change those to accept the right monad and return the right monad, which would avoid some of the let%bind stuff. (Maybe something similar to what is shown in the railway oriented programming article from F# for Fun and Profit.)

In other word, just lift the small helper functions into the Deferred.Or_error monad and be done with it.

But I wasn’t sure if that was the right approach either as sometimes the functions are just pretty simple and the monad stuff (error handling, async, whatever) is sort of incidental to what the function itself is handling. Like, do I really want to clutter up an essentially simple function with an Async or Error monad? I’m not sure.

Use a monad transformer library

Could be something like the example in the discuss post here showing how to use a monad transformer library may also be a good option. On the other hand, that may be overkill.

Just stick with monadic-let bindings

Or maybe the monadic-let bindings are the simplest or most convenient way to deal with the problem.

Wrap up

Anyway, I’m not sure the best approach to take (or if there even is a “best” approach), so any advice would be much appreciated!!

4 Likes

Let us first discuss why we are using monads and how they let us develop better programs. In my opinion, the core idea of a monad is that it is an abstraction that enables us to write generic code. Generic code, in turn, reduces coupling and improves code reuse, improving the code quality, by making it easier to read1, maintain, and change. How a monad structure is implemented is an implementation detail, which should be hidden to reduce the monad user’s cognitive load and prevent the abstraction’s leaking.

Your simple example demonstrates that having several monads in the scope of the same expression greatly increases the cognitive load as you have to operate with all of them at once. Instead, I suggest defining a single monad, which would be domain-specific, i.e., a construction that is relevant to your application. You can add extra layers of different monads to it, but hide the actual implementation under the interface. You can extend the interface with helper functions and let-binding operators for common cases, like let? that will operate on optional types, or let?! which will handle or_errors, and so on. But you shall never mix monads in the user code. The user code shall see only one monad.

To summarize, shake well your monads before usage and use a straw.


1) As a powerful abstraction, monads reduce cognitive load. However, the germane load is very high, – monadic code is generally perceived as incomprehensible by programmers that are not well-versed in monads, which is, probably, the majority of programmers.

7 Likes

Personally I would stick with approximately your current approach. In some situations like this, you can move the async binds to the top of your code, which nicely separates calls to external services and logic that can be unit tested. If you can’t do that because some async calls depend on the results of others, sometimes it can be nicer to bind on Deferred.Or_error everywhere (although in that case I would let open Deferred.Or_error.Let_syntax in and just do let%bind) but sometimes it’s nicer (IMO) to have a mix as in your example code.

A couple of small things I think would improve your code:

  • Instead of let%bind foo = monad @@ boilerplate @@ actual_logic bar, do let%bind foo = actual_logic bar |> boilerplate |> monad so it’s easier to skip over the uninteresting part
  • Use the fact that Deferred is the default monad in scope, so you can just do let%bind rather than let%bind.Deferred and return instead of Deferred.return. It’s not relevant to your exact example, but Deferred.ok is often useful in this kind of code too.

Also FYI or_error_of_option more or less exists as Result.of_option.

1 Like

In the Octez project we have face a similar issue: we use both Lwt and Result pervasively.

TL;DR:

  • We use dedicated syntax module for each monad (Lwt, result, Lwt+result) which export binding operators (let*, let+, return, etc.).
  • We have a Stdlib supplement exposing monadic variants of all the Stdlib’s traversal functions (like Lwt_list but much more extensive)

Some time ago

Until not so long ago we used infix operators for bind. And we would mix different operators depending on what sub-monad a specific expression would be in. So we would have >>=? for Lwt+result, >>? for result-only, and >>= for Lwt-only. Plus we had a dedicated operator for when you use a result-only expression in an Lwt+result context: >>?=. We don’t need the other specialised binder because Lwt-only and Lwt+result mix quite well: (_, _) result Lwt.t is just a specific case of _ Lwt.t so >>= just works.

We also had a very flat namespace where all the operators as well as some helper functions (e.g., we had error : 'err -> ('a, 'err) result and fail : 'err -> ('a, 'err) result Lwt.t) were exported by an Error_monad module which was opened everywhere (using -open as a build flag).

Now

We have changed a few things.

  • We use binding operators (let* and such) which are exported by dedicated syntax modules.
  • Our Lwt+result monad syntax includes dedicated binding operators for Lwt-only and result-only expressions
    • let*! is for Lwt-only (mnemonic: you must(!) wait for the promise to resolve)
    • let*? is for result-only (mnemonic: there may(?) be a value there or maybe an error)
    • Same for the Lwt_option_monad syntax module
  • Internally, we recommend to only open those locally. So you start your function by let open Lwt_result_syntax (or whichever monad you are actually using there).
  • We have an extensive Stdlib supplement with many of the monadic variants of the provided traversors baked in. E.g., List.map_s is the Lwt-only equivalent of List.map, List.map_e is the result-only, and List.map_es is the Lwt+result.

One thing I didn’t mention is that we actually have a specialised result called tzresult: 'a tzresult = ('a, tzrerror trace) result where tzerror is a custom error type and trace is a data-strucutre holding several errors. Traces (of errors) allows us to combine errors in different ways. The first way is you can add an error to a trace to add higher-level context about the lower-level error. The second way is for errors that happen concurrently (e.g., you evaluate concurrently several Lwt+result expressions and more than one fails). This ability to combine concurrent errors gives us a semantic for and* in the Lwt+result syntax module.

There are several downsides to our current approach, but all in all it works well enough.

  • Even for functions of modest size, it’s not always immediately visible which monad you are located in. This is even worse when you are viewing just a chunk of a diff.
  • Some parts of the Stdlib don’t lend themselves to our monadification (e.g., Seq) or require a bit of boilerplate because they don’t expose internal representations (e.g., Map’s monadic traversors are largely implemented in terms of Seq).
  • The Stdlib supplement is a lot of code with a lot of tests and a lot of comments. It’s just a large volume of low-complexity code to deal with.

There are also some very pleasant upsides:

  • The separate monad syntax modules encourages you to write each function with the smallest monad that it needs. This in turns
    • encourages you to split your function into smaller components which are more easily testable,
    • makes the type of functions informative: an Lwt.t function is very likely to actually be doing I/O at some point down the line, and a result function is very likely to actually return Error in some situations.
  • The Stdlib supplement makes it quite easy to adapt code for one monad to another.

For more complete information, you can check the Error-monad tutorial.

Later

There are a few things we want to change. Albeit we have no urgent need to do so.

  • Remove some legacy helper functions that are still hanging around.
  • Organise the namespace better.
  • Generate a lot of the code and doc of Lwtreslib automatically.
  • Improve traces (currently, for mostly historical reasons, we use list).
6 Likes

Brr uses the same approach for its futures which are used to FFI with JavaScript promises (these are not called promises because they don’t directly map on JavaScript promises as those have a non-monadic semantics which would break OCaml type safety).

Basically you have futures, future results and two syntaxes to choose from depending on your context.

2 Likes

I quite like the trace idea - I wish I’d thought of this for our codebase, but it would require too many changes in the calling code to adopt now. We just use string for the error type in our codebase and I’m not entirely satisfied with the “just pick the first error / fail fast” semantics I proposed for and* in Lwt_result.Syntax. Another choice might even have been to parametrize the Syntax module over desired behavior of how to combine concurrent errors, but we decided against it to avoid having to invoke a functor repeatedly all over the code (we’re already quite verbose - we don’t do global opens, and try to be explicit, e.g: let open Lwt_result.Syntax in - to know exactly which monad we’re talking about).

Looking at brr it makes a reasonable choice too, but it looks like it lacks commutativity @aantron described in this comment for Lwt_result.both.

Also, how does Haskell solve this problem of combining concurrent errors? Or does Haskell not have this problem because it’s more (type-)classy?

I’m not sure I understand the commutativity there. What is the equation you are seeking for ?

result_pair (Error e0) (Error e1) = result_pair (Error e1) (Error e0)

That seems hard to achieve without an error merge function and a notion of order on the errors which gets a bit unwieldy in my opinion.

But yes as mentioned here the ands are left-leaning. Not doing so would require either to get into another monad (undesirable) or provide an error merge function. The latter looks rather unconvenient to me, but since I don’t find myself using and, I could be convinced that I’m wrong with a few good examples. However that would still not give you commutativity (unless your error merge function is, which is unlikely).

Implementation details on our side:

I’d say that string is quite manageable for error management. If you want to structure it a bit more you can mash strings together a bit. Something like

val trace : string -> ('a, string) result -> ('a, string) result
let trace high_msg = function
  | Ok v -> Ok v
  | Error low_msg ->
    (* indent all of the local errors *)
    let low_msg = "\t" ^ String.concat "\n\t" (String.split '\n' low_msg) in
    (* layout high error and low error *)
    Error (Printf.sprintf "%s\n%s" high_msg low_msg)

val parmash : ('a, string) result list -> ('a list, string) result
let parmash rs =
  let rec loop acc = function
    | (Ok v) :: xs -> loop (v :: acc) xs (* accumulate Oks *)
    | (Error msg) :: xs -> loop_err [msg] xs (* Switch to err loop on first Error *)
    | [] -> Ok (List.rev acc) (* just return accumulated Oks *)
  and loop_err acc = function
    | Ok _ :: xs -> loop_err acc xs (* Discard Oks *)
    | Error msg :: xs -> loop_err (msg :: acc) xs (* Accumulate Errors *)
    | [] -> match acc with
        | [] -> assert false
        | [msg] -> Error msg (* single error: return as-is *)
        | _ :: _ :: _ as msgs ->
            (*multiple errors: pretty-print with separators *)
            Error ("|" ^ String.concat "\n|" (List.rev msgs))
    )
  in
  loop [] rs

Or something along those lines. And then and* calls parmash on the resolved promises to make line-separated |-prefixed error messages.

Basically you end-up just doing the printing of the trace in advance. And it’s heavier than to carry a type trace = Nest of string * string trace | Par of string trace list | Leaf of string and do the printing only once. Plus you can’t have fine control over the printing like hbox and vbox. But it might be acceptable if the errors are truly the exceptional case and you don’t have too many levels of them.

Yes, that’s what aantron was after - maybe this comment is more useful to understand whether this matters for brr.

I’m in no way convinced you’re wrong - I just don’t know whether you’re right. I use and* often in my code - other than Lwt_list.map_p style seeing an and* in code is the quickest way I can tell something is happening concurrently:

let* a = foo ()
and* b = bar ()
... in
let* z = ...

In the exceptional case that both computations are errors - it is a little weird that the code behaves differently if foo errors before bar or the other way around. Wouldn’t it be even more weird, if it also mattered whether I wrote foo first or bar first? Seems to go against the notion of “concurrent”.

Yeah that would work - we’d have a module Lwt_result_str with and* specialized to call parmash or equivalent to track the trace. It just “feels” like there ought to be a more general thing somewhere for this, not specialized to strings.

Thanks! I will take some time to go through these.

That didn’t help :–) But lwt’s algebra always confused me.

In any case brr’s and* here is built on top of pair which waits for both resolutions before proceeding.

That’s precisely not the case since the left error (first syntactically) is always taken when both error occur. But then yes that’s not very “concurrent”, maybe I should add a merge function there but I’m not sure how it works out with the sugar of binding operators.

Thanks everyone for all the awesome replies, I appreciate it!

Here’s a summary for easy reference:

  • Use a single, domain-specific monad. Users/API consumers will see this monad, and not worry about implementation (link)
  • Use dedicated syntax modules for each monad (e.g., Lwt, result, Lwt+result) which export binding operators link.
  • Continue with the ppx_let chaining, but take advantage of local opens getting the right stuff in scope (link)
  • If you’re feeling adventurous, check out effects (e.g., rea) (link).
7 Likes

Be careful, soon you will be writing Haskell code. :wink:

1 Like