Let* syntax, semicolon and chaining units

In OCaml, we can replace this code:

let do_stuff1 () =
  let () = print_endline "Step 1" in
  let () = print_endline "Step 2" in
  let () = print_endline "Step 3" in
  print_endline "done"

With this one:

let do_stuff2 () =
  ()
  ; print_endline "Step 1"
  ; print_endline "Step 2"
  ; print_endline "Step 3"
  ; print_endline "done"

(Forgive my unconventional style: I add a bogus unit at the start because I like the symmetrical look.)

If it were allowed, I could have defined the ; operator myself. For illustration purposes, I use & instead.

let do_stuff3 () =
  let ( & ) a b =
    let () = a in
    b
  in
  ()
  & print_endline "Step 1"
  & print_endline "Step 2"
  & print_endline "Step 3"
  & print_endline "done"

Playing with the let* syntax, I notice I sometimes face the same pattern.

This would be the “traditional” style for working with monads, and there isn’t much to say about it:

let compute =
    let ( >>= ) = Result.bind in
    res1 >>= fun () ->
    res2 >>= fun () ->
    res3 >>= fun () -> res4

We can also use this alternative syntax:

let compute =
    let ( let* ) = Result.bind in
    let* () = res1 in
    let* () = res2 in
    let* () = res3 in
    res4

But this looks very similar to the situation the semicolon operator was invented for. Since I can’t override the semicolon, I define the & operator again instead, but this time it works in a monadic context.

let compute =
    let ( let* ) = Result.bind in
    let ( & ) a b =
      let* () = a in
      b
    in
    Ok ()
    & res1
    & res2
    & res3
    & res4

Again, I add a bogus Ok () for symmetry, but we could of course write this code as:

res1 &
res2 &
res3 &
res4

Has this been given any thought? I feel that there is something missing from the let* syntax. We “should” have a semicolon-equivalent operator that works in a monadic context.

If you write your code according to established typographic rules (that is, punctuation does not start a new line), an explicit unit at the end of sequences is quite useful and not so unconventional. It makes it easier to add a new statement to a sequence or to reorder it, e.g. in test suites.

Other than that, I’m not really sure I understood your question. Note that monads are often referred to as a “programmable semicolon”, and that’s what the let* syntax gives you.
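To make the “programmable semicolon” remark concrete, here is a minimal sketch of what the binding operator expands to: `let* x = e in body` is sugar for `( let* ) e (fun x -> body)`. The `step` function is a hypothetical stand-in for any computation returning a `(unit, string) result`; it is not from the original post.

```ocaml
(* Binding operator for Result. *)
let ( let* ) = Result.bind

(* Hypothetical step: succeeds unless [n] is negative. *)
let step n = if n < 0 then Error (Printf.sprintf "negative: %d" n) else Ok ()

(* Written with the binding operator. *)
let with_sugar a b =
  let* () = step a in
  let* () = step b in
  Ok "done"

(* The same function with the sugar expanded by hand. *)
let without_sugar a b =
  ( let* ) (step a) (fun () ->
    ( let* ) (step b) (fun () ->
      Ok "done"))
```

Because the rest of the computation sits inside a function passed to `( let* )`, it only runs when the previous step returned `Ok`.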


It’s not a new idea, just never got much traction

I see now. It’s easier to understand with types. So you want a special operator for binding unit t monadic values. Well, some people like to add all sorts of operators to their code.

Personally, except for the established math ones, I think most operators tend to obscure rather than enlighten, and it’s better to use them sparingly. It all looks great (or not, depending on your taste) while you have them in your head. But when you come back to the code later to make a small change or hunt for a bug, you need to remind yourself of all the semantics, which a name (or an unpacking let* () = in this case, which I personally find easier to follow) would convey directly.


Great find @monoidoid, thanks. It looks like I’m referring to what alainfrisch referred to as @@@, then later as ;+

@dbuenzli yes, I prefer not to use an operator if it is not conventional (most of the time). I just found that using the special let* syntax naturally leads you to want an operator similar to ; and wondered if this had been discussed.

From the perspective of wanting absolute clarity, I think this is better:

let do_stuff1 () =
  let () = print_endline "Step 1" in
  let () = print_endline "Step 2" in
  let () = print_endline "Step 3" in
  print_endline "done"

But if I want to reduce visual noise somewhat, this is very much fine and not surprising.

let do_stuff2 () =
  ()
  ; print_endline "Step 1"
  ; print_endline "Step 2"
  ; print_endline "Step 3"
  ; print_endline "done"

I wanted to know if I could apply the same principle, with a conventional operator, when building up a computation with result where each step returns unit on success.

You need to be careful with evaluation order of function arguments when you play these games. The evaluation order is not identical in native and byte code and in general it’s not a good idea to rely on evaluation order of arguments.


Interesting, thanks for letting me know.


Is this the reason why we can’t inspect the definition of semicolon? Meaning it behaves like a function but we can’t inspect its implementation nor override it.

I deduce that it’s handled as a special case by the compiler?

I’m fairly sure the evaluation order is meant to be identical in native and byte code. Do you have an example?

It is not a function at all, and does not behave like one either. expr1; expr2 means “evaluate expr1, drop its result, then evaluate expr2”. On the other hand, expr1 & expr2 means “evaluate expr1 and expr2 in some undefined order, then call ( & ) on the results”.
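A small sketch of that difference, using only the standard library. The names `record`, `failing_step`, `next_step` and `run` are hypothetical helpers introduced for illustration. The operator version evaluates both operands before its body runs, so the second step executes even though the first one failed; the let* version does not.

```ocaml
(* Log of the steps that actually ran. *)
let log : string list ref = ref []
let record msg = log := msg :: !log

(* Hypothetical steps: the first fails, the second succeeds. *)
let failing_step () : (unit, string) result =
  record "first";
  Error "boom"

let next_step () : (unit, string) result =
  record "second";
  Ok ()

(* A user-defined operator is an ordinary function: both operands are
   evaluated (in an unspecified order) before its body runs. *)
let ( & ) a b = Result.bind a (fun () -> b)
let ( let* ) = Result.bind

(* Runs [f] and returns its result with the sorted list of executed steps.
   Sorting makes the outcome independent of the evaluation order. *)
let run f =
  log := [];
  let r = f () in
  (r, List.sort compare !log)

let with_operator () = failing_step () & next_step ()

let with_let_star () =
  let* () = failing_step () in
  next_step ()
```

Both functions return `Error "boom"`, but only `with_operator` has the side effect of `next_step`.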

The evaluation order is actually the same in native and byte code. It’s possible that there remain a few corner cases where byte code and native evaluation order differ, but in general that is considered a bug. However, it is not specified, and in practice I would assume a & b to evaluate b before a, whatever the definition of ( & ).

I had an imperative mindset when I wrote that. Plus, I always assumed evaluation was applied left to right but I can see I was wrong there. Thanks for clarifying!

I wrote this from memory and don’t have an example. I remembered (maybe wrongly) that one would use left to right and the other right to left. In any case, it’s right to left for byte code, which could still be unexpected:

utop # let f x y = ();;
utop # f (print_endline "a") (print_endline "b");;
b
a
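Since the order in which function arguments are evaluated is unspecified, one way to avoid depending on it is to bind each argument with let first. This is a minimal sketch; `trace` and `say` are hypothetical helpers that record calls instead of printing.

```ocaml
(* Records the order in which side effects happen, most recent first. *)
let trace : string list ref = ref []
let say msg = trace := msg :: !trace

let f () () = ()

(* The order of the two side effects is unspecified here. *)
let unspecified () = f (say "a") (say "b")

(* Nested lets are evaluated strictly top to bottom. *)
let sequenced () =
  let a = say "a" in
  let b = say "b" in
  f a b
```

With `sequenced`, "a" is always recorded before "b", whatever the compiler does for plain argument evaluation.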

This is something that PPXes provide, and actually this is one of the reasons why I stick with PPXes providing monadic syntaxes rather than using let* and its friends. It is valid syntax to write

let compute =
    res1;%x
    res2;%x
    res3;%x
    res4

or, in your style:

let compute =
    ()
    ;%x res1
    ;%x res2
    ;%x res3
    ;%x res4

If x is a syntax extension that defines the sequence as what you want, you’re done. You also can get all the other operators (match%x, try%x, assert%x and whatnot). I very much like this style and use it regularly for my own code.

I had this exact ‘issue’ not too long ago. From my quick research in the Haskell world, there is the >> operator for when you want to continue your monadic computations while discarding the previous result.

I had first implemented it this way: let (>>) x y = x >>= fun _ -> y. But, expanding on what @lindig said: in Haskell function arguments are evaluated lazily, whereas this is not the case in OCaml.

I found out the hard way: I was doing computations with an error monad on a graph structure and had something like:

abort_if (check_something x) "problem with x" >>
abort_if (check_another_thing y) "problem with y" >>
abort_if (has_cycle z) "cycle in z" >>
do_something x y z

where do_something loops indefinitely if z has a cycle in it and abort_if does not execute the next computations if the condition is true.

However, do_something was always executed because the arguments of >> are both evaluated first before entering the function body.

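What I ended up doing was using Lazy.t: let (>>) x y = x >>= fun _ -> Lazy.force y (forcing inside the continuation, since a lazy y pattern on the parameter would force the value as soon as the operator is applied). But now the syntax is even worse than with let* _ = x in because of the nesting: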

abort_if (check_something x) "problem with x" >> lazy (
  abort_if (check_another_thing y) "problem with y" >> lazy (
    abort_if (has_cycle z) "cycle in z" >> lazy (
      do_something x y z
    )
  )
)
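Here is a runnable sketch of the Lazy.t approach over the standard Result monad. `abort_if` and `do_something` are simplified, hypothetical stand-ins for the functions in the post. Note one assumption made explicit here: the suspension is forced with Lazy.force inside the continuation, because a `lazy y` pattern in the operator's parameter list would force it when the operator is applied, before the left-hand side is inspected.

```ocaml
let ( >>= ) = Result.bind

(* Force the right-hand side only after the left-hand side succeeded. *)
let ( >> ) x y = x >>= fun _ -> Lazy.force y

(* Hypothetical stand-in for the poster's abort_if. *)
let abort_if cond msg = if cond then Error msg else Ok ()

(* Tracks whether the final step was executed. *)
let ran = ref false
let do_something () = ran := true; Ok "result"

let guarded has_cycle =
  abort_if false "problem with x" >> lazy (
    abort_if has_cycle "cycle in z" >> lazy (
      do_something ()))
```

When `has_cycle` is true, `guarded` returns the error and `do_something` is never called.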

I’d be interested to know if there is a better solution that does not involve PPXes.

I’ve done something similar in the past by requiring y parameter in the >> operator to be a thunk, so let (>>) x y = x >>= fun _ -> y (). This makes the resulting code pretty easy on the eyes, and eliminates all problems with unintuitive evaluation order.
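A minimal sketch of the thunk-based operator over the standard Result monad. `abort_if` and `do_something` are hypothetical stand-ins, and `calls` only exists to observe whether the last step ran.

```ocaml
let ( >>= ) = Result.bind

(* The right-hand side is a thunk, called only when the left side succeeds. *)
let ( >> ) x y = x >>= fun _ -> y ()

(* Hypothetical stand-in for an abort_if helper. *)
let abort_if cond msg = if cond then Error msg else Ok ()

(* Counts how many times the final step was executed. *)
let calls = ref 0
let do_something () = incr calls; Ok "result"

let guarded has_cycle =
  abort_if false "problem with x" >> fun () ->
  abort_if has_cycle "cycle in z" >> fun () ->
  do_something ()
```

Each `fun () ->` delays everything to its right, so a failing check stops the chain before `do_something` is reached.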

See: Evaluation order - Learning - OCaml

Doesn’t that force you to nest as well?
Edit: never mind:

abort_if (check_something x) "problem with x" >> fun () ->
abort_if (check_another_thing y) "problem with y" >> fun () ->
abort_if (has_cycle z) "cycle in z" >> fun () ->
do_something x y z

But then I don’t see the advantage compared to the let* () syntax.

I’ve had the benefit of being able to craft the functions I’ve used this way to suit. So my abort_if function, for example (and any others that make sense in this kind of context), would require a trailing unit argument.


Ah, I see! I’m going to try it. Thank you for sharing.

Edit: With @cemerick’s suggestion, I understand this would look something like this:

() 
>> abort_if (check_something x) "problem with x"
>> abort_if (check_another_thing y) "problem with y" 
>> abort_if (has_cycle z) "cycle in z" 
>> fun () -> do_something x y z

You’d have to make the thunk explicit for your last step.

Evaluation order differs between native code and byte code in some cases. For example, the following program prints “ocamlopt” when compiled with ocamlopt and “ocamlc” when compiled with ocamlc:

let r = ref "ocamlc" in print_endline (snd ((r := "ocamlopt"), !r))

Interesting feedback, thanks. The ppx syntax feels less flexible to me on first try.

How would you handle this use case? I want to insert a bunch of data into a database. At each step, I need to cross the Lwt context, then the Result context to ensure I got the success variant value (), before trying to insert more rows.

Here’s what I got:

let add_author conn first_name last_name : (unit, 'error) result Lwt.t =
  Author.insert conn { first_name; middle_name = None; last_name }
;;

let seed2 conn : (unit, 'error) result Lwt.t =
  let ( let* ) = Lwt_result.bind in
  let* () = add_author conn "John" "Doe" in
  let* () = add_author conn "Jane" "Doe" in
  let* () = add_author conn "Robert" "Doe" in
  Lwt.return_ok ()
;;

let seed3 conn : (unit, 'error) result Lwt.t =
  let%lwt res = add_author conn "John" "Doe" in
  match res with
  | Error e -> Lwt.return_error e
  | Ok () -> Lwt.return_ok ()
;;

let seed4 conn : (unit, 'error) result Lwt.t =
  match%lwt add_author conn "John" "Doe" with
  | Error e -> Lwt.return_error e
  | Ok () -> Lwt.return_ok ()
;;

let bind fn then_ : (unit, 'error) result Lwt.t =
  match%lwt fn with
  | Error e -> Lwt.return_error e
  | Ok () -> then_
;;

let seed5 conn : (unit, 'error) result Lwt.t =
  let ( >>= ) = bind in

  Lwt.return_ok ()
  >>= add_author conn "John" "Doe"
  >>= add_author conn "Jane" "Doe"
  >>= add_author conn "Robert" "Doe"
;;

The function seed2 using the let* syntax is small and terse.

seed3 and seed4 were experiments with the ppx syntax: we can see that’s not gonna work.

The best I could do with the ppx extension was the implementation at seed5, I feel I kinda reinvented the wheel though. And using semicolons wouldn’t make sense here.


[EDIT]

I should have written:

let bind fn then_ : (unit, 'error) result Lwt.t =
  match%lwt fn with
  | Error e -> Lwt.return_error e
  | Ok () -> then_ ()
;;

let seed5 conn : (unit, 'error) result Lwt.t =
  let ( >>= ) = bind in

  add_author conn "John" "Doe" >>= fun () ->
  add_author conn "Jane" "Doe" >>= fun () ->
  add_author conn "Robert" "Doe"
[@@ocamlformat "disable"]