Let _ = _ in _ , ';', match, and begin ... end

There are a number of syntax binding rules that I don’t feel like I fully understand:

  1. nested let _ = _ in _ … clauses

  2. nested match clauses

  3. where I can / cannot sprinkle ‘;’

  4. where I have to use begin ... end clauses

Is there a short list of rules I can memorize to understand these interactions?

Right now, my rules are:

  1. wait for compiler error

  2. sprinkle let _ = …

  3. sprinkle begin … end

  1. let...in... expressions have this syntax:
let PAT = EXP1 in EXP2

This whole syntactic form is considered an expression. So either of EXP1 or EXP2 can contain more let...in... expressions. So you can do

let
  x = 1
in
  let
    y = 2
  in
    x + y

I’m using indentation to show the structure, but since OCaml syntax is not whitespace-sensitive, we usually just write

let x = 1 in
let y = 2 in
x + y

  2. Nested match expressions can suffer from the ‘dangling else’ problem, e.g.

match x with
| 1 ->
  match y with
  | 2 -> 0
  | _ -> 1
| _ -> 2

This complains about an unused case because the parser thinks it’s:

match x with
| 1 ->
  match y with
  | 2 -> 0
  | _ -> 1
  | _ -> 2

To avoid this we can parenthesize the nested expression:

match x with
| 1 ->
  (match y with
  | 2 -> 0
  | _ -> 1)
| _ -> 2

  3. ; can be used to separate any two expressions EXP1 and EXP2, where EXP1 should have type unit (otherwise the compiler emits a warning, or an error under -strict-sequence). E.g.

let () =
  print_int 1;
  print_int 2

(* Equivalent to *)

let () =
  let () = print_int 1 in
  print_int 2

  4. begin...end has exactly the same meaning as (...); you can use them interchangeably. E.g. from the previous nested match example,

match x with
| 1 ->
  begin match y with
  | 2 -> 0
  | _ -> 1
  end
| _ -> 2

Personally I prefer this to block off multiline expressions.
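
To try the two spellings side by side, here is a minimal sketch. The function names `with_parens` and `with_begin_end` are made up for illustration, and `x` and `y` are assumed to be ints:

```ocaml
(* Both functions delimit the inner match, so the final `| _ -> 2`
   belongs to the outer match. *)
let with_parens x y =
  match x with
  | 1 ->
    (match y with
     | 2 -> 0
     | _ -> 1)
  | _ -> 2

let with_begin_end x y =
  match x with
  | 1 ->
    begin match y with
    | 2 -> 0
    | _ -> 1
    end
  | _ -> 2

let () =
  (* The two spellings agree on every input tried here. *)
  List.iter
    (fun (x, y) -> assert (with_parens x y = with_begin_end x y))
    [ (1, 2); (1, 3); (0, 2); (0, 0) ]
```

Without either delimiter, the last case would be read as a third case of the inner match.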

Interesting, is the following correct:

  1. the only real compiler confusion is nested match

  2. nested match wants to bind clauses to OUTER most match

  3. we get around this issue by either () or begin end

  4. we don’t actually need begin end? (i.e. we can always use () instead) ?

To be precise, it’s a parsing confusion, but yeah that’s pretty much it.

Btw are you using ocamlformat? With the default profile it makes all (?) those cases explicit by adding parentheses. So it’s easier to see what is “wrong”.

No, the example above is incorrect, the compiler reads it as

match x with
| 1 ->
  match y with
  | 2 -> 0
  | _ -> 1
  | _ -> 2

This is life changing. I have no idea how I spent all this time without knowing about ocamlformat on save. For some reason I just assumed that the vim “autoindent” was the best there was.

If, in the future, you suspect I’m doing something stupid, please call me out on it. Happy to learn more.

Ah, so it defaults to innermost? This does line up with past experience. So we need begin/end or () only to force it NOT to bind to the innermost match.

I corrected @yawaramin’s answer because I think it’s otherwise a very good answer.

I would say the only confusion in functional code is nested match. But once you start using ; there are lots of potential errors (as in: shift-reduce conflicts where it’s not self-evident which way it goes). For instance, you need to wrap the pattern match in begin ... end or let () = ... in to make the following correct:

match x with
| Some x -> print_int x
| None -> print_string "No int!";
print_newline () (* this line is inside the None branch! *)
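
A minimal sketch of the begin ... end fix for this case. So that the difference can be observed, it appends to a Buffer instead of printing; the names `out`, `show_wrong`, and `show_fixed` are hypothetical.

```ocaml
let out = Buffer.create 16

(* As written above: the trailing statement is parsed as part of the
   None branch, so no newline is added in the Some case. *)
let show_wrong x =
  match x with
  | Some x -> Buffer.add_string out (string_of_int x)
  | None -> Buffer.add_string out "No int!";
  Buffer.add_char out '\n'

(* Delimiting the match ends the last branch at `end`, so the newline
   is added in both cases. *)
let show_fixed x =
  begin match x with
  | Some x -> Buffer.add_string out (string_of_int x)
  | None -> Buffer.add_string out "No int!"
  end;
  Buffer.add_char out '\n'
```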

Similar issue with if clauses (solution: wrap the scope with begin ... end):

if b then
  print_string "(yes)";
  print_newline () (* this line is executed unconditionally *)

Which is why the following does not compile:

if b then
  print_string "(yes)";
  print_newline ()
else ()

Then the weird thing is that adding a binding changes the behavior, so that this is wrong:

if b then
  let s = "(yes)" in
  print_string s;
print_newline () (* this line is inside the if *)

The solution is to wrap the whole if inside begin ... end, or let () = ... in, which means that to guard against both previous risks you need begin ... end both around the if expression and inside each branch.
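
A sketch of the two placements of begin ... end mentioned above, again writing to a Buffer so the effect can be checked. The names `out`, `wrap_if`, and `wrap_branch` are made up for illustration.

```ocaml
let out = Buffer.create 16

(* begin ... end around the whole if: the `let` inside the branch stops
   at `end`, so the newline is added regardless of b. *)
let wrap_if b =
  begin
    if b then
      let s = "(yes)" in
      Buffer.add_string out s
  end;
  Buffer.add_char out '\n'

(* begin ... end inside the branch: both statements stay in the branch,
   and the newline is again added regardless of b. *)
let wrap_branch b =
  if b then begin
    Buffer.add_string out "(yes)";
    Buffer.add_string out "!"
  end;
  Buffer.add_char out '\n'
```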

Then of course we have the dangling else problem (tbh I’m not sure that one is a “confusion”, I find the behavior rather natural):

if b1 then
   if b2 then
    print_endline "b1 and b2"
else
  print_endline "neither b1 nor b2" (* this line actually executes if b1 and not b2 *)
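
If the else is meant for the outer if, the inner one has to be delimited. A small sketch, assuming Buffer output in place of printing and with the hypothetical names `out`, `nearest`, and `outer`:

```ocaml
let out = Buffer.create 16

(* Undelimited: the else pairs with the nearest if, i.e. `if b2`. *)
let nearest b1 b2 =
  if b1 then
    if b2 then Buffer.add_string out "b1 and b2"
  else Buffer.add_string out "else branch"

(* Delimited: begin ... end closes the inner if, so the else pairs
   with `if b1`. *)
let outer b1 b2 =
  if b1 then begin
    if b2 then Buffer.add_string out "b1 and b2"
  end
  else Buffer.add_string out "else branch"
```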

@threepwood : Thank you, I felt there was some ambiguity with ;, but could not precisely state it. So in summary are the rules:

  1. if preceded by let .. in or match ... with, ; binds tightly

  2. if preceded by if .. then, ; binds loosely

Does that summarize the binding rules of ; ?

The conclusion is correct, but I find the reasoning misleading. A better way of looking at it is that let ... in and match ... with extend as far to the right as possible, while if ... then binds tighter than ; but looser than any other operator. See the OCaml library documentation: Ocaml_operators.

Also, a trailing ; is optional but allowed when it doesn’t upset the logic of what you want to express. E.g. (print_char 'a'; print_char 'b';) is fine.

Ah, I see. let/in and match/with are “greedy” when consuming ;, whereas if/then is not. Yeah, that is probably a “more useful” “mental parsing” rule.

I’m not sure if this has always been in the manual, but in the current version the table of precedences actually has all this information. Binding constructs (let, match, fun etc.) are the loosest binders (which is equivalent to saying they eat up everything if you think in terms of a recursive descent parser), then semicolons, then if then else, then everything else.
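
The same ordering can be seen with fun, which was not shown earlier in the thread. This is only a sketch; `out`, `say`, `with_fun`, and `with_if` are names invented for the example.

```ocaml
let out = Buffer.create 16
let say s = Buffer.add_string out s

(* `fun` is a binding construct: its body extends over the `;`,
   so both calls run once per list element. *)
let with_fun () =
  List.iter (fun x -> say x; say ".") [ "a"; "b" ]

(* `if` binds tighter than `;`: only the first call is conditional,
   the second always runs. *)
let with_if c =
  if c then say "a"; say "."
```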

I’m curious, has there ever been any discussion of allowing the use of ; (semicolon) instead of the in keyword in let ... in expressions? I’m sure people would have very strong feelings about this, but it seems possible in theory, right?

It is not possible in theory; that would make the grammar ambiguous.

That was done in ReasonML, which introduced a C-like syntax for OCaml, complete with statement-terminator semicolons. It was pretty controversial when it originally came out, and honestly I think C-like syntax for ML semantics doesn’t really make sense. let...in... is a syntactic form that packages up a binding in an expression with a clear structure. In C-like syntax that syntactic structure is gone but we are expected to keep the semantics in mind.

Thanks. Yes, I’m aware of ReasonML, but as you said they went beyond just allowing semicolons in place of in.

Yes, in ReasonML, let a=42;65; is translated to the OCaml let a=42 let _=65. And the OCaml let a=42;65 is translated into the ReasonML let a={42;65;};.

So the two languages have very different syntax.