What is the difference between statements and expressions in OCaml?

I recently posted a very long post here in which I discovered that I have a fundamental confusion about how OCaml works. The confusion seems to stem from the fact that I am only used to Python (imperative languages), so OCaml seems to be a completely different paradigm, and I’m trying to understand it.

The way I used to think of programming was: I have a line and it’s a statement. It usually returns a value or has side effects (e.g. prints). That’s it. Then it goes forward. Nothing else really. That seems to be a fine rough model of Python. I write code and it executes line by line until it’s done, if it halts. But in OCaml, people have emphasized to me the difference between expressions and statements. I’ve tried googling it but I still don’t know what the difference is or what I’m missing. Perhaps answering some of these would be useful:

  • What is the rigorous, precise difference between statements and expressions in OCaml?
  • What part of my thinking about programming do I need to change to really understand what I’m doing in OCaml and how it works?
  • Is the paradigm of one statement per line insufficient? Why is it insufficient?

Thanks for the help! Hope I can appreciate OCaml better and be an effective OCaml programmer.

One possible answer is that OCaml doesn’t have statements. It just has expressions. So all the computation in OCaml is a series of nested function calls. This isn’t wrong, but it’s really just a partial answer.

Another (possibly better) answer is that there are expressions in OCaml whose value isn’t interesting. Those expressions can be considered to be statements as you would be used to from imperative programming.

Expressions whose value isn’t interesting in OCaml are (by convention) defined to return a value of type unit. In fact, there’s only one value of type unit, which looks like this: (). This is why the value isn’t interesting. You already know what the value is going to be before you evaluate the expression. The reason you’re evaluating the expression is that it does something besides return a value, i.e., it has a side effect such as printing a value, deleting a file, modifying a reference, and so on.

Examples of expressions of type unit are: while loops, for loops, assignments with :=, and calls to Printf.printf. These are in the imperative subpart of OCaml.
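As a rough sketch of the constructs just listed (the name counter is made up for illustration), each line below binds the result of an imperative expression to the pattern (). That only type-checks when the right-hand side has type unit, so the compiler confirms that these "statements" are ordinary expressions:

```ocaml
(* [counter] is a hypothetical mutable reference used only for this sketch. *)
let counter = ref 0

(* Assignment with := has type unit. *)
let () = counter := 1

(* A for loop has type unit. *)
let () =
  for i = 1 to 3 do
    counter := !counter + i
  done

(* A while loop has type unit. *)
let () =
  while !counter < 10 do
    incr counter
  done

(* A fully applied Printf.printf has type unit. *)
let () = Printf.printf "counter = %d\n" !counter
```

Replacing any of the right-hand sides with something like 1 + 2 should be rejected by the type checker, because an int cannot be matched against ().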

To code in OCaml you need to learn to think of your program as a series of function calls, some of which might have a side effect. You should also get comfortable working with immutable variables and values, i.e., values that you can’t modify.

Your description of a program as a series of things to do one after the other is exactly the imperative model of computation. In some ways, the functional model works at a higher level because it doesn’t concern itself (so much) with the order that things happen in. It just describes the functions that need to be applied to get the answer.

I hope this helps, at least somewhat.

There are quite a few ways to go around that. But one way of seeing this is that an OCaml program is an expression e.

Running the program consists in evaluating e until it gets to a value, that is, something you can no longer evaluate, e.g. 3 for the program 1+2.

Now a statement is simply an expression that evaluates to the unit value () which has type unit (similar to void if you are familiar with C). These expressions are usually those that perform side effects like printf or variable assignments.

The semicolon e1; e2 you got confused by is simply a way to construct a new larger expression e out of two expressions e1 and e2, with the constraint that the first expression should evaluate to () (that is, have the type unit; by default the compiler only warns when it does not). Now to evaluate e to a value we first evaluate e1 to the value () and then proceed to evaluate e2 to a value, which is the value that e evaluates to.

So ; is really a sequencing operator that builds a new expression out of two expressions, the first of which needs to evaluate to () and the second of which gives the value of the compound expression.
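A small sketch of that sequencing rule, assuming nothing beyond the standard library (the name result is illustrative): the first expression is evaluated only for its side effect, and the second supplies the value of the whole thing.

```ocaml
(* e1; e2 is itself an expression, so it can sit on the right of a let. *)
let result =
  print_endline "computing...";  (* e1 : unit, evaluated first, value discarded *)
  1 + 2                          (* e2 : int, the value of the whole expression *)

(* Here the sequence has type unit because its last expression does. *)
let () =
  print_string "result = ";
  print_int result;
  print_newline ()
```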

Also you may want to check out the book OCaml from the Very Beginning, which explains these things (expressions, values, etc.) quite well in my opinion – in that case it’s even mostly in the free sample chapters online, and there are also a few videos.

An expression is like what you find on the right side of a mathematical equation, eg. 5 + 7 * 2. It can be resolved to some value.

A let expression allows you to bind values as part of a larger expression, for example:

let a = 7 * 2 in
5 + a

Functions are also bound the same as other values:

let twice n = n * 2 in
let a = twice 7 in
5 + a

In this way, you can build whole programs which really amount to one expression to be resolved (like one big formula); comprised of many subexpressions, hopefully.

The in keyword hints at the scope of bound values – generally you’re building up parts of an answer toward some final result, so you define some things to be used in the following expression.

A statement can be identified by a semicolon. Printing, or output, being one of the most common cases. Imperative programming is done with statements, and you can do this too in OCaml, with mutable values, and for loops rather than recursion. OCaml is more naturally written in a mostly-functional style though, which ends up being expressions resolving down to results.

As has been said by others, working in the OCaml toplevel (REPL/interpreter) can be a bit misleading because it’s not how you’d typically write a program. It’s a useful space to test out functions or parts of programs, and yes, to start learning… but when writing programs you generally want to avoid defining everything at the global scope (as in the toplevel).

I think Daniel’s point is a good one: by building OCaml programs you build up a large expression (the program) from smaller expressions (the instructions), which all evaluate to a final value. Evaluating that value is conventionally called “running the program”.

In a way the ; is the antithesis to that, since the value of the first expression does not really take part in the resulting value of the topmost expression. Then I was thinking of making an operator which instead of discarding the value would require a value of type unit and a value of type 'a and return said 'a. As you can see in the post, the resulting operator is just a glorified ;.
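One way such an operator might look is sketched below; the symbol >>> and the name answer are made up here and are not taken from the post being referred to.

```ocaml
(* Hypothetical operator: demands a unit on the left, returns the right operand. *)
let ( >>> ) (() : unit) (x : 'a) : 'a = x

(* Reads much like [print_endline "..."; 42]. *)
let answer = print_endline "side effect first" >>> 42
```

One difference worth keeping in mind: ; guarantees that the left expression runs before the right one, whereas the evaluation order of the two operands of an ordinary operator is unspecified in OCaml. That does not matter here because the right operand is a constant.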

Since you mention Python, it is actually pretty annoying that so many things are statements, so you have to write:

foo = None
if bar:
   foo = bar
else:
   foo = baz

In this code you need to define a “dummy” binding and make sure you set foo to the right value. If only you could do

foo = (if bar: bar else: baz)

And in fact the Python developers decided that this is useful indeed, so you can write

foo = bar if bar else baz

Which is basically the ternary operator, an expression. Unlike in Python, where it becomes unfeasible very quickly to write more complex conditions this way, OCaml is optimized to facilitate writing code in this way, even turning statements into special cases of expressions that return unit.
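For comparison, here is roughly how the same thing could look in OCaml. The values bar and baz are placeholders mirroring the Python snippet, with an empty string standing in for a "falsy" bar, and describe is a made-up function showing that longer conditions stay readable:

```ocaml
(* Hypothetical inputs mirroring the Python example. *)
let bar = ""
let baz = "default"

(* if/else is an expression, so no dummy binding is needed. *)
let foo = if bar <> "" then bar else baz

(* Chained conditions are still a single expression. *)
let describe n =
  if n < 0 then "negative"
  else if n = 0 then "zero"
  else "positive"

(* An if without else is the special case whose branch has type unit. *)
let () = if foo = baz then print_endline "used the fallback"
```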

You can emulate a similar thing in Python by wrapping every statement in a function, so an expression would return the value and a statement function would return None.

(That’s not entirely the full truth, because not everything is first class in OCaml, since e.g. modules and types live “somewhere else” but it is a close-enough approximation that is useful for programming in OCaml)

In Python, conditionals do not create a new scope. It is simply the same scope. But your point stands for almost every other language.

That is true, but necessitated by the fact that Python does not distinguish between binding declaration and assignment. So foo = bar can mean any number of things: declaring a variable in the current scope, setting a variable in the current scope, setting a variable in the outside scope, setting a global variable. This gives rise to all kinds of arcane incantations and warts to tell Python what to do: global, nonlocal, the infamous UnboundLocalError.

Many other languages like OCaml or Scheme are very explicit about distinguishing assignment and declaration, so these issues never occur. Python’s choice is more like the antithesis to Rich Hickey’s “simple vs easy”. By doing the “simple” thing it makes it not “easy” to understand.
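To make the distinction concrete on the OCaml side, a minimal sketch (the names x, y and r are illustrative): let always declares a new binding, while := is the only form here that updates something, and it works only on a reference.

```ocaml
(* Declaration: let introduces a new, immutable binding. *)
let x = 1

(* An inner let with the same name shadows x within its scope;
   the outer x is not modified. *)
let y =
  let x = x + 10 in
  x * 2

(* Assignment: a reference is created with ref and updated with := *)
let r = ref 1
let () = r := !r + 10

let () = Printf.printf "x = %d, y = %d, r = %d\n" x y !r
```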

I think you maybe meant the other way round? I.e. Python does the easy thing but that makes it not simple.