About the order of definitions

In OCaml, any dependency must be defined above its usage.
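
For instance, a minimal sketch of what the compiler accepts and rejects (the names are throwaway):

(* let b = a + 1   (* rejected: Unbound value a *) *)
let a = 1
let b = a + 1   (* accepted: a is defined above *)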

I particularly enjoy this feature, because:

  • it forces me to be more organized / consistent
    • it encourages me to keep related code together sooner rather than later
    • when spiking, I notice I make less of a mess that way
  • if I don’t understand a piece of code, the only reasonable action is to “look up”, rather than scramble in multiple directions

I’ve been wondering for a while why this design decision was made. Was it implemented as a technical detail to make compilation-related tasks easier to manage? Or was the user experience the main consideration from the get-go?

I have vague memories of encountering this behavior, once, when first playing with programming languages some 20-odd years ago, but I can’t recall which programming language it was.

Do you know of any other languages that display this behavior?

C (and C++)? (And most other descendants of the ALGOL family.)

This constraint makes both reasoning about and implementing the language simpler.

Cheers,
Nicolas

Oh that’s it!

Funny thing is, I’ve been playing with those languages for a few days now, and defining things in order has become such a habit that I didn’t even notice.

I’ve also been playing with Rust, which does allow functions to be defined in any order (which I dislike).

I find it puzzling that most languages I’ve stumbled upon also allow this; I don’t see the advantage.


EDIT: I will add that in the case of C++, methods within a class can be defined in any order

Note that C has forward declarations, so you can define the body of a function after its use, provided that its prototype has already been declared. There were a few times I wished I had this in OCaml (and in those rare cases, I used a reference to a dummy function, updated later, and this was good enough).
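
For what it’s worth, a minimal sketch of that workaround (pr and use are illustrative names):

(* Forward "declaration" via a mutable reference to a dummy function. *)
let pr : (int -> unit) ref = ref (fun _ -> assert false)

(* This code can already call it through the reference. *)
let use n = !pr n

(* The real body is installed afterward. *)
let () = pr := (fun n -> print_int n; print_newline ())

let () = use 42   (* prints 42 *)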

Interesting, thanks!

So it looks like OCaml is rather unique here. Apart from F#, I don’t know of any other language enforcing that constraint.

FORTH is compiled and linked (or compiled to threaded code) in one pass, so we must define something before using it. And there is no such thing as a prototype declaration.

On the other extreme, the M language (Excel/Power Query, Power BI) even allows the use of local constants* defined afterward, as in let a = b, b = 42 in a. (*There are no variables in M, only constants… it is a pure functional language.) It would be as if, in C, we could write { int a = b; int b = 42; return a; }. In Racket, a Scheme derivative, the equivalent (let ([a b] [b 42]) a) doesn’t work either. But Racket can do some late binding with functions/lambda expressions.
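
OCaml sides with Racket here; a quick sketch of the analogous code (x is just an illustrative name):

(* let x = (let a = b and b = 42 in a)   (* rejected: Unbound value b *) *)

(* Reordering the bindings makes it compile: *)
let x = let b = 42 in let a = b in a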

I think OCaml gets this trait, along with the ‘single-pass’ parsing philosophy, from Pascal and its Wirthian descendants: Forward declaration - Wikipedia

In Pascal and other Wirth programming languages, it is a general rule that all entities must be declared before use, and thus forward declaration is necessary for mutual recursion

Probably true of the original Pascal implementations. But Borland Turbo Pascal introduced a forward declaration comparable to the C prototype: for example, Function B: Boolean; Forward; gives the function’s signature before its use (the function must be defined afterward).

Forward declarations are standard Pascal. Present in Pascal since the first version. Otherwise one could not write mutually recursive functions.

1 Like

Ok, then Pascal is in the same category as C: everything has to be defined before use, but we can declare the prototype first and give the implementation afterward.

One use is mutual recursion, but OCaml supports it with the and keyword.

let rec a x = …
and b x = …

But here a and b are parts of the same let definition.
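
For concreteness, a standard textbook instance of such a group:

(* Mutually recursive parity checks in one let rec ... and ... group. *)
let rec even n = if n = 0 then true else odd (n - 1)
and odd n = if n = 0 then false else even (n - 1)

let () = assert (even 10 && odd 7)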

Note that OCaml rejects the following with an “Unbound value a” error, hinting that a rec keyword may be missing:

let a x = 42
and b x = a x   (* rejected: Unbound value a; hint: add the rec keyword *)

So the and keyword is not a good way to link multiple definitions when later ones need to use earlier ones.

1 Like

For compilation, it’s nice not to have to worry about the order in which files are specified. But what’s the advantage of allowing random-order declarations within a source file?

The justification is usually ‘the compiler should allow me to express my code structure as top-down’ and ‘compilers should serve humans, not the other way round’.

2 Likes

If I remember correctly, the Standard ML spec says nothing about separate compilation of files, but it does specify the SML toplevel. The semantics of the OCaml toplevel is mostly the same as SML’s, allowing for the differences between the dialects. In the toplevel, the order of declarations obviously matters because the interpreter doesn’t know the future.

Once there’s a toplevel, it is natural to consider separately compiled files “as if” they were loaded into the toplevel, and so constrain their declarations the same way.

Just a guess.


Ian

1 Like

Unlike other languages, OCaml lets you shadow different definitions of the same name in the same module, so you could have:

let a = 1       (* a = 1 *)
let b = a + 1   (* b = 2 *)
let a = b + 1   (* a = 3, shadowing the first a *)
let b = a + 1   (* b = 4, shadowing the first b *)

Obviously this example is quite contrived, but being able to do things like this can be useful for the usual reasons why shadowing is useful. This, however, relies on new items’ definitions only being able to refer to previously-defined items.
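
For instance, a small sketch of the usual refinement pattern (the file name and the printing are made up):

(* Successive refinement: each binding shadows the previous one. *)
let path = "input.txt"
let path = Filename.concat (Sys.getcwd ()) path   (* now an absolute path *)
let () = print_endline path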

6 Likes

Interesting language (!)

Thanks all for your input, I appreciate it 🙂

Yes, FORTH is quite interesting. There are no “special” keywords like “let”, “if”, “+”… every word has the same status, so we can create our own “let” keyword (named “:” in FORTH) or our own “if”. So there is the same flexibility as in Scheme/LISP with its macros. There is also a read-eval loop interpreter.

However, it is a very low-level language: no type checking, unsafe pointers, no heap in the first versions (and it is still optional). Pure reverse Polish notation (hard to read!).

It is quite a clever design for a very limited environment, but I wouldn’t choose it nowadays. I had some good times with the F83 implementation… (a FORTH-83 implementation from 1984).

Sun used to integrate a FORTH interpreter in their bootloader!

This is the heart of the reasoning, isn’t it? You want a series of lets to behave as they would in an expression, that is, to be explicable using the lambda calculus, so you must have a binding order.
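
Concretely (a sketch, with throwaway names), a sequence of top-level bindings can be read as nested lets, which is what fixes the order:

let a = 1
let b = a + 1

(* ... is explicable as the single expression: *)
let _ = (let a = 1 in let b = a + 1 in b)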

Also, the idea of “don’t care what the order of definitions is” seems to be descended from O-O, and from LISP. I know that Pascal had “define before use” very much front-and-center in the pedagogy.

2 Likes

It’s a bit of a PITA to have to go to the bottom of a file and then work upward step by step to develop and understand a program.

Caml Light had a where keyword which allowed one to put the main function first (the keyword (still) exists in Haskell). Such a keyword is quite nice for top-down program development (which fits functional programming perfectly) and also for re-reading a program.

1 Like

Ah yes. I like where sometimes, too. But that still fits into the lambda calculus: after all, it is still well-defined what the scopes are and which definitions are visible in which other definitions (and in a sense, that’s the heart of the lambda calculus, right? … or at least, of lexical scoping). That where is no longer there is a mere matter of parsing. I don’t remember why it got removed, but I have some vague memory that it was a parsing-precedence issue (though I could really be misremembering here).

I’m sure there was some good parsing reason why it was removed.

Haskell’s hand is forced in this matter; since definitions must be equational, it must allow recursion by default, including mutual recursion. It’s too hard to write mutually recursive definitions without forward reference 😉

1 Like