In code generation tasks, particularly where one is working with mutually recursive definitions, an option for forward declaration would be nice. I usually end up trying to sort things in dependency order as much as possible, but I inevitably end up with a big ball of let something ... and ... and somewhere near the end.
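For concreteness, here's a minimal sketch of the kind of block I mean (the emit_expr/emit_stmt names are invented, not from any real generator): whatever can't be sorted into dependency order gets lumped into one mutually recursive group.

(* Hypothetical generated code: the definitions that couldn't be ordered
   end up in a single "let rec ... and ..." group. *)
let rec emit_expr n =
  if n = 0 then "()" else "(" ^ emit_stmt (n - 1) ^ ")"
and emit_stmt n =
  if n = 0 then ";;" else "[" ^ emit_expr (n - 1) ^ "]"

let () = print_endline (emit_expr 3)   (* prints "([(;;)])" *)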
Some other languages that require this order:
- Idris
- Clojure, but you can declare names at the top of the file, and then definitions of the declared names can appear in any order.
For Clojure, I’ve seen it said that the standard order speeds up the compiler a little bit, since there is only one pass through the file. I’ve also seen Clojure programmers defend it for reasons given above: you always know that the functions defined in terms of others are at the end.
Sometimes I follow the default order in Clojure, but sometimes I prefer to put the most important definitions first (which might be the ones that use all of the others), as @grayswandyr suggested. The reason is that the first definitions can then give someone reading the code an outline of the structure of the computations to follow. This can be very important to someone else who needs to understand your code for the first time (or for me a couple of years later).
When I don’t put the most important definitions first, sometimes I add a very prominent comment at the top, recommending that one begin reading with such and such function definition. If definitions are in dependency order, i.e. defined in terms of ones above, that most-informative function might not be at the end of the file.
In OCaml I tend to think of interface files as fulfilling the role of forward declarations and, when possible, declaring the more important items earlier.
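For instance (a hand-written sketch; foo.mli/foo.ml and the values are made-up names, not from any particular project): the interface can present the important entry point first, even though the implementation has to define its helper before it.

(* foo.mli -- hypothetical interface: the main entry point is declared first. *)
val run : string -> int
val parse : string -> string list

(* foo.ml -- the implementation must still appear in dependency order. *)
let parse s = String.split_on_char ' ' s
let run s = List.length (parse s)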
I agree, code generation is a place where OCaml’s ordering constraints can be a massive pain. If you’re generating code from some format where order doesn’t matter, you end up having to implement a strongly-connected-component analysis, where other languages, like Rust, can emit code in a straightforward way.
It’s also a pain when one wants to have many types that are mutually recursive, and a real blocker if you want a type that is mutually recursive with a set of itself, or something like that[^1].
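To make the first half of that concrete, a small made-up sketch: mutually recursive types have to be declared together in one type ... and ... group, so a generator can't simply emit each type wherever it happens to be convenient.

(* A sketch: these two types refer to each other, so OCaml requires them to
   be declared in a single group rather than at separate points in the file. *)
type expr =
  | Lit of int
  | Block of stmt list
and stmt =
  | Expr of expr
  | If of expr * stmt * stmt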
It made sense for OCaml, and it’s consistent with having a REPL, but languages tend to be less order-dependent these days.
[^1]: please don’t say “recursive modules”.
Isn’t mutual recursion by default a different matter from “postponed” local definitions (which is what where is used for)?
Thanks for that detailed feedback
I’ll surely have a look at it, since it falls into the category of languages that “make you think differently”, so fun to play with.
To me, I feel the inverse is true. Reading the file top-down is akin to reading a book. You can only go as far as the knowledge you’ve actually acquired. If you skimmed too quickly, go back a step or two. Reaching the end, you “should” have understood everything.
Thanks! Interesting regarding Idris, since I understood it inherited quite a few Haskell idioms.
So I note that code generation can be painful (I have yet to play with this). In the other cases, I feel that restructuring one’s code may be an option to consider (feeling pain → time to refactor)
Basically all “scripting” languages? Python, Ruby, Clojure come to mind. Of course you could argue that they don’t do compilation, but e.g. in Python code does get compiled to bytecode so in many ways it is not that different from OCaml bytecode compilation.
You can even have fun errors about reading variables that you write to later, because the scoping rules are, well, not very good.
Even if this is true about Python globals (and I’m not sure it is so simple even in this case), because Python is OOP, most code is in class methods, and those definitely can forward-reference other methods.
Only in a statically typed language forum would someone call Clojure a “scripting language”! Clojure is definitely compiled (to the JVM, to JavaScript, and a few other targets), even though it’s not statically type checked.
I go back and forth between dynamically typed and statically typed ways of thinking. The other side always looks different to me from the point of view of the one perspective.
Ruby, Python and JavaScript definitely don’t. For instance, executing this Ruby code from a file will run fine and print “OK”.
def func1
  func2
end

def func2
  "OK"
end

puts func1
But this is not unlike most (?) compiled languages as far as I know. As an example, this Go code will compile and run fine:
package main

import "fmt"

func main() {
    fmt.Println(myFunc())
    fmt.Println(myGlobalVar)
}

func myFunc() string {
    return "OK"
}

var myGlobalVar = "OK2"
From what I understand, the thing that is resolved “statically” (during bytecode compilation) in Python is scope. So when the code is compiled, the bytecode compiler has to know which scope an identifier belongs to. But it does not need to know the actual definition, nor whether it actually exists. For instance this is fine in Python:
def f():
    g()
The bytecode compiler will happily generate some code for f that will, when invoked, look in the global scope to see whether g is bound to something. So you can define g above f, below it, or not at all. And if it’s not defined, f itself can alter the global scope:
def f():
    f.__globals__['g'] = lambda: print("I exist now!")

# g() called here would throw an error: 'g' is not defined
f()
g()  # prints "I exist now!"
What fun…
I don’t mean this to sound all pedantic, but … I think it might be worthwhile to think about this in terms of “what is the scope of a name?” Different languages make different choices about this, and sometimes they choose differently in different contexts. The question of “the visibility of names” is a … primordial one, in that it was heavily discussed and debated early on, but once these various modes of visibility were established, language designers would make these choices very, very, very early on.
- the original LISPs chose that a name was visible everywhere in the call-tree descending from its definition site. This was called “dynamic scoping”.
- later, languages like Scheme (and ML-family languages) chose that a name was visible in a structurally-defined textual scope: for local definitions, it was the body of the block within which the definition was made visible (“let … in …”), and for toplevel definitions, it was “the rest of the compilation unit”.
- O-O languages chose that a name was visible within the “scope” where it was defined: a “scope” could be a class (for instance). And, I guess, from there it became natural for names to be visible within an entire compilation unit, treating the compilation unit as a “scope”.
[obviously OCaml’s “objects” use the same scoping rules as O-O languages, but set that aside]
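(For what it's worth, a quick made-up sketch of that aside: inside an OCaml object, methods can refer to each other regardless of the order in which they're written.)

(* Hypothetical class: double refers to value even though value comes later. *)
class counter = object (self)
  method double = self#value * 2
  method value = 21
end

let () = print_int (new counter)#double   (* prints 42 *)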
Notice how Python’s decisions about visibility mean that when you define two functions in the toplevel, viz:
def foo(n):
    return bar(n)

def bar(n):
    return n
the definition of foo is incomplete until bar is defined. This is (again) a primordial issue: what is the supporting context needed to define the meaning of a bit of code? When names are visible within entire scopes, that context is the entire scope. When names are visible only in what follows in a compilation unit, that context is … “everything prior in the compilation unit”.
Very interesting point, thanks. “Scoping rule” does put a name to the behavior I’m describing.
So at the extreme end of the spectrum, dynamic scoping allows this kind of code to run just fine:
# Ruby
class Hello
  def foo
    "FOO"
  end

  def bar
    i_am_bogus(1, 2, 3, 4, 5)
  end
end

hello = Hello.new
x = hello.foo
puts(x)
// JavaScript
const Hello = {
  foo: () => "FOO",
  bar: () => i_am_bogus(1, 2, 3),
}

function i_am_bogus_too() {
  return wat();
}

x = Hello.foo();
console.log(x);
At the other extreme we have languages like OCaml, Haskell, Go, Java, Clojure, etc. that implement “lexical scoping” rules. But OCaml, by allowing shadowing at the top level (as noted by @bcc32 - thank you), imposes a stricter constraint on the programmer.
I have yet to understand what the top level is exactly, but I think I get it now. Please feel free to correct me if I’m wrong.
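A minimal toy sketch of what I understand that to mean (my own example, so take it with a grain of salt):

(* The second "let x" shadows the first for everything below it, but f keeps
   referring to the binding that was in scope where f was defined. *)
let x = 1
let f () = x
let x = 2
let () = Printf.printf "%d %d\n" (f ()) x   (* prints "1 2" *)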
Your two examples work for different reasons, maybe? [I could be wrong about this, obvs …]
- in the case of Ruby, it works b/c you never invoke the undefined function: if you did, I assume it would blow up. That’s b/c Ruby isn’t even trying to ensure that all names resolve to well-defined targets.
- in the case of JS, it’s straight-up “look in the rest of the scope for the name” (uh, I think).
The “scope” here is the entire compilation unit.
I do think that well-defined languages make clear, for a name, what will be the domain of search (and what order) for definitions of that name. That’s all I mean by “scope”.