OCaml non-orthogonal features (was Simplification of OCaml as a design goal?)

octachron · July 21, 2022, 8:47am

There is a distinct computer science course (https://prepas.org/index.php?document=70 and https://prepas.org/index.php?document=73) which is mandatory in the MPI curriculum and an option in the MP one.

Chet_Murthy · July 21, 2022, 8:55am

It’s a funny thing: once upon a time I would have read that syllabus and thought: “as it should be!” But after 27yr in industry, I think:

Where is the numerical computing? The simulation? Probability and statistics on the computer?

For anyone who enters any science other than compsci, sure, they’ve learned a level of programming that they might use every few months or years. But they’ve learned nothing of the most important use of computers in their field.

I’m guessing that there are other classes where physicists-to-be learn about these things.

octachron · July 21, 2022, 9:05am

My experience (some 15 years ago) is that physicists often have one programming course (either in C or Python), have a few lectures on numerical mathematics for physics, and then learn to implement those algorithms on the job during their internship or PhD, in whatever language has been used previously in their laboratory. Typically, I have written physics collaborative code in Fortran, C++, Igor, Matlab, C, Mathematica, and Charm++.

bluddy · July 21, 2022, 11:00am

Most of the advantages of python and other popular languages come from having a strong ecosystem. In this sense, it’s a circular argument: a strong ecosystem results in more users, and more users allow for a stronger ecosystem.

At the same time, a dynamic language like python is very easy to use: you need no compilation and the syntax is so minimal, that non-computer scientists can easily grasp it. A clean syntax encourages experimentation, because the code becomes centered on the application rather than the syntactic peculiarities of the language.

Additionally, the strengths of strong typing tend not to bear out in small experiments and bits of code you write in university. They’re particularly helpful for long-term maintenance and large codebases.

mobileink · July 21, 2022, 12:20pm

That would do a disservice to your students. Tuples and records are different kinds of things. Tuples are ordered, records are not. The indices into a tuple are not field names. It would be a fundamental error to treat them as “numerical labels” of fields.

I don’t see how that is possible even in theory. Records and modules are not just different types, they live in different type systems. There is no way that I can see to use the same constructor syntax for both without erasing the difference.

Same objection.

Gregg

yawaramin · July 21, 2022, 1:24pm

I feel like the industry perspective is missing from this proposal. In industry the issue we face is maintenance of legacy (read: revenue-generating) codebases in the face of business pressure to constantly deliver new features and ecosystem pressure to constantly break old code via updates, security issues, etc.

Someone already mentioned Python 2/3 transition but I’ll also add the Scala 2/3 change where the language team made some rather large changes, deprecating some complex corners of the language. Industrial teams are now in the tough position of waiting for the ecosystem to catch up, waiting for bugs in the new compiler to be ironed out, and making a business case to justify spending time to upgrade.

If this becomes the OCaml philosophy (after 25 long years of backwards-compatibility) then it really seems there is no safe harbour for industrial developers!

Maelan · July 21, 2022, 1:47pm

I believe OCaml is one of the simplest and most regular languages I know, concept-wise. Even C has some oddities. Perhaps it’s because it’s so minimal we are so sensitive on the few protruding imperfections?

I would add to your list the fact that we have both <- and := for mutation, which I think is very troubling for beginners. Oh, and the ever-lasting complaint, that constructors are not 1st-class functions.
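For readers who have not run into these two points, here is a minimal sketch in current OCaml. The names `counter`, `c`, `r`, `a` and `wrapped` are made up for illustration only.

```ocaml
(* Two mutation operators: [<-] updates mutable record fields and array
   cells, [:=] updates references. *)
type counter = { mutable count : int }

let c = { count = 0 }
let () = c.count <- c.count + 1        (* mutable record field *)

let r = ref 0
let () = r := !r + 1                   (* reference cell *)

(* A reference is itself a record with one mutable field, [contents],
   so the update above can also be spelled with [<-]. *)
let () = r.contents <- r.contents + 1

let a = [| 1; 2; 3 |]
let () = a.(0) <- 10                   (* array cell *)

(* Constructors are not first-class functions: [List.map Some [1; 2]]
   is rejected by the type checker, so a wrapper function is needed. *)
let wrapped = List.map (fun x -> Some x) [1; 2]
```

Since a reference is just a one-field mutable record, `:=` can be read as a shortcut for `<-` on `contents`, which is arguably why having both looks redundant to beginners.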

IIRC Scala did a move towards unifying modules/packages and objects. The result is by no means simpler. The thing is, even if you can find approximate algebraic similarities between both concepts, they are still distinct mental concepts, with different usage patterns. Consequently the similarity is partial, limited by practical differences, and it’s hard to come up with a merger that would reconcile them without being a hassle to use in both use cases.

For instance, if I read you correctly, you’re proposing to ditch the current typing mechanism of modules (nominal typing) in favor of that of records (field-based disambiguation). That implies you wouldn’t easily have two modules with identical field names, which seems very contrary to the current use of modules, right?
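To make the contrast concrete, here is a small sketch of how current OCaml behaves when names collide. The types `point2` and `vec2` and the modules `A` and `B` are hypothetical, chosen only for the illustration.

```ocaml
(* Two record types with identical field names: the later definition
   shadows the earlier one when nothing else guides the type checker. *)
type point2 = { x : int; y : int }
type vec2 = { x : int; y : int }

(* Without an annotation, the most recent type wins: v has type vec2. *)
let v = { x = 1; y = 2 }

(* An annotation is needed to reach the shadowed type. *)
let p : point2 = { x = 1; y = 2 }

(* Two modules with identical member names: the module path is part of
   every access, so there is nothing to disambiguate. *)
module A = struct let name = "a" end
module B = struct let name = "b" end

let names = [A.name; B.name]
```

With records, a bare `{ x = 1; y = 2 }` has to be resolved from field names or annotations, whereas `A.name` and `B.name` never compete with each other.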

Then perhaps the issue is not with the shortcut per se, but with the specific English word that has been chosen as the keyword? I (as a non-native speaker) personally never found that “of” made much sense here, but well. During a 30-second brainstorming I came up with “madeof”, “bearing”, “withpayload”… not pretty but more explicit, I’d say. Or even “from”!

Back in the camlp4 times, there existed the Revised Syntax. It demonstrated that the language is independent from its concrete syntax, and that with camlp4 you could have a vastly different syntax. The author conceived what he believed was a better syntax (“simpler, more regular, more logical”) for OCaml, which seems to align with your ideal. Perhaps you could draw inspiration from it and come up with another alternative syntax. This would alleviate at least some of your complaints (e.g. “of”, type variables). You wouldn’t be the first person to do so. Nor even the 2nd one.

beajeanm · July 21, 2022, 1:48pm

If we are talking about the way the industry perceives major language updates:
Java 9 has been perceived as burdensome to migrate to because of the introduction of the module system. (I think a lot of this is more fear than reality, but that’s another topic altogether.)
As a result, 5 years after the introduction of Java 9, and 1.5 years after the EOL of the free support for Java 8, JDK 8 is still the most used Java version…

The industry, for good reasons, dislikes breaking changes in a fundamental technology.

c-cube · July 21, 2022, 2:03pm

What you’re asking for, @craff, is an entirely different programming language, with flavors of 1ML (I think). The python2->3 jump was of far smaller reach than your ideas. I imagine it could be possible to bootstrap such a language by forking OCaml and hacking at the typechecker like crazy for months, but it might not even be doable in that case.

Trying to change OCaml that much will just break all existing programs (and tools) and still inherit some warts, so a lose-lose proposition. I think language evolution happens more often by the creation of new languages inspired from the old ones. What we need, perhaps, is a modern ML for the 2020s (hopefully one with value types, good int32/int64 support from the start, easier FFI, and typed effects from day1 — but see that’s just my wishlist).

gadmm · July 21, 2022, 2:05pm

Not just the industry, but people who maintain free software too, in my own past experience of working in a large long-standing project. Preventing “bit rot” already takes a good proportion of the time of thin volunteer teams, at the expense of new features and motivation. This was in a language where the ecosystem already had a strong backwards-compatibility culture (C++/Qt). There is some disconnect from reality in the “break it so that people are forced to catch up” approach.

jumpnbrownweasel · July 21, 2022, 6:24pm

I agree. I used C/C++/Java for many years but recently I’ve had time to learn functional programming and do some exploring. After trying projects in Haskell, OCaml and Rust, I’m strongly drawn to OCaml because of its simplicity and reliability.

I think the most difficult and complex type of programming is making use of multiple cores, while handling concurrency and errors correctly. On this OCaml has an opportunity to jump ahead due to the multicore project and the new effects system. That type of simplicity is much more important to me than the syntax issues discussed.

craff · July 22, 2022, 1:10am

This discussion is going too wide (and wild), although it is interesting. What I wanted to discuss is now stated more precisely in the title.

Here is the list of features in OCaml that are not orthogonal (meaning they have some common application). Only the first one is currently an inclusion in the current version of OCaml. I may be missing a few.

ADT / GADT (GADT is strictly more general)
tuple / record / module / object : 4 forms of cartesian products without first class label.
tuple / record / module are already identical in the runtime, not object.
variant / polymorphic variant
array / bigarray (cartesian product with first class labels which are integer)
function / functor
'a and a type variables
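Two of the less familiar pairs in this list can be sketched in current OCaml as follows. All names here (`shape`, `area`, `area_poly`, `id_unif`, `id_abstract`) are made up for the illustration.

```ocaml
(* variant / polymorphic variant: the same data, declared nominally ... *)
type shape = Circle of float | Square of float

let area = function
  | Circle r -> Float.pi *. r *. r
  | Square s -> s *. s

(* ... or used structurally, with no prior type declaration. *)
let area_poly = function
  | `Circle r -> Float.pi *. r *. r
  | `Square s -> s *. s

(* 'a and a: in an annotation, 'a is a unification variable, so
   [let f : 'a -> 'a = fun x -> x + 1] is accepted at type int -> int. *)
let id_unif : 'a -> 'a = fun x -> x

(* A locally abstract type [a] must stay abstract inside the body, so
   the same [x + 1] body would be rejected here. *)
let id_abstract (type a) (x : a) : a = x
```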

Some people think it is not possible, even in theory, to unify some of them. This is wrong: look at subml/pml, which Rodolphe Lepigre and I developed, where the type system does unify a lot of the above. You can try subml online. pml is mostly subml + proofs of programs (if you try it, I just saw a bug in the online subml: you have to run prelude.typ to define booleans before running some of the other examples).

It is clear that having all this makes the implementation of OCaml bigger and its learning curve steeper, and this is not merely a syntactic problem.

The real question is: do we push the research/development of OCaml toward unifying at least some of the above WITH CODE COMPATIBILITY?

GADT and ADT are already unified in the implementation, unless I missed something in the OCaml code, and the only remaining question is: should we keep only one syntax? I think the declaration of constant constructors must be kept, but I don’t much like the “of” keyword. Whether or not to remove “of” (with deprecation first) should be discussed at this point, because the unification is done and OCaml is ready for it. The answer is not obvious. The argument that a student/beginner might write a GADT by mistake is a good point in favor of keeping it. This also raises the question of how the manual is written: it could be shorter if only GADTs were presented and the “of” keyword and constant constructors were presented as syntactic sugar.
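As an illustration of the two syntaxes, here is a sketch in current OCaml. The type and constructor names (`tree`, `gtree`, `expr`, and so on) are hypothetical.

```ocaml
(* An ordinary ADT, written with the "of" keyword. *)
type 'a tree =
  | Leaf
  | Node of 'a tree * 'a * 'a tree

(* The same shape of data written in GADT syntax: every constructor
   returns the unconstrained type 'a gtree, so nothing is refined. *)
type 'a gtree =
  | GLeaf : 'a gtree
  | GNode : 'a gtree * 'a * 'a gtree -> 'a gtree

let rec size = function
  | Leaf -> 0
  | Node (l, _, r) -> size l + 1 + size r

let rec gsize = function
  | GLeaf -> 0
  | GNode (l, _, r) -> gsize l + 1 + gsize r

(* A type that needs the GADT syntax: constructors fix the parameter. *)
type _ expr =
  | Int : int -> int expr
  | Bool : bool -> bool expr
  | If : bool expr * 'a expr * 'a expr -> 'a expr

let rec eval : type a. a expr -> a = function
  | Int n -> n
  | Bool b -> b
  | If (c, t, e) -> if eval c then eval t else eval e
```

The first two declarations describe the same values, which is the sense in which “of” can be presented as sugar; only `expr` uses what the GADT syntax adds.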

For all the other unifications, the question is how hard it is to unify two of the above features while keeping 99%~100% compatibility with existing code. I was not clear enough: I WANT CODE COMPATIBILITY, meaning that a long period of deprecation should exist before removing the syntax for a feature that has been replaced by a more general one. In a lot of cases, the simplification does not need a syntax removal (as for ADT/GADT: we can keep the “of” keyword if we think it is better, but explain it as syntactic sugar for when constructors do not change the parameters). An identical feature with two syntaxes adapted to different situations is simpler than two different non-orthogonal features.

1°) For record, if we consider that there is an implicit declaration of a family of record type

type ('a1,…,'an) r_n = { 1 : 'a1; … ; n : 'an } (* for 0 <= n *)

Then all tuples (including unit) could be interpreted as records in the AST for expressions and types. This would remove some code in OCaml’s compiler. Only ppxs that handle only records or only tuples (or that treat them differently), or ppxs that do not use ast_helper to produce tuple types/expressions, would be broken. This probably means 99% code compatibility. This unification therefore seems possible in the near future.

2°) We can think about how to use this unification to the user’s benefit. Typically, records with both numerical and alphabetical labels have some nice applications, with a notation like { x, y with color = Blue}. This probably only simplifies OCaml’s implementation here, not its syntax. We clearly want to keep two separate syntaxes. But teaching is simplified because we can present tuples as syntactic sugar for records. This simplifies future manuals, tutorials and books too.
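Numeric labels are not valid in today's OCaml, so the r_n family cannot be written as is. The following sketch emulates the n = 2 case with the hypothetical labels `p1` and `p2`, and uses the `Obj` module purely for inspection (not something to rely on in real code) to look at the runtime blocks.

```ocaml
(* Stand-in for the proposed r_2; p1 and p2 replace the labels 1 and 2,
   which the current syntax does not allow. *)
type ('a1, 'a2) r_2 = { p1 : 'a1; p2 : 'a2 }

(* The tuple and the record carry the same information. *)
let of_tuple (a, b) = { p1 = a; p2 = b }
let to_tuple { p1; p2 } = (p1, p2)

(* Inspection only: compare the tag and size of the two heap blocks. *)
let same_layout =
  let t = Obj.repr (1, "a") in
  let r = Obj.repr { p1 = 1; p2 = "a" } in
  Obj.tag t = Obj.tag r && Obj.size t = Obj.size r
```

On the standard runtime, `same_layout` should come out true for a pair like this one (records made only of floats are a separate case, stored unboxed).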

The question of starting a brand new ML-like language based on recent algorithms like those of subml/pml is not the same. A lot of people are doing that, and I regret a bit not having turned pml into a real language. But this would have required something like 2 years, with a salary to live on, nothing else to do, and a few more people than just Rodolphe and me, in particular to work on the runtime, GC and ecosystem (note: pml1 had an LLVM compiler with the Boehm GC that was working and had a few interesting features, like resolving closures to function calls at compile time from typing, or optimisation of record layout to get constant-time access and minimal space loss).

Cheers,
Christophe

Chet_Murthy · July 22, 2022, 1:24am

I don’t understand. If the code is compatible (by which I assume you mean that users can keep their programs unchanged) then who cares? Who could care, except the OCaml maintainers?

Type theory is cool stuff. Sure/sure/sure, I’m a lapsed priest of the Church of Curry-Howard, and I miss it pretty frequently. But unless you want to compete with Haskell for the category-theory knobs, most people simply don’t want to know about type theory. The fact that there are modules and records, is … simply not interesting to them and never will be.

I have a friend who is designing a language for numerical analysis. I keep pushing him to take some significant body of NA code (perhaps PySCF) and convert it to his language, as a way of understanding the challenges. B/c “we recoded matrix multiple” don’t cut it. I feel like I want to say the same thing here: [and giving a much-too-easy example] maybe you might want to take the 6.001 syllabus, code it all in Python, then code it all in OCaml, and see what the difference feels like?

That’s not really fair, b/c it’s pretty small and simple stuff. But it’s a start. A better thing would be to take some significant machine learning problem and recode that. So here’s an example:

take the fast.ai deep learning course (wherein you will learn Python to your heart’s … uh … content)
then redo all the code in OCaml

See what that feels like.

You’re a type theorist, and I feel like, perhaps you’ve got a hammer and you’re looking for nails.

craff · July 22, 2022, 1:33am

First, OCaml maintainers would gain time they can use for something else, and this is very important.

Second, consider the much more difficult case of record = module + function = functor.

If you do both (which is clearly hard; currently the OCaml compiler is too bad at inferring the types of modules), then we could add a syntax for type declarations in records, and a few more, to bring the full AST to both syntaxes. Then the module/functor syntax would probably slowly disappear, as it is rather heavy. When OCaml has converged on a common syntax for records and modules, a few YEARS later, you can simplify the syntax (deprecate first, remove second).

But the manual and teaching can be simplified immediately after the two features have been unified. This is the important point, and it is by doing so that one of the two syntaxes would disappear.
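To show what “record = module” and “function = functor” look like side by side in current OCaml, here is a sketch. The signature `MONOID` and all other names are hypothetical. The record version works here only because no type member is needed beyond the parameter 'a, which is exactly the gap the proposed extra syntax would have to fill.

```ocaml
(* Module + functor version. *)
module type MONOID = sig
  type t
  val empty : t
  val combine : t -> t -> t
end

module Fold (M : MONOID) = struct
  let fold xs = List.fold_left M.combine M.empty xs
end

module IntSum = struct
  type t = int
  let empty = 0
  let combine = ( + )
end

module SumFold = Fold (IntSum)

(* Record + function version of the same abstraction. *)
type 'a monoid = { empty : 'a; combine : 'a -> 'a -> 'a }

let fold m xs = List.fold_left m.combine m.empty xs

let int_sum = { empty = 0; combine = ( + ) }
```

`SumFold.fold [1; 2; 3]` and `fold int_sum [1; 2; 3]` compute the same sum; the functor is applied once at module level, the function at each call.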

Frederic_Loyer · July 22, 2022, 12:55pm

My son is in Classe préparatoire in the new MPII curriculum (the first “I” is Informatique, Data Science in English). His main language is OCaml. (He also has some C lessons.)

My point of view is that it is not easy to make it work on Windows: there is one environment discontinued since Aug 2021, and I wasn’t able to make lablgtk work with diskuv.

I guess also the whole SciPy and Matplotlib libraries are big added values in the Python ecosystem.

SpiceGuid · July 22, 2022, 3:50pm

And once we have 1ML (1ML - core and modules united) extended with recursive modules and all the modern stuff please don’t forget one important point :
● please, please, make the 1ML pure core as the new gallina/coq functional code

ancolie · July 23, 2022, 3:14pm

As said by @octachron, they’re both used in “prépa” buuuut… there’s a new “MP2I prépa” much more focused on CS with no Python but only OCaml, C and a little bit of SQL.

And IIRC, some schools (like ENS Rennes) already said they’ll only accept CS students from the “MP2I competition”.

It’s not disappearing. Strictly more students than before will learn OCaml.

jbeckford · July 23, 2022, 10:46pm

This problem has the easiest solution in the thread: file an issue. Especially file before the school year starts if a missing Windows package is blocking a class from adopting OCaml. File here or here for Diskuv.

So … to everybody who is teaching OCaml or has some influence on teaching: Please file your issue now! Can at least check whether there is a reasonable chance of getting a version cut that works for your students before your students stumble, or worse, prematurely abandon OCaml for the entire class.

bluddy · July 24, 2022, 5:47am

This is somewhat true for the entire thread. Giant, non backwards-compatible changes are not going to be made. It’s just not going to happen to a stable language. But small changes, like adding syntax for accessing members of a tuple without needing to destruct it? That could be suggested in an Issue on the OCaml github.
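For context, this is roughly what accessing tuple members looks like in OCaml today. The names `pair`, `triple`, `third` and `middle` are made up for the sketch.

```ocaml
(* Pairs have the standard library projections fst and snd. *)
let pair = (1, "one")
let a = fst pair
let b = snd pair

(* Anything longer has to be destructured with a pattern, either in a
   helper function ... *)
let triple = (1, "one", true)
let third (_, _, z) = z
let c = third triple

(* ... or directly in a let binding. *)
let (_, middle, _) = triple
```

There is no built-in projection past `snd`, which is the gap a small syntax addition for tuple member access would fill.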

Chet_Murthy · July 24, 2022, 11:33pm

A funny thing: a friend called me this weekend to kvetch a little about how the new modularity stuff in Java 9 is barely used, even 5yr after it was released. And we got to talking about languages that took a bad course and later recovered from it. We could only come up with one example: C++. In the 90s, C++ was crazily complicated, to the point where nobody could really build significant systems in it. When Java came along, people ran from C++ to Java for this reason (among others, but on Wall Street, this was the big reason). Then with STL and Boost, C++ got back onto a sane course, and today it’s actually progressing and becoming more and more usable. Of course, it’s a little late for that: other languages have taken up the slack. But today’s C++ is unrecognizably different from the C++ of the 90s, even though they’re the same language, b/c of the difference in the libraries.

But that’s the exception. Once a language’s course is set in stone, it’s really, really hard to change it. And for OCaml, that’s an excellent thing, in my opinion. I have code from 30+ years ago that continues to work unchanged.
