OCaml non-orthogonal features (was: Simplification of OCaml as a design goal?)

I would like to start a discussion about the lack of simplification in OCaml’s evolution overall.
This was triggered by an answer to a comment of mine on the OCaml issue “parsing bug on function types in variant type definitions” (Issue #11445 · ocaml/ocaml · GitHub),
where I proposed removing the “of” syntax in constructor declarations:

No, we don’t deprecate simpler, perfectly-fine syntax used all over the place
(programs, docs, textbook, wikipedia, whatever) just to work around a parsing
ambiguity. The cost of those deprecations is huge and in this case the benefit
is razor thin.

I found this comment too categorical, and here is why:

First, I think that the “of” syntax is shorter but definitely not simpler
when I teach. I actually teach the GADT syntax directly nowadays, except for
constant constructors.
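To make the comparison concrete, here is a small sketch of the same variant type written both ways. The type and constructor names (shape, Circle, Rect, ...) are made up for illustration and are not taken from the issue mentioned above.

```ocaml
(* Classic syntax: constructor arguments are introduced by "of". *)
type shape =
  | Point
  | Circle of float
  | Rect of float * float

(* GADT-style syntax: each non-constant constructor is declared with its
   type. The constant constructor keeps the short form. *)
type shape' =
  | Point'
  | Circle' : float -> shape'
  | Rect' : float * float -> shape'

(* Construction and pattern matching look the same in both cases. *)
let area = function
  | Point -> 0.0
  | Circle r -> Float.pi *. r *. r
  | Rect (w, h) -> w *. h

let area' = function
  | Point' -> 0.0
  | Circle' r -> Float.pi *. r *. r
  | Rect' (w, h) -> w *. h

let () = assert (area (Rect (2.0, 3.0)) = area' (Rect' (2.0, 3.0)))
```

Only the declarations differ; the code that uses the constructors is unchanged.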

As an experienced OCaml user who has followed Caml since Heavy Caml (prior
to camllight), I have no problem with keeping both syntaxes. And my problem here is not
the parsing ambiguity at all, even though removing “of” would solve that problem too.

However, this really impacts the learning curve of OCaml! So merging two features, or
deprecating one when we know that it can be replaced by the other,
should be a basic principle of programming language design. A deprecation
warning can be disabled, and plenty of time can be left for adapting
existing code, books and tutorials.
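As a rough illustration of how deprecation already works for values (deprecating a piece of syntax would need compiler support that this sketch does not show), a definition can be marked with the ocaml.deprecated attribute and the resulting warning can be silenced locally. The Legacy module and old_length function are hypothetical names.

```ocaml
module Legacy = struct
  (* Hypothetical function kept only for backward compatibility. *)
  let old_length l = List.length l
  [@@ocaml.deprecated "Use List.length instead."]
end

(* Using Legacy.old_length normally triggers the deprecation warning
   (warning 3, the "deprecated" alert in recent compilers).
   Here it is disabled for this single expression. *)
let n = (Legacy.old_length [@ocaml.warning "-3"]) [1; 2; 3]

let () = assert (n = 3)
```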

OCaml is disappearing from teaching in France (to the benefit of untyped
Python, which is really painful for me). One of the reasons is probably the
lack of simplification in the evolution of the OCaml language.

Here are some possible simplifications that I think should be considered:

  • ADTs should be replaced by GADTs.
  • type a. a -> a should replace 'a -> 'a everywhere; the fact that we now have
    two kinds of type variables is really problematic.
  • tuples should evolve toward syntactic sugar for records with numerical labels.
  • records should evolve toward a notation for first-class modules (this is probably
    very hard now; the typing of modules is probably still too restrictive).
  • in the same very hard vein, functions and functors should be merged in the future.
  • modules and objects should also converge (again very hard, because field access in objects
    is currently dynamic, unlike module field access, so there is a choice between resolving
    all method calls on objects at compile time, or having more dynamic module fields and resolving
    some, but not all, field accesses at compile time).
  • arrays and bigarrays should converge at some point.
  • etc.
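The second item in the list above refers to a real difference in how the two forms are checked. In the sketch below (the function names are illustrative), 'a in an annotation is a unification variable that the type checker is free to instantiate, whereas type a. requires the definition to be polymorphic.

```ocaml
(* Accepted: 'a is unified with int, so incr_int : int -> int
   despite the annotation. *)
let incr_int : 'a -> 'a = fun x -> x + 1

(* Accepted: the body really is polymorphic in a. *)
let id : type a. a -> a = fun x -> x

(* Rejected if uncommented: the body forces a = int, which contradicts
   the explicit quantification.
   let bad : type a. a -> a = fun x -> x + 1 *)

let () =
  assert (incr_int 41 = 42);
  assert (id "ocaml" = "ocaml");
  assert (id 1 = 1)
```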

All of this is complicated by the need to keep as much compatibility as possible with existing code;
it requires a lot of work and careful thinking. This kind of work is often not
directly beneficial in terms of academic career or industrial applications.
It is even negative for industrial applications at first, because you must spend time
adapting your code. But I think the survival of OCaml lies (in part) there.

My two cents,
Christophe

While I would personally love to see some of the things you listed make their way into OCaml, I also worry that changing the language too much would crush us under the momentum of legacy code. We could end up in a Perl 5/6 or Python 2/3 situation where library writers and industrial users simply refuse to move to the next release.

How many occurrences of “of” do you think would need to be changed across the entire ecosystem if it were removed from the syntax in OCaml 5.x or 6? 140k instances at least.

I can’t speak to the evolution of teaching in France, but maybe as a counterpoint, didn’t MIT also change their teaching of SICP from Scheme to Python (and also sicp-js)?

Maybe CS education, for better or for worse (in my opinion worse), is shifting to prioritise “practicality” and choosing programming languages that students are more likely to use in industry (Python, JavaScript, etc.)?

This is spot-on. I was there in … 2007 when Jerry Sussman was making the switch. And he did it (from Scheme, famous for having a simple syntax) because Python was popular. And for NO OTHER REASON.

It is what it is. Even at MIT. Even at MIT.

I think some of the changes that would deprecate a feature could be made with no hassle with a policy like:

  • when a deprecated feature triggers a warning in fewer than X% of the opam packages that have been updated to
    the latest OCaml (meaning the .opam file allows using the latest OCaml), the deprecated feature is removed in the next release. X could be 1% or 5%?

This would allow people to see how the deprecated feature is progressively removed from existing code. We could have a web page listing all deprecated features with their current percentage on opam. Does such a page exist?
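The rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the package record, its fields and the function names are hypothetical and do not correspond to any real opam API, and how one would actually collect the warning data is left open.

```ocaml
(* Hypothetical description of one opam package. *)
type package = {
  name : string;
  supports_latest : bool;   (* the .opam file allows the latest compiler *)
  triggers_warning : bool;  (* building it emits the deprecation warning *)
}

(* Percentage of up-to-date packages that still trigger the warning.
   None when no package supports the latest compiler yet. *)
let warning_percentage (pkgs : package list) : float option =
  match List.filter (fun p -> p.supports_latest) pkgs with
  | [] -> None
  | up_to_date ->
    let warned = List.filter (fun p -> p.triggers_warning) up_to_date in
    Some (100.0 *. float_of_int (List.length warned)
          /. float_of_int (List.length up_to_date))

(* Names of the up-to-date packages that still use the feature. *)
let offenders (pkgs : package list) : string list =
  List.filter_map
    (fun p ->
       if p.supports_latest && p.triggers_warning then Some p.name else None)
    pkgs

(* The feature may be removed once the percentage drops below X. *)
let can_remove ~(threshold : float) (pkgs : package list) : bool =
  match warning_percentage pkgs with
  | None -> false
  | Some pct -> pct < threshold

let () =
  let pkgs = [
    { name = "a"; supports_latest = true; triggers_warning = false };
    { name = "b"; supports_latest = true; triggers_warning = true };
    { name = "c"; supports_latest = false; triggers_warning = true };
  ] in
  assert (warning_percentage pkgs = Some 50.0);
  assert (offenders pkgs = ["b"]);
  assert (not (can_remove ~threshold:5.0 pkgs))
```

Package "c" is ignored because it has not been updated to the latest compiler, which matches the wording of the rule.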

Other simplifications could follow this path:

1°) Merging: work (research) to allow one feature to replace another, with 99% of available code accepting the change. I guess bigarray could provide one type of array fully forward compatible with the existing OCaml array? Modules could have a strictly more permissive typing than records? The only problem here is to make sure research/development is done in that direction.

2°) When 1°) is done, you merge both features, but keep both of them available at the syntax level.

3°) Once you have merged such features you often gain syntactic freedom, like using braces for structures, or making the current array syntax available for bigarrays. Then you can deprecate the previous syntax.

2°) and 3°) can happen at the same time.

4°) You apply the above rule to remove the deprecated feature.

0°) You do not require simplifications to be synchronized (this was a problem with Python 2 => Python 3): each feature
appears when the “unification” is ready, not before, not later.

There is a different possible reason: perhaps it is because OCaml simply doesn’t allow the kind of easy use of so many of these data science libraries and numerical computing libraries, that Python permits and encourages. A while ago I looked at what it took to write numerical analysis code in OCaml, and it’s nowhere near as … fluid as in Python. And I hate Python, I hate Python with intensity.

This is part of what drives me to think that OCaml needs to embrace modular implicits, and in the runup to that, needs to experiment with how data science and numerical computing can be made as effortless as in Python.

Another thing: I work with a ton of Python these days, in quantum computing. A ton. The language is terrible, and yet the physicists, chemists, and other scientists continue to use it. It’s not a mere question of syntax: I mean, look at Rust, which is bristling with painful syntax issues and type-checking issues, and yet is rising in popularity.

I don’t think syntax is the issue.

A suggestion: rather than waiting for the deprecated feature to only be used in 1-5% of packages, why not send PRs to fix those packages that are available on GitHub, and proactively clean the feature out of the opam-package corpus? Then when the feature is literally used in 0% of opam packages, the case for removing it becomes much more persuasive.

P.S. yes, it’s more work. But it’s also a more reliable and sure way to move forward.

I did not speak about syntax. The problem mentioned here (which is certainly not the only one explaining why Python replaced Caml in education in France) is the number of non-orthogonal features and the resulting lack of simplicity. Orthogonality of all features is a design goal of Python.

Here is a more concrete illustration: structures could replace records if

  • mutable values were added to modules,
  • some progress was made on inferring the module type when packing and unpacking, typically by reusing
    the code that infers record types,
  • a problematic point, marked in a comment below, was resolved.

Here is an example of the translation:

(* type 'a r = { x : 'a; mutable y : string }
   could be compiled to:
 *)
module type R =
  sig
    type t
    val x : t
    val y : string ref
  end
type 'a r = (module R with type t = 'a)

(* record construction { x = 2; y = "toto" }
   could be compiled to the following,
   if packing a module used the same algorithm that
   guesses the type of a record from its field names.
   This would probably give much better packing
   of modules. *)
let m =
  let module R = struct
      type t = int (* problematic: fill this from type inference ?*)
      let x = 2
      let y = ref "toto"
    end
  in
  ((module R) : int r)

(* field access works if, again, we try to guess
   the module type. *)
let f r =
  let module R = (val r : R with type t = int) in R.x

(* two tests for { r with ... } expressed as include *)
let f r =
  let module R = (val r : R with type t = int) in
  let module R' =
    struct
      include R
      type t = float (* problematic idem *)
      let x = float_of_int R.x
    end
  in
  ((module R') : float r)

module type R2 =
  sig
    type t
    val x : t
    val y : string
  end
type 'a r2 = (module R2 with type t = 'a)

let g r =
  let module R = (val r : R with type t = int) in
  let module R' =
    struct
      include R
      let y = !R.y
    end
  in
  ((module R') : int r2)
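The field accesses above are fixed at type int. As a complement, here is a minimal self-contained sketch of how construction, field access and field update could be written once for every 'a r, using locally abstract types. The helper names make, get_x, get_y and set_y are hypothetical and not part of the proposal; the module type R and the type 'a r are repeated from the listing so that the block stands alone.

```ocaml
module type R = sig
  type t
  val x : t
  val y : string ref
end

type 'a r = (module R with type t = 'a)

(* Plays the role of { x = x0; y = y0 }. *)
let make (type a) (x0 : a) (y0 : string) : a r =
  (module struct
    type t = a
    let x = x0
    let y = ref y0
  end : R with type t = a)

(* Plays the role of r.x. *)
let get_x (type a) (r : a r) : a =
  let module M = (val r : R with type t = a) in
  M.x

(* Plays the role of r.y. *)
let get_y (type a) (r : a r) : string =
  let module M = (val r : R with type t = a) in
  !M.y

(* Plays the role of r.y <- s. *)
let set_y (type a) (r : a r) (s : string) : unit =
  let module M = (val r : R with type t = a) in
  M.y := s

let () =
  let m = make 2 "toto" in
  set_y m "titi";
  assert (get_x m = 2);
  assert (get_y m = "titi")
```

The explicit (type a) annotations stand in for the type inference that the "problematic" comments in the listing ask for.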

Perhaps it is my long experience with parametric polymorphism, but … this proposal doesn’t seem to go in the direction of greater simplicity.

Something else too: in these classes in France, is the Python that is used the pure language, or is it Python with significant C/C++ libraries? E.g. in quantum computing, there’s numpy, scipy, pyscf, and I could probably think of a few more, that are pretty much necessary to do anything even mildly nontrivial. All of these libraries, in OCaml, have pretty complex interfaces. [I’m sorry, I’ve looked at Owl, and it’s just miles and miles harder-to-use than scipy, and again, I’m a massive OCaml bigot and hate Python]

Any amount of simplifying OCaml the language, won’t change things if the critical libraries are harder-to-use than in Python.

Maybe I was not clear: my proposal is to implement records as above in a way that would (hopefully) not change anything for the user. (I updated my code a bit, because I spotted a third difficulty.)

The simplification would come later, when the unification of records and structures would mean people gain the
power of modules within records (with types and modules as record fields), and maybe, much later, the disappearance
of structures. Here I was only illustrating step 1°), which I mentioned for merging two features.

I should have said in my first response to your post, that I’m glad you’re bringing up these issues. I disagree with your diagnosis, but there is definitely a problem, and OCaml is risking being relegated to some niche, theoretical language for unrealistic people.

Have you looked at Rust? If not, I think you should. I think it’s interesting that there’s so much more … activity around Rust, than around OCaml. Christophe, you know that I’ve been an OCaml bigot almost as long as you’ve been using Caml. So when I say that Rust is going to be important, I’m not saying it lightly.

I am quite surprised by this pedagogical choice. I would not expect a beginner lecture that teaches ADTs to cover GADTs. If you start with the GADT syntax, you give students more room to make mistakes, for the sake of features that they will not use for a long time. In general, I rather think that removing the simple ADT syntax would increase the complexity of the language, because the simpler notion that covers 99% of the use cases would no longer be as easily expressible. Do you have data showing that the GADT syntax is simpler for students?

I disagree that this is a good design principle in general. Merging two features by removing the simple feature and keeping only the complex one increases the global complexity of the language, both by removing scaffolding for people learning the language and by making it harder to avoid complex features. Writing code using features that are as simple as possible is good program design.

I agree that it is regrettable that unification variables ended up occupying prime syntactic real estate when locally abstract types are probably the more used feature. But this is more a sign that, in an ideal world, we would have a more complex syntax for unification type variables and a simpler one for universally quantified rigid type variables.

Sorry, I am not sure that I follow how this is a simplification. In particular, there would be many record types with numerical labels, whereas there is only one n-tuple type.
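A small sketch of that objection, with made-up type and label names: two record declarations with identical labels are distinct (nominal) types, whereas every pair of an int and a string has the single structural type int * string.

```ocaml
(* Two declarations with the same labels give two different types. *)
type ('a, 'b) pair_a = { p1 : 'a; p2 : 'b }
type ('a, 'b) pair_b = { p1 : 'a; p2 : 'b }

let a : (int, string) pair_a = { p1 = 1; p2 = "one" }
let b : (int, string) pair_b = { p1 = 1; p2 = "one" }

(* "a = b" would be a type error: pair_a and pair_b are unrelated.
   Each record type needs its own conversion function. *)
let to_tuple_a (r : ('a, 'b) pair_a) = (r.p1, r.p2)
let to_tuple_b (r : ('a, 'b) pair_b) = (r.p1, r.p2)

(* Tuples are structural: one function works on every 2-tuple. *)
let swap (x, y) = (y, x)

let () =
  assert (to_tuple_a a = to_tuple_b b);
  assert (swap (to_tuple_a a) = ("one", 1))
```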

That sounds like an incredible increase of complexity: you are replacing the core language by a fully dependent language.

Bigarrays were introduced to have a more efficient memory layout than arrays, at the cost of only being able to contain types with a statically known size. There is work on unboxed types and on an array kind with possibly unboxed elements. But this requires a quite considerable increase in complexity (due to the appearance of memory layout kinds and all the consequences of ending up with different kinds of polymorphic functions).
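For readers who have not used both, here is a short sketch of the two kinds of array side by side. It assumes a compiler where Bigarray is part of the standard library (OCaml 4.07 or later); the variable names are illustrative.

```ocaml
(* An ordinary array can hold values of any type. *)
let names : string array = [| "a"; "b"; "c" |]
let floats : float array = Array.make 3 0.0

(* A bigarray has an element kind and a memory layout fixed at creation,
   and is indexed with .{ } instead of .( ). *)
let big = Bigarray.Array1.create Bigarray.float64 Bigarray.c_layout 3

let () =
  Bigarray.Array1.fill big 0.0;
  floats.(0) <- 1.5;
  big.{0} <- 1.5;
  assert (floats.(0) = big.{0});
  assert (Array.length names = Bigarray.Array1.dim big)
```

There is no bigarray counterpart of the string array above, since bigarray element kinds are limited to fixed-size numeric types and characters.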

I think this is a mischaracterization: work that would result in a simplification while keeping the same expressiveness, usability, learnability and backward compatibility sounds like a great achievement for both academics and industrial developers. However, achieving a win on all four dimensions is hard, which means that compromises must be made. Overall, I have the impression that you favor simplification for its own sake at the cost of all those aspects, which is indeed not a design goal of OCaml.

Do you have statistics on this subject? My personal impression is that breaking backward compatibility more often would be much more damaging for the use of OCaml in education than having some awkward syntactic corners. After all, the core of the OCaml language has been stable for more than twenty years, and there are pedagogical resources that are as old as that, whereas the number of available resources for learning OCaml is still quite limited. I am not sure that it would help to make teaching resources obsolete at a faster rate.

And this did not prevent MIT from adopting Python for 6.001, either. And something else: MIT adopted a language that, while syntactically simple, is semantically incredibly complex. It is a language with meta-programming built-in (Look at the boto3 library for example) and where a vast number of the semantic objects are coded on-top-of hashmaps (“dictionaries”). Just crazily complex, the minute you open up the cover and start looking inside.

My students had a hard time understanding the “of” keyword. They understand better if I give
the type of the constructor. You have to explain “of” by saying it means “accepts arguments OF type”, which is
longer than “:”, which means “of type”. Basically, you remove one keyword from the language, which is a simplification.

I never had a student write a GADT by mistake when they wanted an ADT, especially because they mostly write
non-parametric types.

I only propose the merge when the complex feature can replace the old simpler feature without changing the existing code, or very little. For records I meant: first you make it possible to compile records as first-class modules in a fully transparent way, then you may consider using record syntax for modules, which would be a simplification.

Sorry, this needs polymorphic records without declaration … we do not have that in OCaml. Sometimes
I forget that OCaml is not subml/pml. So OCaml is definitely not ready for that.

I agree but it would be nice to have only one kind of array in the end?

Classes préparatoires used to teach Pascal or camllight. They moved to camllight or Python, and
now only Python. Classes préparatoires are a major thing in France. As a consequence, the new computer
science program in lycée only uses Python. If the classes préparatoires had chosen OCaml, I guess the lycée would have made the same choice, but this is pure speculation.

Another reason for the pedagogical choice of Python over Caml is that with an untyped language (I mean a language where you don’t read or write types) you do not have to teach what a type is :wink:

And yet, no amount of “simplification” of OCaml will change this painful fact.

An observation about all these dynamic languages (Python, Ruby, Perl): when writing nontrivial code in any of them, you have to write tests – a ton of tests – to get the same result as the static type-checker and compiler get you. This is a well-acknowledged syndrome of those languages, and the books on “test-driven development” acknowledge this explicitly – I’m not saying something non-standard here.

My point is, OCaml chose to be a certain way, that disadvantages rapid ad-hoc use. For instance, when I need a relatively simple program but I need it quickly I reach for Perl, and never for OCaml.

And your argument would seem to indicate that what really needs to happen (sorry, I’m going to beat that horse until it collapses of the punishment :wink:) is to get rid of as much type-related stuff (e.g. module-names, -expressions) as possible. I’ve really been impressed by how well that works in Rust, with traits.

As far as I can see, only the common software engineering lecture is using Python nowadays? As far as I remember, this software engineering lecture has never been in OCaml (it was in Maple during my time)? The computer science option (which was the one using either Pascal or Caml Light in the past) is currently using both OCaml and C in MP or MPI.

Yes, camllight, never OCaml, and I forgot that Maple was also there.

The “programme des classes préparatoires” now only mentions Python for all options:

Which option informatique do you mean?

I read that link via Google’s translation (b/c my French is no longer good enough to do so quickly, sigh). Some thoughts:

  1. this course is not aimed at computer scientists, but rather at engineers in industry more generally.

  2. the goal is to train them to be able to converse with computer scientists, sure

  3. but also to be able to use computers to solve problems in industry

  4. And for that, Python is actually more useful than OCaml. This is fact! I don’t have to like it, but it’s TRUE.

Change that fact, and you’ll change the incentives of Thales.

In data science, in any field of science that uses numerical computing (== “every other field of science besides computer science”), in artificial intelligence, Python is vastly more powerful and applicable.

It didn’t have to be this way. But it is.

Some more: the entire second semester would be vastly more difficult to teach in OCaml, than in Python. Neither in numerical computing, nor in transaction-processing (== “programming with databases”) is OCaml superior to Python. Again: simple fact.