Next priority for OCaml?

I assume you are familiar with the library linking RFC? Unfortunately it seems it has been left in a corner, but many people (including me) would love to see it go through.

I am, and I think I’m familiar with the arguments against it, although I’m not entirely in agreement with them. (Note: there is an open pull request to update the RFC that has been stale since 2020, which just makes me sad.)

I think the proposal in the RFC is better than doing nothing, but I suspect something better still is possible if people with a better mind for theory than I have were to approach the problem systematically.

But this thread isn’t about proposing a theory, it’s about discussing the alternatives for next priorities, and my opinion is that this is the one I most want to be taken up.

Do you have any pointers to those? I looked at the RFC, but didn’t see any pointers to arguments against. Full disclosure: I also wondered why this got bogged-down: it seemed relatively straightforward, after all.

@dbuenzli probably remembers better than I do. I think I saw the discussion about it here on the Discourse server. My recollection is that it’s one of those things where the question is basically “I see that your proposal should work in practice, but how does it hold up in theory?” (And the answer to that question is, I gather, “nobody is really trying to find out.”)

Do you have any pointers to those?

A lot of discussion already happened here on the proposal and may shed light on other people’s objections.

Personally, I had some hand-wavy philosophical objections based on the Unix-ey philosophy that the compiler should do one thing and do it well rather than getting into the business of tracking package dependencies.

But since then, I have read the RFC several times and I’m getting convinced that this has been thought through well and could be a valuable change - particularly for people who use the repl a lot - maybe I will become one of those people if this goes through.

I could not come up with any concrete examples of where this additional responsibility of the compiler constrains its flexibility within the limited use-cases that I have.

Only trivial things stick out to me, but don’t seem like they’re worth worrying about:

  • If my program depends on foo.cmxa (a third-party library I can’t change), which in turn says it depends on bar.cmxa, I cannot simply replace bar.cmxa with baz.cmxa in my build rules, even if it offers the same functionality, just because the name is stored in lib_requires : string list of foo.cmxa - not a big deal, I can just mv baz.cmxa bar.cmxa
  • Similarly, in the above situation, I cannot instead of bar.cmxa provide a re-packaged split of the modules in say baz1.cmxa and baz2.cmxa as a replacement - again, you can probably work around this by naming one of the two baz’s to bar and using the other as an additional -requires
  • This business of -requires LIB adding the location of LIB to both the includes (compilation) and linking phases seems like it makes the expected physical layout a little more rigid, e.g. if the directory for foo.cmxa contained x.mli that I wanted hidden for compilation purposes, I kind of explicitly have to make sure they’re not co-located - but this is something that (probably) ocamlfind already does, and dune also, so is in practice not a big deal either
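
To make the first two bullets concrete, here is a minimal sketch of name-based resolution. Only `lib_requires : string list` comes from the RFC as quoted above; the `archive` type, the `resolve` function and the library names are hypothetical, and the sketch assumes the dependency graph has no cycles.

```ocaml
(* Hypothetical model: an archive records the names of the libraries it
   requires, loosely mirroring the lib_requires : string list field.
   Everything else here is made up for illustration. *)
type archive = { name : string; lib_requires : string list }

(* Resolve the transitive requirements of [root] purely by name.
   Returns the link order (dependencies first), or the name of the first
   library that could not be found. Assumes there are no cycles. *)
let resolve (available : archive list) (root : string)
  : (string list, string) result =
  let rec visit acc name =
    match acc with
    | Error _ -> acc
    | Ok seen when List.mem name seen -> acc
    | Ok seen ->
      (match List.find_opt (fun a -> a.name = name) available with
       | None -> Error name
       | Some a ->
         (match List.fold_left visit (Ok seen) a.lib_requires with
          | Error _ as e -> e
          | Ok seen' -> Ok (name :: seen')))
  in
  match visit (Ok []) root with
  | Ok seen -> Ok (List.rev seen)
  | Error _ as e -> e

let foo = { name = "foo"; lib_requires = ["bar"] }
let bar = { name = "bar"; lib_requires = [] }
let baz = { name = "baz"; lib_requires = [] }

(* With bar present, foo links. *)
let with_bar = resolve [foo; bar] "foo"

(* Offering baz instead fails: foo asks for the name "bar". *)
let with_baz = resolve [foo; baz] "foo"

(* The "mv baz.cmxa bar.cmxa" workaround: same contents, expected name. *)
let renamed = resolve [foo; { baz with name = "bar" }] "foo"
```

The point of the sketch is only that the lookup key is a string baked into `foo`, so substitution has to happen by renaming rather than in the build rules.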

That is one of the easier criticisms to rebut, in my view. A fine example of a Unix-y compiler toolchain that has, for a very very long time, offered a functionally similar feature is the Apple C/C++/Obj-C compiler, which inherited Frameworks from the NeXT variant of Unix™.

I think the other criticisms about package evolution, which you point out, are stronger, and that’s why I’m sympathetic to the view that the problem needs the attention of somebody with a better mind for the theory than me. My hunch is that something that introduces a constraint language for package resolution directly into the language may be warranted, but this question is probably beyond my pay grade.

Nevertheless, if you ask me what “the next priority for OCaml” should be, then solving this problem is my answer.

Oh there’s even a funnier and shorter answer to that. Under the form of a question: Why does OCaml do for C libraries what it refuses to do for its own libraries ?

I suspect this is a bad idea.

You’ll most likely end up introducing an NP-complete problem in the resolution language, which will make it harder to get reproducible builds.

See for example here for a discussion about that. At the time I quite enjoyed the series about package management here, which was trying to devise a system for high-fidelity builds without lock files. I don’t know where it got to in the end though; the constraints are hard. But I quite subscribe to his feeling that maybe something has gone wrong with the way we develop software if we need a SAT solver to define the outcomes of our builds.

So personally I’d rather have that problem pushed to the language-agnostic package management layer than import it into the language.

You do make a good argument here for that.

Comby is a great tool for this, it’s even written in OCaml! I’ve used it with other languages for this kind of refactoring and it works just as you described, I think your example would be comby 'SomeLib.somefun :[bar] :[baz]' 'SomeLib.somefun :[bar] (SomeLib.convert :[baz])' .ml.
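
For illustration, this is the effect I would expect that template to have on a call site. `SomeLib`, `somefun` and `convert` are hypothetical stand-ins defined inline so the snippet compiles on its own, and the signatures are my assumption about the kind of API change being discussed.

```ocaml
(* Hypothetical library: assume somefun's second argument changed from
   int to string, and convert bridges old call sites. *)
module SomeLib = struct
  let convert (n : int) : string = string_of_int n
  let somefun (prefix : string) (s : string) : string = prefix ^ s
end

(* Before the rewrite (would no longer type-check against the new somefun):
     let label = SomeLib.somefun "id-" 42 *)

(* After the rewrite: the second argument is wrapped in SomeLib.convert. *)
let label = SomeLib.somefun "id-" (SomeLib.convert 42)
```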

Oddly it doesn’t list OCaml in the language list on the website but there is a parser for it in the source code, so I would expect it to work.

Thanks for the reference. I’ll get in touch with the author to see if they would be interested in actually supporting OCaml.

From a quick look at the documentation, it seems that the tool works purely syntactically. This approach is of limited usefulness in OCaml due to the presence of open, include, module bindings, etc which necessitate some amount of typing information to fully resolve an identifier.
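
A small sketch of what that means in practice. `SomeLib` is a hypothetical module defined inline; the four definitions show call sites that a textual pattern on `SomeLib.somefun` would either miss or wrongly match.

```ocaml
(* SomeLib is a hypothetical stand-in, defined here so the example
   compiles on its own. *)
module SomeLib = struct
  let somefun x y = x + y
end

(* 1. Fully qualified: a textual pattern on "SomeLib.somefun" finds this. *)
let a = SomeLib.somefun 1 2

(* 2. Through a module alias: same function, different spelling. *)
module S = SomeLib
let b = S.somefun 1 2

(* 3. Through a local open: the qualifier is not adjacent to the name. *)
let c = SomeLib.(somefun 1 2)

(* 4. Shadowing: this somefun is NOT SomeLib's, yet a pattern on the bare
   name "somefun" would match it. *)
let d =
  let somefun x y = x * y in
  somefun 1 2
```

Cases 2 and 3 are false negatives for a purely syntactic tool, and case 4 is a false positive if the pattern is loosened to catch them.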

For the record, at LexiFi we have written quite a number of refactoring tools in this style and have found them to be really useful to do large-scale refactorings in our codebase. But these are all one-off tools written with a specific change in mind. In my view, what is missing is a specification language for those kinds of transformations that make sense for OCaml so that more general-purpose tools can be written.

Incidentally, we also have a “structural” grep tool where you can search for expressions with placeholders, for example List.length __ = 0. These expressions can even be annotated with type constraints to restrict the matching set: __ (__ : float) (__ : int) matches all function calls whose first two arguments are of type float and int. Tools of this type are also quite useful. (A technical challenge is that they use .cmi and/or .cmt files to function, and this typically requires some collaboration from the build system.)
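
To give an idea of the untyped part of such a search, here is a toy sketch over a made-up AST. It is not how our tool is implemented: the `expr` type, `matches` and `search` are hypothetical, and the type-constraint feature (which needs the typed tree) is left out entirely.

```ocaml
(* Toy expression AST; a real tool would work on the compiler's own trees. *)
type expr =
  | Ident of string             (* e.g. List.length, xs *)
  | Int of int
  | App of expr * expr list     (* function application *)
  | Hole                        (* the __ placeholder; patterns only *)

(* [matches pat e] holds when [e] has the shape of [pat], with Hole
   matching any subexpression. *)
let rec matches pat e =
  match pat, e with
  | Hole, _ -> true
  | Ident p, Ident x -> String.equal p x
  | Int p, Int n -> p = n
  | App (pf, pargs), App (f, args) ->
    matches pf f
    && List.length pargs = List.length args
    && List.for_all2 matches pargs args
  | _, _ -> false

(* Collect every subexpression of [e] that matches [pat]. *)
let rec search pat e =
  let here = if matches pat e then [e] else [] in
  match e with
  | App (f, args) ->
    here @ search pat f @ List.concat (List.map (search pat) args)
  | Ident _ | Int _ | Hole -> here

(* The pattern List.length __ = 0, with = written as a prefix application. *)
let pattern = App (Ident "=", [App (Ident "List.length", [Hole]); Int 0])

(* Encodes: List.length xs = 0 && List.length ys = 1 *)
let code =
  App (Ident "&&",
       [ App (Ident "=", [App (Ident "List.length", [Ident "xs"]); Int 0]);
         App (Ident "=", [App (Ident "List.length", [Ident "ys"]); Int 1]) ])

let hits = search pattern code
```

Only the first conjunct is reported, since the second compares against 1 rather than 0.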

Cheers,
Nicolas

Is this tool available anywhere? I’d be interested in trying to use it for refactoring/linting inside ocamllsp.

Comby gives me semgrep vibes, although I admit I haven’t looked too much into either. Are they related though?

No, it is not open-source.

Open-sourcing this tool would first require a number of improvements (as long as it is only used within our walls, we can afford to make a rather large number of assumptions about its environment). Second, the implementation is incomplete in places and has rough edges, unhandled corner cases, etc., which would need further work for the tool to be useful to the larger public.

Cheers,
Nicolas

Latest team I heard trying to do this is grit.io. Really early team and only works for js/ts/python currently iirc. But I’d imagine they’d want to allow a plug-in system one day. We shall see.

Edit: Split this out into its own thread, actually:

Ooof. Syntactic refactoring. I spent some time on this.

I briefly looked at Comby way-back-when, before diving neck-deep into Semgrep; but I didn’t really get a good feel for it.

Semgrep, though, is one of the most nightmarish codebases I’ve ever spent significant time in. A lot of that isn’t due to lack of engineering skill, but is due to the ambitious scope, and an unprincipled (… but don’t listen to me wtf do I know …) initial design; they’ve got an informally-specified “general AST” that attempts to abstract every imaginable programming language AST, but by dint of a thousand language-specific hacks spread across the codebase … it’s just … whewf, what a collection of temporary-workarounds-made-foundational …

(I don’t think it’s a scalable approach, and I feel it’s already sort of hit its limit — I’m not the smartest engineer, but my own attempts to patch support for significant OCaml features ended up stonewalling. Idk. Maybe folks smarter than I find it maintainable and navigable, but …)

I do feel like a more-principled tool in this field could be extremely valuable, but I think this problem is repeatedly underestimated by every language-community that I see undertake a solution; and those solutions nearly always end up being rickety and sub-par for precisely that reason.

I haven’t looked into Comby’s implementation, but if they’re doing more up-front design work around a generic architecture, consider me mildly excited? Again, I’d love to see a success in that space, after so many things that look suspiciously (to me) like failures.

(Unfortunately, it also looks like development has slowed down — my biased suspicion is that it might be for exactly the same reason as Semgrep’s pivot away from mutation/soundness …)

@jfeser kinda nailed this one — the only promising approach I’ve seen to this problem is something like ROTOR; and that style of tool is language-specific, and tends to even be task-specific (see specific refactoring tools in IDEs for other languages.)

I thought this was interesting in the context of possibly adding modular implicits.

I think the point is not to overuse ad-hoc polymorphism, since it can cause the sort of semantic typing errors they describe. Currently, OCaml is mostly immune from this problem. :^)

I’m reading this post now, but … is the thesis of this post even in dispute? I mean, ad-hoc polymorphism is known to be problematic in this way! There’s a reason that the Google C++ Style Guide forbade “C++ template metaprogramming”, even back in 2013 (probably much earlier). It’s obvious stuff. It’s always been the case that the authors of traits and trait implementations need to be really, really careful. So what? Well-written trait implementations make Rust code much more compact and transparent than otherwise.

ETA: OK, I finished reading the post, and I stand by what I wrote. But also: he’s specifically using an example of business-logic, not data-structures. This goes to the point that the use of ad-hoc polymorphism (aka “template metaprogramming”) needs to be heavily-controlled and used only for core abstractions, not for business logic. But also, he gives one example? That’s kinda weak tea.

And furthermore, we can look at things like Rust’s ndarray and sparse matrix support, and it’s painfully clear that that code is wildly more usable and transparent than code that does the same thing in OCaml.

It’s stuff like this that makes me wonder if I’m secretly a complete imbecile. My reading of the blog post boils down to this: if you put A in a type class that supports length (which by definition means you have a length implementation that supports A), and then you change (“refactor”?) the definition of A but you do not change the definition of length that applies to A, well, your code breaks. What am I missing?
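
To check my own reading, here is how I would sketch it in OCaml, with an explicit module standing in for the type class. This is not the example from the blog post; `HAS_LENGTH`, `is_empty`, `Before` and `After` are all names I made up.

```ocaml
(* An explicit "dictionary" standing in for a type class with length. *)
module type HAS_LENGTH = sig
  type t
  val length : t -> int
end

(* Generic code written once against the dictionary. *)
let is_empty (type a) (module L : HAS_LENGTH with type t = a) (x : a) =
  L.length x = 0

(* A, before the refactoring. *)
module Before = struct
  type t = { items : string list }
  let length a = List.length a.items
end

(* A, after the refactoring: a field was added, but length was left alone.
   It still compiles and silently ignores the new field. *)
module After = struct
  type t = { items : string list; archived : string list }
  let length a = List.length a.items
end

let before_empty = is_empty (module Before) { Before.items = [] }

(* Reports "empty" even though the value holds an archived entry. *)
let after_empty =
  is_empty (module After) { After.items = []; archived = ["x"] }
```

Nothing here fails to type-check, which is the whole complaint as I understand it: the stale `length` keeps being accepted after `A` changes.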