Serious OPAM package quality issues

It seems to me that there are two competing ideas, or perhaps requirements, here:

  • the need for a well-rounded set of libraries that work seamlessly together
  • the need to be able to install libraries with as little pain as possible

My understanding is that OPAM so far tries to address the second point, not the first (maybe I’m wrong?). Apparently it’s not doing enough yet according to your friend, but my impression is that it would be worse without it - or rather, that the situation is much better with OPAM than without.

Also, the first problem is really a distribution goal, rather than a package management goal. Many Linux and BSD distros have been working on that first point for a long time, and often do provide a consistent set of libraries for people to install and use easily. So unless you’re looking for very exotic libraries, perhaps your friend should consider using his distribution’s packages, and fall back to OPAM when the package he needs isn’t available there? Besides, OPAM accommodates that setup very well.

I think this is selling Opam short. OCaml’s static typing, in combination with a constraint solver that finds compatible packages, gives extraordinarily strong guarantees compared to other package managers. The Achilles’ heel of this is package dependencies without version constraints. When a package enters Opam it is checked to be working, but its dependencies are typically not pinned to the versions current at that point in time, and hence they can degrade.
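
Sketching the difference in opam’s own syntax (the package names and version ranges here are arbitrary examples, not anything from the thread):

depends: [
  "yojson"                         # unconstrained: the solver may pick any future release
  "lwt" {>= "4.0.0" & < "6.0.0"}   # constrained: the solver stays inside a known-good range
]

The unconstrained entry is exactly the degradation case: it was correct when the package entered the repository, but nothing stops a later release of the dependency from breaking it.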

I think this is selling Opam short.

I might very well be doing that. I admit with shame that I am not well versed in Opam. So… I just went quickly through the manual.

Thinking about it, I guess a possible course would be some sort of package configuration validation tool? Unless this already exists?

Sounds like an LTS edition of OCaml could help?

So all those applications can get on track and follow one version.

Then, the regular releases of OCaml can be used by those who are able to switch.

Before anybody insists that this would lead to a situation that motivates people to stay on outdated versions while the regular releases get less attention:

This is already the case, just without the benefits of an LTS.

I don’t believe so. We could use a combination of two methods:

  • Capture exact versions of dependencies in a separate lockfile (like npm/yarn do) and prefer to use those (because we know they worked before for the developer)

  • If exact versions can’t be used (perhaps because some other package has a conflicting version requirement), a package specifying dependencies under semantic versioning doesn’t actually need to specify what range of versions it accepts. We can automatically infer that it accepts the following range:

    • starting from every patch version lower than its specified version (since patch versions are forwards-compatible)
    • up until every minor version greater than its specified version (since minor versions are backwards-compatible).

Of course, we can also allow an override to pin a dependency at a specific version, which opam already does.

So concretely, if a package foo specifies a dependency bar at version 1.0.1, we can auto-infer the semantic compatibility range of bar as everything from 1.0.0 up to (but not including) 2.0.0: any patch release below it in the 1.0 series, and any later minor or patch release within major version 1.
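
In opam’s constraint syntax, the hypothetical auto-inferred dependency would look something like this (opam does not infer this today; the range comes from the scheme above):

depends: [
  "bar" {>= "1.0.0" & < "2.0.0"}   # inferred from the declared version 1.0.1
]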

With this scheme I think we can narrow down the scope of package maintainers’ responsibility to correctly following semantic versioning … which might be a whole different kettle of fish.

Here’s an opam issue proposing to allow package maintainers to explicitly specify dependencies following the logic I describe: https://github.com/ocaml/opam/issues/2976

This is probably a nicer approach than opam just unilaterally assuming semver.

Actually if we’re brainstorming, opam should be able to auto-enforce semver by hooking into OCaml’s built-in module comparison operations. Elm’s package manager does this to good effect: it can tell a package maintainer exactly what their new version number should be based on the API diff between two versions. But perhaps for OCaml, one thing at a time.

Enforcing compatibility of interfaces is not enough and could instead give a false sense of security (for instance, if the semantics of some functions change).

True. The calculated next version string should be a recommendation only and not a hard requirement. Elm-package also gives a recommendation, but with Elm the recommendation can be a bit stronger because it’s pure and the types give you stronger guarantees.

Following semantic versioning in OCaml would mean almost always bumping the major version number. For example, if you add any new value to an API in OCaml you may break code; yet most people will not deem a release with such an addition a major release.

Hi Daniel, not 100% sure what you mean. Are you referring to the scenario where opam infers what the next version string should be by diffing module APIs? Because if I add a new value to an existing module in a backwards-incompatible way, I know to increment the major version number. But if I know (or at least believe after reasonable analysis of the code) that the change is backwards-compatible, I can choose to increment only the minor version number.

In the end, it’s my responsibility as a package maintainer to correctly follow semver. I may of course screw up from time to time but on the whole it should be worth it?

What Daniel means is that OCaml is more sensitive to API changes than SemVer conventionally expects. In C, it is safe to add a function to an API and bump the minor version, but in OCaml this can break dependent libraries. It is quite difficult to follow SemVer in OCaml without constantly bumping the major version.

Do you mean because of open usage and the possibility of name overlap?

AFAIR open is fine. It’s mainly the M.(exp) notation that is problematic: one of your local variable names in exp may be captured by a new addition to the API (this is absolutely not theoretical; it happened e.g. with cmdliner).
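
A minimal self-contained sketch of that capture (module and value names are hypothetical):

(* Version 1 of Lib exposes only [add]: *)
module Lib = struct
  let add = ( + )
end

let size = 3
let total_v1 = Lib.(add size 1)   (* [size] is the local binding: 4 *)

(* Version 2 adds a value that happens to be called [size]: *)
module Lib = struct
  let add = ( + )
  let size = 100
end

let total_v2 = Lib.(add size 1)   (* [Lib.size] now captures the name:
                                     101 instead of 4, silently unless
                                     the shadowing warning (44) is on *)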

Do you mean because of open usage and the possibility of name overlap?

Not only this. For example, if you add a new field to a module signature and it is used somewhere as a functor argument, then it is strictly an API-breaking change. Given that we have the module type of construct, any module can potentially be used for its signature somewhere.

Formally, if a new interface is a subtype of an old one, then we do not need to bump the major version. The problem is with arrows, as always, i.e., with functors. Since module types may occur on both sides (being either an argument or a resulting type) we have a variance problem. Basically, we can add more fields to an interface that is produced, but we may only remove fields from an interface that is consumed. (And we can’t touch interfaces that are both consumed and produced.)
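
A sketch of that variance rule (the signatures and modules are hypothetical):

module type Small = sig val x : int end

(* Consumed position: a functor argument. Any module providing at
   least [x] is accepted, so shrinking Small keeps old callers
   working, while adding a field to it would break them. *)
module Use (Arg : Small) = struct
  let doubled = Arg.x * 2
end

module Impl = struct
  let x = 1
  let y = 2   (* extra field: still a valid argument for Use *)
end

module Used = Use (Impl)

(* Produced position: the interface a library exposes. Ascribing a
   richer module to Small just hides the extra field, so growing a
   produced interface is safe, while removing [x] would break users. *)
module Exposed : Small = Impl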

Yeah, this is a big problem, though I didn’t know we’d had cases of it occurring in the wild. IMO it should be common practice to build up the modules you’re going to open locally before using them, as in

module F = struct let x = Foo.x let y = Foo.y end  (* rebind just the names we need *)
...
let z = F.(x + y)  (* only [x] and [y] can be captured now *)

though I don’t know how well this would interact with Flambda. Of course, saying it is easier than doing it – the syntax is too heavy for me to even motivate myself to use it.

In general though, this particular problem is borne out of our lack of type-based dispatch, which causes the unusual need to open modules repeatedly.

I didn’t even consider the contravariance of functors as a problem. Yikes. I guess this is limited to functor-heavy code, which I’d venture most OCaml programmers don’t use, but it would become pervasive with modular implicits!

I guess this is limited to functor-heavy code, which I’d venture most OCaml programmers don’t use, but it would become pervasive with modular implicits!

The problem is that you never know how your library is used. Your library may not contain any functors and just provide a single module M. Now suppose there is a user who decided to parametrize his own library over your module’s interface, e.g.,

module type S = module type of M

module Make (Backend : S) : sig
   ... 
end

This essentially makes S a non-variant interface: adding new fields to it imposes new requirements on Backend implementations, so downstream users need to update them. And removing a field will obviously break the API.
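
Filling in the sketch above so the breakage is concrete (names hypothetical):

module M = struct let x = 1 end

module type S = module type of M

module Make (Backend : S) = struct
  let y = Backend.x + 1
end

module My_backend = struct let x = 42 end
module Mine = Make (My_backend)   (* fine today *)

(* If the library later adds [let z = 2] to M, then S silently grows
   a [val z : int] requirement and [Make (My_backend)] stops
   compiling, even though Make itself never uses [z]. *)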

Thus my personal approach is to ignore the presence of the module type of construct. If a module type is explicit, i.e., declared as part of the interface with the module type construct, then I assume that it is non-variant and bump the version on any change. If the module interface is implicit (not bound to a name), then any subtype of it keeps the same major version (i.e., I can add functions, data constructors, etc.).

Apart from the points on shadowing and functor arguments, I would consider it appropriate to keep the major version if I only add optional function arguments, with defaults consistent with the former behaviour. However, this breaks higher-order usage, where the exact function type matters.
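
A sketch of that higher-order breakage (the function names are hypothetical):

(* v1 *)
let greet name = "hello " ^ name

let apply_all (f : string -> string) names = List.map f names

let _ = apply_all greet ["a"; "b"]   (* fine with v1 *)

(* v2 adds an optional argument with a default that preserves the
   former behaviour for direct calls: *)
let greet2 ?(punct = "") name = "hello " ^ name ^ punct

let _ = greet2 "a"   (* direct calls still work *)

(* But [apply_all greet2 names] is rejected: greet2 has type
   ?punct:string -> string -> string, not string -> string. *)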

I’d bump the major version for that actually, since the type changed…

Thought about this one for a while. To me it seems that doing (something like)

module Make(Backend : module type of M) = ...

should be considered a bad engineering practice, in the same way that doing

select *
from ...

is considered a bad practice. When we’re using a module as an input, we should be explicitly listing the members that we are using. Most people writing SQL (professionally) accept the corresponding discipline of writing an explicit column list. I think OCamlers can do the explicit signature too.

With that accepted, we can be one step closer to SemVer…
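
A sketch of that discipline (the signature and its members are hypothetical):

(* Instead of [module type of M], list only what Make actually uses: *)
module type Backend_needs = sig
  val connect : string -> unit
  val send    : string -> unit
end

module Make (Backend : Backend_needs) = struct
  let hello host =
    Backend.connect host;
    Backend.send "hello"
end

(* The library behind M can now grow freely without breaking Make,
   just as an explicit column list insulates a query from new columns. *)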

@yawaramin Do you know if there is some kind of de facto style guide that explicitly covers these nuances affecting elegant use of parametric modules?