Defining standard OCaml development lifecycle processes

@cdaringe this is extremely nice work, thank you! I’ve contacted you directly about moving it into a more collaborative document to help refine it together across the various platform maintenance teams.

I just wanted to quickly clarify this:

The development process for opam is driven by a simple principle that is inspired by OCaml itself: don’t break backwards compatibility without good reason, and when breakage is necessary, justify it. Our tools are embedded in projects with lifespans measured in decades, and we take compatibility seriously. That’s why we take pains to provide migration paths (e.g. from opam 1.2 to 2.0, and from 2.0 to 2.1) that are as invisible as possible, and why metadata is versioned in both opam and dune.

This is entirely compatible with the need for graceful evolution, since there is a clean separation of metadata (opam files) from the clients that operate over it (the opam executable). I’ve made it clear over the years that we welcome new experiments in the realm of clients that operate over metadata. Where I push back is when metadata is duplicated, as it’s extremely hard to reconcile metadata evolution in thousands of downstream projects. That’s why (for example) the new ocaml-ci is “zero configuration”: it can operate sensibly purely by analysing the opam and dune metadata for a project. To enter the OCaml/opam ecosystem, your project therefore must at least have an opam metadata file, since that is what defines what it means to be in the opam-repository (the relationship to other packages, and instructions on where to get yours).
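
For concreteness, a minimal opam file covering exactly those two things might look roughly like the sketch below; every field value here (names, e-mail, URLs, version bounds) is purely illustrative rather than prescriptive:

    # foo.opam -- a minimal, illustrative example
    opam-version: "2.0"
    synopsis: "One-line description of foo"
    maintainer: "you@example.com"
    authors: ["Your Name"]
    license: "ISC"
    homepage: "https://github.com/you/foo"
    dev-repo: "git+https://github.com/you/foo.git"
    build: [["dune" "build" "-p" name "-j" jobs]]
    depends: [
      "ocaml" {>= "4.08"}
      "dune"  {>= "2.0"}
    ]

The depends field expresses the relationship to other packages, and dev-repo (plus the url section added when a release lands in opam-repository) says where to get yours.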

For clients, new workflows such as esy, drom and the VSCode plugin are the perfect experimental vehicles to inform future opam client development. For example, one lesson from VSCode is that a prime source of confusion for users is that “opam install” doesn’t edit the dune and opam metadata! Addressing this requires rethinking the entire model of the client: @dra27 and I have been sketching out an alternative opam client design that is more declarative than the current imperative one. Instead of having any global state, an invocation of nopam would simply bring the local environment to the state defined in the local .opam file. In this scenario, to install a package a user just has to edit the .opam file to add the dependency, and an invocation of nopam would then result in it being available.

Many of the GitHub issues on the opam development trackers reflect our immediate priorities – to maintain a stable and evolutionary release process for the opam 2.x series with the minimum of drama for users upgrading. Please do not mistake this for a lack of desire to innovate – we absolutely welcome such inputs, and will factor them into the opam 3.x development plans. The only thing we’re very careful of is needless backwards compatibility breakage, which is why changes to the opam metadata format are so carefully scrutinised.

I’m overall delighted with the level of innovation happening around opam these days; lots of plugins emerging, new solvers, more analysis and so forth. Keep it coming! :slight_smile:

14 Likes

Apologies for the tangent, but:

I can’t help but think that the current model of modifying the data and running some code on it is more “functional” in nature than a model where one runs some code and has it modify the backing data itself. :slight_smile:

Hang on, aren’t we in agreement? The current model is to run some code imperatively (opam install) and then modify the data to keep up (edit the opam file). A more functional approach would be to modify the data (‘edit the opam file’) and then run code to adjust the environment state (‘nopam’).

There’s ample room for both approaches though. Which one is optimal really depends on what the user of the tool is trying to achieve, which is different if you’re a distro maintainer, a library author, a CI system, or an end user.

Edit: To clarify my original message, what users ask for is that opam install modifies the opam metadata, but that would make the overall tool more complex. As maintainers, we’re trying to step back and solve their actual problem with a cohesive client design that’s more declarative. It’s tricky to modify an established CLI workflow without destroying existing good properties and use cases. Hence the motivation for new clients with new execution models that solve the user workflow problems.

1 Like

Maybe, as usual, I’m just odd. I don’t run opam install and then edit foo.opam, I edit foo.opam and then run opam install ./foo.opam --deps-only. I didn’t realize that workflow is abnormal. :slight_smile:

I guess that a surface workflow that would work smoothly could use a command that edits the metadata to add a dependency (rather than using an editor and possibly getting the syntax of foo.opam wrong), and then runs opam install to sync up the state of the intended switch.

2 Likes

Hi, is drom meant to be an experimental tool, to be supplemented by nopam? And is nopam intended to become opam 3? Trying to understand the evolution.

I think this is what dryunit was supposed to provide.
I don’t think it gained momentum, and it’s still stuck in the jbuilder era. A shame really, because it’s a nice idea: a single dune configuration that can be generated based on conventions and enables every *test.ml to be picked up.

2 Likes

drom is a tool by @lefessan. nopam is a codename I just made up to illustrate the difference between current opam and a hypothetical new client.

None of these are opam 3. When we flush through our opam 2.x stack (notably Windows support and other feature specs), we’ll publish a roadmap for opam 3. My point was that we encourage experimentation outside the critical path of opam releases, and the opam dev team will gather and internalise all the data we have available when it comes to setting the direction for opam 3 and onwards. If you do experiment, and you do post here, your efforts will not be forgotten.

I take no position on your normality, @jjb :wink: The only problem with the “edit opam first” workflow is simply the lack of feedback on whether or not the solution of packages and dependencies actually works for you. For instance, if I edit the opam file to depend on a package that conflicts with a current one (or introduces a dependency cone I don’t like, or something else), then the solver needs to run to show that to me somehow. That works today since opam install shows you that action graph, but with an alternative client something else needs to be receiving these requests.

Many of our platform tools are adding RPC support at present due to this need for more interactive feedback with modern IDEs – dune, ocamlformat and merlin all have that now, and you can already observe the benefits with merlin directly talking to dune for example. It may make sense for ocaml-lsp-server to become the unified process behind which all the other tools sit, and for a CLI tool to also communicate with a daemonised process tied to a project (just as VSCode does today).

4 Likes

Just adding my 2 cents, I don’t have easy solutions or anything, but:

I also have the impression that the current tooling is a major pain point for newcomers to overcome. If I were to try another language today and had to learn about dune and opam (and their relatively intricate syntax and features, although dune does much better imho) and the disconnect between module names, file names, directory structures, etc., I’d probably ragequit quite quickly. On the other hand, Rust, arguably a more difficult language to learn, has a very easy onboarding: cargo build (or cargo build --release) will take you 95% there, by fetching dependencies automatically, creating a (precise) lockfile, and building your project with minimal configuration centralized in 1 (one) file, Cargo.toml.
That’s with a workflow where you typically edit Cargo.toml by hand (adding one line per direct dependency), and run tools afterwards.

They’re discussing merging cargo-add into cargo (to not even have to edit manually) but clearly people are managing without that.

So I think cargo’s workflow is friendlier to newcomers and beginner/intermediate-level Rust users. In particular, it’s centered around lockfiles, per-project dependencies, and tools with good defaults. In opam a lot of this is doable (although the lockfiles are doomed from the start in the presence of a non-immutable repository, imho), but the workflow for per-project dependencies is neither easy nor intuitive (like, opam sw create . <the compiler version>? I have to look it up almost every time), and you need to fiddle with environment variables. Dune is better behaved, but it’s still a different tool to learn.
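
For reference, the per-project incantation I mean is roughly the following (the compiler version and flags are just an example of what I believe works with an opam 2.x client):

    # create a local switch tied to this project and install its dependencies
    opam switch create . 4.12.0 --deps-only
    # make the new switch visible to the current shell
    eval $(opam env)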

It’s a bit ironic that I say Rust is friendlier when merlin is better than rust-analyzer, and more stable; but the truth is, by the time you can write code with merlin/ocamllsp enabled, and can build and run the code… a lot of beginners have probably quit already.

My dream here would be that dune would absorb the constraint solving capabilities of opam, and that dune-project would become the single file I have to edit to deal with dependencies.

13 Likes

I think there is a misconception that the opam repository is mutable. For the past few years, with increasing strictness, the opam-repo maintainers have rejected patches that modify an existing version of a package (requiring instead that an epoch be bumped, for example foo.1.0 becomes foo.1.0-1).

What we reserve the right to do is to modify the metadata of packages such that they can remain installable in the face of: installation issues (e.g. due to a new version of clang or macOS or whatever) and serious security issues (to make things uninstallable with the serious issue, but to provide a close version without the issue).

This actually makes lock files more robust: there is enough versioning flexibility to give the solver a bit of wiggle room, but the broad sweep of changes that regularly prevent software from compiling can still be fixed. We may need some adjustments to how we generate lockfiles to really make this solid (e.g. use >= 1.0 & < 1.1~~ instead of = 1.0) to allow for epochs, but that’s pretty much it.
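
In opam metadata terms, that kind of lock entry would look roughly like this (package name illustrative), constraining to the 1.0 epoch series rather than to the exact version string:

    depends: [
      # instead of:  "foo" {= "1.0"}
      "foo" {>= "1.0" & < "1.1~~"}
    ]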

This can already be the case if you want it, as dune can generate opam files. However, I think the root of your frustration is that (due to OCaml’s 24-year history) we have multiple namespaces: compilation units, ocamlfind packages, and opam packages. Merging those is an effort in progress, but by its nature it must be done carefully and iteratively, with backwards compatibility in mind.

Meanwhile, @cdaringe’s approach of systematically listing BKMs and reflecting on alternative approaches in a structured way really resonates with me – it’s not enough to say “Rust does this” because… we’re not Rust. We have our own history and our own userbase, and we can’t just drop all the existing codebases and users we’ve made stability promises to. But putting our learnings from Rust (and Python, and Ruby, and Nix, and other ecosystems) side-by-side and cherry-picking the best bits for the future of OCaml – that will work!

8 Likes

Wasn’t this RFC (RFCs/rfcs/ocamlib.md at master · ocaml/RFCs · GitHub) meant to solve at least some of this issue? I wonder what happened to it? The thesis of the RFC seems quite sound to me.

1 Like

Indeed. That RFC was updated, though. You can find an OCaml implementation of the RFC against 4.12 here; see the OCaml status in the RFC for details.

3 Likes

My goal was not to criticize anyone, only the state of things, which is an emergent property. I think a lot of choices made sense in the context where they were made. This is more about where to go next, I think.

What we reserve the right to do is to modify the metadata of packages such that they can remain installable in the face of: installation issues (e.g. due to a new version of clang or macOS or whatever) and serious security issues (to make things uninstallable with the serious issue, but to provide a close version without the issue).

Cargo has “yanked” packages for the security bit: the solver will never select these; the only way to use them is if they’re already in a lockfile. I know the repository isn’t too mutable, but I remember some changes to z3 last year that were painful for those whose workflow they broke.

We may need some adjustments to how we generate lockfiles to really make this solid (e.g. use >= 1.0 & < 1.1~~ instead of = 1.0) to allow for epochs, but that’s pretty much it.

See, that’s not really a lockfile then :slightly_smiling_face:. I understand it’s still useful, but the advantage a cargo (or npm) workflow has here is that the lock pins a version together with its hash. That’s the most reproducible you can hope for, and it means you’re not at risk of solver failure or silent package updates (whatever the reason behind the update is).

This can already be the case if you want it as dune can generate opam files.

Yes! I already use that and it’s neat. The next logical step for a more integrated experience, imho, is that opam would become a library (for constraint solving) and dune would be the sole entry point for declaring dependencies, build targets, and also the one way to build a project — dune build @all could/should install dependencies in the project’s _build.

it’s not enough to say “Rust does this” because…we’re not Rust.

I know! But some changes that have already been made went against what old-time OCaml users would do, so we’re not just stuck with the past. For example, dune forces a more rigid project structure onto you – a good thing imho – where previously one could have a library spread over a lot of directories. Esy also showed that a nicer workflow (as in, closer to npm/cargo) is possible, although the hack to rewrite paths in binaries seems a bit distasteful.

My point is that we can’t just drop everything and use cargo-ml, of course. But tools could go in this direction and propose new solutions that are more cargo-like (like drom). After all, switching to dune is a big breakage for existing projects, but tons of people migrated anyway in their own time, which shows that providing new workflows can drive adoption.

4 Likes

Yeah @yawaramin, I really admire what the esy team is attempting to do. I don’t mean to advocate that opam should do X or Y, but I certainly mean to advocate that the default OCaml experience should have clear solutions for common development processes. esy has answers, and that’s rad. Whatever our default tools are, they should have unambiguous answers to fundamental, universal development problems as well.

3 Likes

Isn’t this an overly optimistic view of the JS ecosystem? Most projects use a combination of npm, nvm and/or yarn. They might have a package.json file, might have a yarn.lock file, and if you’re lucky a README.md that tells you which node version you need and which magic spell is required to set you up.
What’s more, the dependencies in the package.json might be really liberal, so it never works on your laptop. yarn install probably doesn’t work, and yarn build gives compilation errors as your TypeScript setup is different from the author’s.
Also, they change their mind about the BKM every few years, so it all depends on how old the library/project is. I’m not saying you should abandon all hope, but JS is not the state of the art.
(and TypeScript has a lying type system :wink: )

@toolslive, definitely. All valid points. I’d still assert that the norms exist and are actively practiced, even if there is fragmentation. There exist de facto processes, even if adoption is not universal.

1 Like

@rgrinberg

Basically, one would have a naming convention for tests

Love it. I internalize your idea as

  • insert <some-standard-dune-test-expression> (one possible instance is sketched after this list)
  • (optional) tune your test libraries/ppxs as you see fit
  • write tests and never look back!
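
For illustration, I imagine <some-standard-dune-test-expression> being something as small as the stanza below (the test and library names are just an example, with a matching foo_test.ml alongside it), which dune runtest then picks up:

    ; dune file in the test directory (illustrative sketch)
    (tests
     (names foo_test)
     (libraries alcotest foo))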

I think this is because dune has very little to say about traditional tests

Definitely. It’s certainly not dune’s job to provide a formal recipe. Even so, it kind of does provide an opinion towards those other styles. Not complaining, just observing :slight_smile:

I’m not sure what you mean by “mandatory preprocessing”

Ya, thanks for calling that out. I kind of hand-waved that. I often do preprocessing for integration tests: things like starting a dummy/ephemeral database, creating a tempdir for isolated execution, setting SOME_ENV=test, etc.

I always imagined something like this: (command …)

While writing this segment, I was trying to replicate your exact example. I thought, “I bet I can cobble together an alias + rule to achieve this”, and failed to do so. Glad the dune folks have been thinking about this too.
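
For posterity, here’s roughly the shape I was reaching for: a rule attached to the runtest alias that performs setup before running the test binary. This is only a sketch under my assumptions – setup.sh, test_foo and alcotest are made-up names – and I haven’t verified it end to end:

    ; dune file in the test directory (sketch; names are illustrative)
    (executable
     (name test_foo)
     (libraries alcotest))

    (rule
     (alias runtest)
     (deps setup.sh)
     (action
      (progn
       ; e.g. start an ephemeral database, create a tempdir, ...
       (run bash setup.sh)
       (setenv SOME_ENV test
        (run %{exe:test_foo.exe})))))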

@jjb this is a good callout. So, for the “clone, install project deps” process, you are perhaps claiming that opam install ./foo.opam --deps-only is the BKM. Perhaps it is. Candidly, this feels obvious in retrospect :laughing:. I have not habitually created a foo.opam as part of bootstrapping a project, perhaps because I’ve been torn/unclear on how or when to produce it. I probably should start doing this first, 100% of the time.
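
Spelled out, that BKM for “clone, install project deps” would be roughly the following (the repository URL and package name are placeholders):

    git clone https://github.com/you/foo.git   # placeholder repository
    cd foo
    opam install ./foo.opam --deps-only        # install the project's dependencies
    dune build                                 # then build as usual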

This signals to me that a BKM on “bootstrapping a new OCaml project” ought to be captured too.

  • drom has drom new <pkg> (generates .opam)
  • npm has npm init -y (generates package.json & package-lock)
  • cargo has cargo new <pkg> (generates Cargo.toml & src/*)

Looking only at the platform tools, I think it’s probably agreeable that there is no project initializer that preaches a blessed OCaml structure/config.

  • opam init is for getting opam ready, but not for starting an OCaml project
  • dune init is for adding stuff to an existing OCaml project, not for creating an OCaml project

I don’t mean to suggest that a CLI command is required for bootstrap, but I would suggest that a well-known, minimal set of artifacts perhaps should define a blessed, MVP bootstrapped state (minimal contents are sketched after the list below).

  • ./foo.opam
  • ./dune
  • ./dune-project
  • ./src/lib/foo/dune
  • ./src/lib/foo/foo.ml
  • ./src/bin/foo/dune
  • ./src/bin/foo/foo.ml
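
To make that concrete, minimal contents for the dune files in that layout could look something like the sketch below. Everything here is illustrative; I’ve used main.ml rather than a second foo.ml for the binary so that its module name doesn’t clash with the library’s top-level Foo module, and the foo package itself comes from ./foo.opam:

    ; ./dune-project
    (lang dune 2.8)
    (name foo)

    ; ./src/lib/foo/dune
    (library
     (name foo)
     (public_name foo))

    ; ./src/bin/foo/dune
    (executable
     (name main)
     (public_name foo)
     (libraries foo))

foo.ml can start as a single value definition and main.ml as a one-line print that uses it; the point is the shape, not the contents.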

A surprisingly large number of OCaml practices can be learned and derived just from seeing those files initialized, especially if they are produced by the platform. You infer that it is OCaml standard practice to have dune builds for each component. You infer that you partition libs/bins into separate folder hierarchies. You infer that dune can link parts of a project in dissimilar folder hierarchies together. You :crossed_fingers: infer that foo.opam is genuinely critical to a package development workflow, vs something you maybe tack on later.

2 Likes

You can do dune init proj as well as dune init {lib,exe} which gives something like what you want.

Differences:

  • No ml files in lib
  • (slightly oddly) no dune-project
  • Adds a test directory

I use it relatively frequently. Personally I think it would be nice if it additionally generated (by default):

  • A dune-project with (generate_opam_files true) and enough other info to make that work (a sketch is below)
  • A .ocamlformat file
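
For what it’s worth, the kind of prepopulated dune-project I have in mind would be roughly the sketch below (all field values are placeholders); with (generate_opam_files true), dune build keeps foo.opam in sync with the package stanza:

    (lang dune 2.8)
    (name foo)
    (generate_opam_files true)

    (authors "Your Name")
    (maintainers "you@example.com")
    (license ISC)
    (source (github you/foo))

    (package
     (name foo)
     (synopsis "One-line description of foo")
     (depends
      (ocaml (>= 4.08))
      dune))
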
7 Likes

When init was implemented, dune would create dune-project files when it was first run, and we didn’t want the dune-project being generated in two places. But since then the dune-project file has become more useful (so prepopulating it with certain stanzas now makes sense), and recently the functionality to automatically create the file on first run was removed.

I opened an issue to rectify this: Make `dune init proj` create the `dune-project` file · Issue #4367 · ocaml/dune · GitHub

I agree that creating a .ocamlformat would also be nice!

5 Likes

I’m happy to see the trend toward a high bar for in-place metadata modifications. It is important for this bar to be high, limited to cases such as the installation and security issues you note. In the past it has been problematic when, for example, a version constraint was added to one of a package’s dependencies without a version number change (e.g. because a bug in some interaction was discovered post-release). This has broken builds before, even with lock files. So I think it is worth being clear: retroactively adding constraints to keep packages installable is good, but adding constraints that might prevent installation due to bugs in packages is, IMO, not good, as there can be clients that happen not to hit the bug, and it is not necessary to break their build.

2 Likes