OCaml RFC#17: library linking proposal

I think this is the wrong move. Parametrizing dependencies by the environment is only OK if you assume all of your deps are in the same folder; for things like esy it generates HUGE environments, which is a problem on Windows and slower overall.

Yeah, I'm reading it now, and I disagree with this RFC in many places, especially because it is not compatible with cross-compilation, unless the compiler were aware of all the nuances of cross-compilation, which is probably not going to happen and sounds like an even worse separation of concerns. Or the build system could do even more hacks to work around this, which to me shows that the OCAMLPATH solution is not the right one.

Could you maybe expand on that?

I don't know how you do cross-compilation, but personally I do not see any problem with OCAMLPATH and cross-compilation. Your toolchain targeting the build OS and your toolchain targeting the host OS will simply have their OCAMLPATH set to different values to address each library set.
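
For instance, a minimal sketch of that setup (the prefixes and the cross-wrapper name are hypothetical, and the compiler consuming OCAMLPATH and a -requires flag is what the RFC proposes, not current behaviour):

    # Build-OS toolchain and host-OS (cross) toolchain, each pointed at
    # its own library set purely through OCAMLPATH.
    OCAMLPATH=/opt/ocaml-host/lib \
      ocamlopt -requires yojson -c host.ml

    OCAMLPATH=/opt/ocaml-android-arm64/lib \
      arm64-ocamlopt -requires yojson -c target.ml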

That seems like a hack to me: how can the toolchain know which library set to use? I can think of two solutions: we could have multiple variables in the env, or multiple files pointed to by the env, which is what findlib.conf does today.

Having multiple variables at the same time would not be feasible on Windows because of the limit on the environment size. Right now I have at least 5 targets (Android ARM64, Android x86_64, iOS x86_64, iOS ARM64, plus the host), so that means 5 different "OCAMLPATH" values in the same environment.

Also, both solutions seem worse than what we have today, passing -I as flags, because now you don't only need to resolve flags, you need to resolve both the environment and the flags, since different targets will still have different flags.

That may not be a problem for dune, but it seems like another hassle to debug. Right now, if I have a random command failing, I can just run dune build -x android.arm64 --verbose, get the command that failed, dump it in my terminal and reproduce it. If building also depends on the environment, I would need to somehow dump the OCAMLPATH generated by dune, load it into the environment and then run the command.
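
Concretely, the extra step I mean looks roughly like this (the OCAMLPATH value is hypothetical; this is a sketch of the workflow, not actual dune output):

    # Today: the verbose log contains the exact failing command, and
    # pasting it back into the terminal reproduces the failure as-is.
    dune build -x android.arm64 --verbose

    # With environment-based library lookup, the command alone is no
    # longer enough: the OCAMLPATH that dune computed has to be
    # reconstructed first, and only then the failing command re-run.
    export OCAMLPATH=/path/dune/computed/for/android.arm64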

Maybe "incompatible" was a strong claim; it is always possible to make stuff compatible, but it is definitely a regression from my point of view.

edit:

You can also change the id of all libraries to carry the name of the target, but this seems even worse: now your package is yojson.android.arm64 instead of yojson, and the build system still needs to generate different flags or code based on which library to use. Either way it feels like a regression.

Since the compiler does not have a CLI for selecting the architecture (and it's not even clear, if the compiler ever gets proper cross support, whether there will be one), that task is delegated to the build system. That seems apt anyway: once you factor in C library dependencies you will likely have to play with paths anyway.

In any case nothing prevents extending the lookup and install convention to enable selecting archives for a particular architecture directly from the toolchain.

I don’t see any “regression” here.

It will actually be much less of a hassle to debug. The proposal is a WYSIWYG system: no hidden state or configuration files. You just need to look at the command line and the value of OCAMLPATH to understand what is happening.

As far as our efforts to bring cross-compilation to MirageOS go, it seems fair to use an environment variable to set the path where cross-compiled libraries can be found.

The current situation for MirageOS relies on findlib.conf and I don't see any regression between "set the path in a single global file" and "set an environment variable" - and, as @jeremiedimino said, it seems easy to move from one to the other. As @dbuenzli said, we cannot say that the compiler currently supports cross-compilation, so we must shift this responsibility to another tool - and that is already the case with ocamlfind/dune, since both do a lookup on findlib.conf.

The RFC tries to solve another issue: over time, and with the help of OPAM, we have reached a fairly simple layout of the OCaml ecosystem. If I understand correctly, it just tries to incorporate such a layout at the compiler level, which should help with maintainability since we would no longer need third-party software to compile an OCaml project with libraries.

It seems a bit orthogonal to bring cross-compilation into this RFC when the problem lies elsewhere - it is mostly about the compiler which, again, does not have any option to cross-compile.

Hello, I think the RFC could benefit from an expanded "motivation" section for those unfamiliar with previous discussions in developer meetings. Some of the potential subjectivity around words like "simple", "uniform" etc. would be much reduced if there were specific examples of the failings of an external tool like ocamlfind, focusing particularly on the end-user experience rather than just on the tools/ecosystem maintenance point of view.

You just need to look at the command line and the value of OCAMLPATH to understand what is happening.

A few potential drawbacks for your consideration:

  1. Unless the compiler also outputs information like, “picking library ‘x’ from path ‘p’”, users have to reason about that in their head by staring carefully at a potentially complicated OCAMLPATH
  2. The OCAMLPATH value is temporary - I cannot guess after the fact what value was used, e.g. after rebooting my computer and looking at the files on-disk. I can however return to an on-disk findlib.conf to guess what value was used. This may be inconsequential in practice.
  3. This is a departure from the toolchain analogy: pkg-config is-to gcc as ocamlfind is-to ocamlopt. The OCaml manual touts the tooling similarity with C, so giving the compiler the additional responsibility of figuring out a link line may make it harder to understand, or unintuitive, for newcomers.

An OCAMLPATH is just a list of directories, so that's as much complexity as you can get. Equipped with a library name, you just have to check these directories to find out which one gets picked up.
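
As a sketch of what "checking these directories" amounts to (the paths are hypothetical, and first-match is the usual search-path convention rather than a quote of the RFC):

    # With OCAMLPATH set to a list of directories, resolving a library
    # name is a first-match scan over the entries.
    OCAMLPATH=/opt/a/lib:/opt/b/lib
    lib=yojson
    IFS=:
    for d in $OCAMLPATH; do
      if [ -d "$d/$lib" ]; then
        echo "$lib -> $d/$lib"
        break
      fi
    done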

If you contrast this with the current system, you potentially have to go and read every META file to understand what is happening, because the names are defined in files and have no relationship to the file system.

Well, C compilers have -l, -L and LIBRARY_PATH (and Java has CLASSPATH, -classpath, etc.). It's not as if we are inventing crazy stuff here.
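
For comparison, the C-side equivalents look like this (library and directory names are just an example):

    # Two ways to point the toolchain at a library directory: the -L flag
    # on the command line, or the LIBRARY_PATH environment variable
    # (gcc honours both at link time).
    gcc main.c -L/opt/mylibs/lib -lfoo -o main
    LIBRARY_PATH=/opt/mylibs/lib gcc main.c -lfoo -o main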

Does OCAMLPATH expose the same problems as GOPATH?
TL;DR: the need to explicitly manage environment variables per project workspace, and the lack of ability to have different versions of the same lib serving different projects.

OCAMLPATH doesn't introduce these problems, they already exist; but it doesn't solve them either. Namely, library names are unique, so you can't have two versions of the same library. OCAMLPATH setup will be left to your system's package manager or to eval $(opam env).
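
Concretely, and assuming opam ends up exporting OCAMLPATH the way the RFC anticipates (the value shown is hypothetical), the setup step is just:

    # The package manager publishes the library search path; the whole
    # lookup state is then a single inspectable environment variable.
    eval $(opam env)
    echo "$OCAMLPATH"
    # e.g. /home/me/.opam/default/lib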

This is true - both are equally complicated - but do you know if ocamlfind query <pkgname> makes this any easier? What might be the equivalent with OCAMLPATH? Personally this hasn't mattered to me so much, and I'm sure you've given this more thought than I have - I'm just surveying the functionality offered by the tool proposed for deprecation.

The C compiler (which just dumbly forwards to your chosen linker) does not take on the additional responsibility of figuring out dependencies, whereas -requires proposes to do that. I agree it's not all that crazy, and I am sure some compilers for other languages do this already. Also, nothing is forcing anyone to rely on -requires instead of explicitly figuring out dependencies using ocamlfind or any other tool, so it might not matter - but to rule out pitfalls, maybe we can ask the question differently:

Why don’t C compiler front-ends figure out library dependencies for you? Have they never thought about this? Is there a downside we may be missing?

As long as ocamlfind is still allowed (in the sense that it's still possible to make it work), this proposal seems fine. There are many uses of "libraries" that don't need the complexity of ocamlfind, after all.

Concretely, I think that boils down to continuing to support -pp and -ppx.

That is a problem ocamlfind has already solved for years. And the fact that the compiler doesn't have first-class support doesn't justify making it harder to support in userspace.

That's not true, because I've been there: C libraries can be packed into OCaml libraries and then easily built. This is how we do it in esy, and we have reproducible environments with only findlib.conf, and we are able to cross-compile. It is capable of building huge applications that use a lot of C libraries, like Revery, for Android and iOS with a couple of patches (some of them already upstreamed). No env magic needed - there is a lot of magic, but I don't want a new kind of magic.

And my argument is that yes, that's possible, but you're making a new hack, because the core now defines a standard that isn't capable of describing a big use case, while currently that's already possible without a single additional line of code.

Also, if by looking at the architecture you mean identifying libraries by CPU ISA and OS, that's not possible: iOS ARM64 and macOS ARM64 generate the same kind of artifact and ABI, and the same goes for Android ARM64 and Linux ARM64.

About the RFC

Overall I don't like the idea of libraries being handled by the compiler. It is a problem already solved by dune in userspace, and moving it to the compiler means that the ossification will be way bigger, as moving the core is always slower than moving userspace.

And especially with cool tech around packages and libraries being developed, this is probably something that is going to bite OCaml in the future. Providing better primitives is always nice, but I don't see any considerable advantage for all the hassle in this RFC.

Making dune a blessed tool, by saying it is "the official OCaml build system" or something like that, would give roughly the same benefits as the RFC, but with no code needed and without breaking anyone's workflow.

MirageOS 4 does not have to assume a proper package manager with C deps. I have been able to cross-compile MirageOS using esy for a couple of months now, and that includes MirageOS 3. The problem with cross-compilation lies strictly in opam; the only missing primitive in the compiler is cross-compilation across different word sizes.

As I mentioned at first, Windows has a limit on the environment size that you're going to blow past if you add all your deps to the env. This doesn't happen with opam, but it does with esy. So yeah, it's a regression.

Which is not reasonable, because you're still mostly going to need a complete build system, and this argument is completely generic: you can use it for everything that is already a standard. If something is the current standard, that doesn't mean it should be placed in the compiler.

This argument could also be used to say that gcc is not compatible with cross-compilation, and people have been using gcc to cross-compile for longer than I've been alive. The compiler is compatible: I have a native ocamlopt.opt, with zero lines of code changed, that can do x86_64 → ARM64. If you mean the compiler doesn't provide the tools to build a cross-compiler, I agree; but the thing is, the compiler is actually capable of cross-compilation.

And this is why I’m against the RFC.

Actually I am; in particular -L is really important if you're using dune with implicit_transitive_deps set to false plus flambda (ocaml/dune#4039). I would argue that using LIBRARY_PATH is a bad idea in C too, in the same way that CAML_LD_LIBRARY_PATH is a bad thing and is painful when trying to use bytecode binaries.

The problem is that this is a stance from the compiler saying "you should do this", and dune is talking about implementing it as the main way while deprecating ocamlfind. My concern is that the proposal is pointing the OCaml community in the wrong direction.

If your package manager is not able to devise consistent environments as Unix prefixes, then I'd rather suggest something is wrong with it. A lot of things work that way, starting with PATH in your shell, so you'd better cater for it. But in any case you can always extend your environment on the CLI, and the compiler supports response files.
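
For instance, a minimal sketch of both mechanisms (the paths are hypothetical; -args is an existing compiler option for response files, while the compiler reading OCAMLPATH is the RFC's proposal):

    # Extend the environment for a single invocation, and keep the long
    # flag list in a response file read with -args (one argument per line).
    printf '%s\n' -I /opt/cross/android-arm64/lib/zarith \
                  -I /opt/cross/android-arm64/lib/yojson > cross.args
    OCAMLPATH=/opt/cross/android-arm64/lib ocamlopt -args cross.args -c main.ml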

Well, as a matter of fact, ocamlfind is doing your env magic. But in any case I don't see what in the RFC prevents ocamlfind from continuing to do it for you. The RFC doesn't break ocamlfind.

I’m not sure I fully understand what you are saying here.

But I think it should be stressed that the proposal affects more than the compiler. The proposal is about the OCaml system. It also affects the toplevel and Dynlink, and it regularizes library installs, which can be insanely flexible, which is problematic for build-system dependency tracking.

It also tries to cut down on the moving parts and concepts in the system, whose cruft is always embarrassing to explain to newcomers.

It should also be stressed again that this is totally compatible with ocamlfind as it exists.

Personally I think it would be an error to "ossify", as you say, on dune, which I don't see as the final word on build systems.

Note that the RFC itself is not breaking anyone's workflow; it's precisely devised so that the whole ecosystem continues working without needing to change anything. If someone is breaking your workflow here, it seems to be dune, which apparently no longer wants to support findlib.conf.

I’m more than happy to shift it in the right direction. But so far I haven’t found out from what you are saying:

  1. What is wrong exactly.
  2. What the right direction is.

That's not the case. Currently, Dune calls ocamlfind printconf path if ocamlfind is present in the PATH. That's what we want to get rid of, since it means that the presence of ocamlfind in the PATH changes Dune's behaviour.

On the other hand, Dune reading a findlib.conf file installed by esy or another package manager is completely fine.
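
For reference, the kind of findlib.conf a package manager installs is tiny; in it, "path" is the library search path and "destdir" is where new libraries get installed (the directories below are hypothetical):

    # A minimal findlib.conf, written here from the shell; a package
    # manager such as esy or opam generates it for the project or switch.
    printf '%s\n' \
      'destdir="/home/me/project/_esy/default/lib"' \
      'path="/home/me/project/_esy/default/lib"' > findlib.conf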

But if reading the findlib.conf file is a standard that is meant to disappear, it doesn’t seem useful for Dune to preserve it.

@jeremiedimino, sorry then I don’t know exactly what @EduardoRFS meant.

But in any case, what I want to stress is that the RFC itself does not, to the best of my knowledge, break anyone's workflow.

It just formalizes the thing ocamlfind does 99% of the time and, optionally (it's opt-in), allows you to use this directly from the toolchain, which is nice e.g. for ocaml, Dynlink, reporting bugs, simple projects, etc.
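
For a concrete picture, here is a hedged sketch of the difference (the -requires spelling is the flag discussed earlier in this thread; the exact name and semantics are the RFC's to define, and the path is hypothetical):

    # Today, ocamlfind resolves the library and expands the flags:
    ocamlfind ocamlopt -package yojson -linkpkg main.ml -o main

    # Under the RFC, the toolchain could optionally do that lookup itself,
    # driven by OCAMLPATH:
    OCAMLPATH=/usr/local/lib/ocaml ocamlopt -requires yojson main.ml -o main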

esy is consistent and reproducible, not as much as nix but quite close, and it has sandboxed environments, so it generates huge envs. That is okay for the entries needed for PATH: most packages don't install anything in /bin, so PATH stays small. OCAMLPATH, on the other hand, is huge.

Agreed that dune isn't the final word on build systems, but to achieve the advantages mentioned in the RFC, it's a better solution.

The RFC is not about the code, or about what it is doing; the RFC is a statement, and anyone making one should be conscious of this. An RFC approved by the OCaml core is a statement saying that this is how the OCaml team is going to do it, and by side effect how users should do it. It's not something that I, as someone working on tools for other developers, can just ignore.

So yes, I can still keep using ocamlfind, in the same way that you can still keep not using dune for all of your packages (which I deeply like, I just regret that they're not using dune), but that's not the point. We already have a standard and it's a working one, which is ocamlfind.

As @jeremiedimino pointed out, the RFC is essentially saying that findlib.conf is a standard that is meant to disappear, and the reason seems to be the RFC itself. The RFC is not breaking code, but it is changing standards, one standard which is already working and defined by ocamlfind.

Ok, I will try to be clear: I think the current situation is good enough, the standard is de facto ocamlfind, and moving to the RFC doesn't add considerable advantages given fast modern IO.

I will use some JS analogies, because TC39 is a well established institution that sometimes makes things harder just by defining things: not a single line of code needed, yet huge movements and changes in workflows happen because of that. They don't need breaking changes to make a breaking change.

What is wrong

Changing from ocamlfind to the standard defined by the RFC, in the same way that changing from CommonJS to ES modules was a mistake in the JS ecosystem. The current solution is not perfect, but it's already there; no work needed.

What is the right direction for me

A specification that is mostly compatible with ocamlfind, removing the need for the binary but still reading findlib.conf, and saying how a build system / package manager should interact. Not a single line of code needed.

Then we could have a list of specifications endorsed by someone who is an authority (maybe the OCaml team?) and a list of tools that implement them, verified and listed by popularity. That means that right now dune is the standard that "everyone" is using for their build system, and the same is true for opam and packages.

Maybe I should have said why it's wrong. Nothing here tells me what is wrong from a technical perspective.

The current status quo works and it will continue to work with the RFC. It is, however, very complex for what it does. Let's not forget dune actually reimplemented ocamlfind, whose behaviour is not specified in any way except by its implementation. I won't even talk about the build-system trickery needed to make Dynlink work with libraries.

The RFC is meant to be the simplest step in order to gradually start simplifying the system (something I envision taking at least 10 years), both for ecosystem tool devs and end users (cutting down on the metadata, name, and terminology orgy).

It's a very small step that could also unlock further things along the way (e.g. namespaces, if they ever happen), and I don't think it jeopardizes the future of OCaml cross-compilation in any way (you may, however, end up doing it differently in 10 years than you do now).

First of all, I'm sorry because I'm going to talk about a subject that is, as I said, orthogonal to the original one. Secondly, I'm not going to take the remarks point by point, since this is essentially a big picture before being a technical answer. However, it is not fair to say that the cross-compilation issue (or more generally how we build MirageOS for another target) is reduced to only what you have done. Nor do I want to say that we have not taken your work into account.

We have been trying to experiment with cross-compilation solutions for two years (even longer if you take into account @TheLortex's work on ESP-32). For each experiment, we need to rework our "inbound" (especially with Solo5) and our outbound with our different unikernels (like mirage-www, unipi, dns-primary-git or pasteur). Between these bounds there is a huge gap which, unfortunately, is not limited to our compiler. In this gap we have OPAM, ~150 libraries, some libraries we don't own, and many applications that are not unikernels (but use our libraries).

The link between each of these is very tedious. It clearly doesn't appear in announcements or releases or even a roadmap (which is not good, I agree), but it requires first and foremost social work in relation to an ecosystem that we have been participating in for ... a long time. So it is a real painstaking effort that allowed you to cross-compile a unikernel with OCaml and esy. In my mind, two pieces of work crystallize all our problems between these two bounds:

  • the appearance of mirage-crypto (in place of nocrypto)
  • the disappearance of mirage-os-shims

What I mean in the end is that it is "easy" to cross-compile and, as I said, we have already done it for a long time. However, we have obligations that go far beyond just being able to run a little unikernel on a Raspberry Pi - and we owe it to ourselves to take them into consideration.

All of this requires synchronization with many people because "our world is not their world". That's why we are always grateful for the work of other people like you (on mirage-crypto) - but also of many others who are too discreet for my taste. So we have to be the bridge between what you want, what other people want and our goals. Going back to the RFC, as I said, the problem still seems orthogonal and it doesn't introduce any inconsistency with our goals.

To end on a small note about OPAM: MirageOS is not independent of OPAM in the first place - more accurately, MirageOS and OPAM were born "at the same time" with a coherent design. Although there seems to be compatibility between MirageOS and esy, it exists mainly because esy understands OPAM files. But there is a link between OPAM and MirageOS that is likely to be strengthened in relation to our other goals. MirageOS 4 will continue to maintain this link with OPAM, and more specifically with opam-monorepo.

The part that Dune reimplements is actually very well documented, just like all of Gerd’s work really: META

I didn’t have to look once at the implementation of ocamlfind to reproduce this part.
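
For readers who have never opened one, a minimal META file is short; this sketch for a hypothetical library "foo" depending on "bar" uses the documented field names (the archive names are made up for illustration):

    # Write a minimal META file from the shell; the fields follow the
    # documented findlib format.
    printf '%s\n' \
      'version = "1.0"' \
      'description = "example library"' \
      'requires = "bar"' \
      'archive(byte) = "foo.cma"' \
      'archive(native) = "foo.cmxa"' > META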