Central OPAM documentation site

This sounds good, thanks for putting the time in! Please do update here with a prototype when you get something running, even on a small scale set of packages. We could, for example, use the small package list from github.com/mirage/docs to try this out before throwing a lot of compute resources at a more comprehensive index.

I’m wondering, why do we need these versions? Tools like hardlink will automatically unify files, so that when we generate N universes for N packages, all duplicated files will still be hardlinked, i.e. the same version of core, if used by M packages, will still occur only once. Moreover, git itself enables this kind of unification at an even finer granularity and will deduplicate information that recurs inside files which are different overall. Therefore, trying to implement this kind of unification at a semantic level would duplicate work that is already done.
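For illustration, a minimal sketch of this low-tech deduplication, assuming the N generated doc trees sit side by side under one directory (the path here is hypothetical):

```sh
# One generated documentation tree per package universe, all under
# _doc/universes/ (hypothetical layout). hardlink(1) from util-linux replaces
# identical files across the trees with hard links, so a file shared by M
# universes is stored on disk only once.
hardlink --verbose _doc/universes/

# Committing the same trees to git deduplicates at a finer granularity still:
# identical blobs are stored once, and similar files delta-compress in packs.
```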

The deduplication is already done by esy, and it’s done at the most efficient level, i.e. compilation doesn’t happen unless it’s needed.

What I want to do is just take esy's cache, parse it, and map it to produce that same deduplication at the doc level. This also gives us the benefit of being able to index multiple versions of a package where needed and separate them out logically: if B depends on A v1 and C depends on A v2, odig could present both versions of A in its global index, clearly laid out.

That’s neat, great idea :slight_smile: It’s a little harder to implement (it requires some work, compared to hardlink’s poor man’s solution), but it sounds more interesting. It is also nice to have global documentation for both universes, Reason and OCaml. Besides, are you aware of any endeavours to build all opam projects with esy?

opam projects that don’t randomly access the filesystem should be buildable with esy. There is a manually maintained repository for packages that don’t build with esy out of the box.

Just chiming in, but I think there may be simpler possibilities directly for opam:

  • we have some ways already to install all packages (or just their latest versions) in as few iterations as possible (Marracheck, or the older greedy prototype here)

  • the cleanest way to extract cmt(i)s would probably be through a post-install hook: you can specify any command to be run after every package installation, and it has access to the package name and version, the list of files it installed, and its build dir (see here for an example).

  • of course, you also have the simpler option of specifying --keep-build-dir, running all the installs, then scanning everything in SWITCH/.opam-switch/build/PKG.VERSION for artifacts (see the sketch after this list)

  • then I believe we could run odig/odoc on this pile of artifacts to generate as-complete-as-possible documentation?
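As a rough sketch of the --keep-build-dir variant (the package list is a placeholder, and exact commands and paths may differ between opam versions):

```sh
# Install with build directories preserved; the real package list would come
# from Marracheck or a plain "latest version of everything" selection.
opam install --yes --keep-build-dir lwt cmdliner   # placeholder packages

# Kept build dirs live under the switch prefix; harvest the compiled
# interfaces (.cmti/.cmt) that odoc consumes.
build_root="$(opam var prefix)/.opam-switch/build"
find "$build_root" -type f \( -name '*.cmti' -o -name '*.cmt' \)
```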

@AltGr I don’t think you need to play with --keep-build-dir or post-install hooks. Just let packages install; most of them do nowadays install their cmti files. Package installs are what odig naturally consumes, see my message here. Point 3 is Marracheck.

I think it would be good to have that version first, before we try to change all the tools to support versioning and make them consume unconventional install structures.

Assuming Marracheck works, this should simply be a matter of running programs at that point.
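Concretely, that last step could be as small as this (a sketch, assuming the switch has already been populated by Marracheck or by plain opam installs):

```sh
# Packages install their .cmti files into the switch, and odig consumes the
# switch's install structure directly; no special harvesting is needed.
odig odoc   # generate HTML documentation for everything installed
odig doc    # browse the generated documentation set
```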

Indeed; what the post-install hooks would give you is attribution of the files to the opam packages and versions, and aggregation of the libdirs across different opam install commands. But odig can already process everything without the need for that :slight_smile:

If you run esy env you’ll get your package’s environment, and OCAMLPATH can be used to get the transitive dependency paths from the global build cache in a way that represents your current project root, with sharing among all other projects you’ve built on the system. That might be enough to prototype something to see if it’s even the best direction to begin with.
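For example (the exact output format of esy env may vary between esy versions):

```sh
# Inspect the environment esy computes for the current project root.
esy env | grep OCAMLPATH

# Or run a command inside that environment and list the entries; each one
# points into esy's shared global build store.
esy bash -c 'echo "$OCAMLPATH" | tr : "\n"'
```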

I’m not really sure what it would mean to build a centralization of all docs outside of a particular project root though. Curious to hear your thoughts on that.

@dbuenzli, @gasche: where is the code to get your best-effort listing, as seen above? I agree that, since it’s already running, we should be able to get something up in minimal time using that approach.

My thought was to use esy's dedup by (a) listing all OPAM packages and (b) creating a dummy project per package, listing only that one package as a dependency. esy would then create its DAG of dependencies under .esy, with the links in JSON files and the packages in their appropriate locations. This information can then be hoovered up by odig directly, so long as it knows how to support versions and to read esy's files.
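A rough sketch of that loop (the minimal manifest shape and the opam list flags are assumptions to be checked):

```sh
# One dummy esy project per opam package; esy resolves opam packages under the
# @opam/ namespace and shares builds through its global store.
mkdir -p universes
for pkg in $(opam list --all --short); do
  dir="universes/$pkg"
  mkdir -p "$dir"
  cat > "$dir/package.json" <<EOF
{ "name": "universe-$pkg",
  "esy": {},
  "dependencies": { "@opam/$pkg": "*" } }
EOF
  (cd "$dir" && esy install && esy build)
done
```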

The one downside of this is that it uses esy’s metadata directly. It would be nicer if there were a way to query esy for all of this information: give me the location and version of a package, followed by its dependencies’ locations and versions. We then wouldn’t be dependent on internals that could change later.

Here: basically, this is setup.sh and gen.sh.

You can ignore publish.ml; it’s simply doing symlink tricks to be able to publish the different themes listed here over a single data set on gh (see this branch if you are interested). For a simple single-theme publication on gh you should be able to get by with the gh-pages-amend tool distributed with odig; see here for a bit of documentation.

Now the task would be to do something smarter using marracheck in setup.sh along the lines mentioned in my previous message.

I think this would be a good way to make sure you can build consistent universes of documentation for each opam package root, and do it efficiently. Note that the disk size for the entire opam universe may be quite large, and I’m not entirely sure how good the cache hit rate would be. Esy implements reliable caching by ensuring that the transitive dependencies a library is built against participate in the cache key. So even though two root packages share a dependency, there is no guarantee you’ll be able to reuse the cached build of that dependency. Still, I find that in practice it greatly reduces new root package build times; usually there will be a few different builds of one package that cause cache hits for a much larger set of packages that depend on it.
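To illustrate the cache-key point (purely illustrative; esy’s real key computation is internal to esy):

```sh
# A build's key covers the package itself plus the keys of everything it was
# built against, so changing any transitive dependency changes the key.
key_of () {
  printf '%s\n' "$@" | sha256sum | cut -d' ' -f1
}

a_old=$(key_of "A.1.0")
a_new=$(key_of "A.1.1")
key_of "B.2.0" "$a_old"   # B built against A.1.0
key_of "B.2.0" "$a_new"   # same B source, different A: different key, so a rebuild
```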
