Ocaml-community github organization


#1

In the post named [lib] forking toml? there was a bit of discussion of whether creating a github organization to host forks of abandoned but useful OCaml packages (tentative name, “ocaml-community”) would be of interest.

I’m posting this to draw attention to that discussion (which many people may not have followed since it started as something less interesting) and to solicit feedback on the idea, especially from the opam maintainers.

[[Edited to add: We now have a github organization! see: https://github.com/ocaml-community/manifesto ]]


#2

In the same vein, we should have an organization for opam package maintainers.
So that people can have a special badge when contributing to the opam package collection (yes, it can motivate some people).


#3

Github doesn’t work that way, and this also doesn’t seem particularly useful to me.


#4

In general, I like this idea.

I’m not sure how beneficial it would be in practice, but in theory having a curated collection of important packages with standardised conventions and shared maintainers might attract contributors and ensure consistency and quality.

Some questions to help the discussion:

  • Who would be part of this community?
  • What libraries should be included? Only the abandoned libraries or other popular community packages?
  • Do we want to outline some requirements for the packages (like documentation, tests, etc)?

#5
  • What libraries should be included? Only the abandoned libraries or other popular community packages?

I think initially, only abandoned packages that some group of participants is willing to maintain. (If no one is willing to do things like processing pull requests and keeping the thing working on new versions of OCaml there’s no point.)

  • Who would be part of this community?

Anyone who seems reasonably trustworthy who volunteers. :slight_smile:

  • Do we want to outline some requirements for the packages (like documentation, tests, etc)?

I think one would have pretty minimal requirements (beyond “abandoned, people want it to keep working, there are some people willing to keep it working”) for initial inclusion, though it would always be nice if tests, documentation, etc., got written. Although I think everyone should put tests and documentation into stuff they release, one can’t be picky when picking up an abandoned package that still has users.


#6

I would formulate this in terms of “collectively taking ownership / taking responsibility” rather than “hosting forks of abandoned packages”, but that sounds like a reasonable idea to me. There are a few unmaintained but useful packages out there that people will still want to be able to build and run five years from now (for example, some research code is that way), and a designated place to maintain them may be nicer than one-off efforts or piling patches in the opam repository.

Some precedents, as already mentioned in the previous thread, are elm-community and coq-community.


#7

Amendment to my formulation gladly accepted.


#8

Here it is: https://github.com/ocaml-community/manifesto

Please contribute pull requests to the manifesto, it probably needs work.


#9

My point was not about inclusion criteria. In case the community adopts an abandoned project in a suboptimal shape, it should be polished to conform to those basic standards.

I think it’s important because people might have a certain expectation that the projects hosted under github.com/ocaml-community will be well maintained, properly documented and tested.


#10

Ideally, yes.

BTW, I’d welcome improved language for the existing README, which is basically just a placeholder. I’m also happy to add you (and other active community members) to the github organization.


#11

I like the idea, but just want to point out that ocaml/ kinda filled that role, notably for packages like ocaml-re.


#12

I find this idea interesting as it is exactly the same situation as what happens in most companies. People change projects, teams or leave and eventually the code is never maintained by the person who originally wrote it. Most projects don’t even have a dedicated maintainer.

The usual way to make this sustainable is to have a big homogeneous monorepo. By seeing 1000 projects as a single big unified project, it becomes humanly possible to keep the whole codebase healthy, even if you wrote none of it.

My advice for the ocaml-community initiative is the following: convert every project that enters the ocaml-community umbrella to dune and maintain everything in a single duniverse repository. This will make ocaml-community sustainable over the long term.


#13

I don’t think a single monolithic repo is a good idea, mainly because that means a headache when pulling the repo the first time, and it might prevent people from participating (imagine needing to pull a 10G repo when you want to contribute to a small project). Also, it would interleave the history of each project, which means that it would be much harder to read the commit history for a single project, at least on the github interface (maybe there’s a way using git, but I’m not well versed enough to know it, ^^).


#14

That’s the problem that duniverse solves (/cc @avsm). It uses git subtrees so that you can keep individual repositories but still work on all of them at once.


#15

I’m not sure what duniverse is, but I see other reasons not to have a monolithic repo. Monolithic repos are intimidating: there’s a lot to learn about interleaving parts, just like every company’s codebase. And there’s one big difference between a company and open source: in a company, you’re paid to learn to use the codebase. That incentive doesn’t exist in open source, and it’s really important IMO to keep things as small and simple as possible so that people can feel they can jump in (assuming the project really is relatively small and simple).

At the same time, companies have other incentive structures that don’t match up with open source. It’s really important for the employee to be visible to everyone else – working in a small repo ‘behind the scenes’ doesn’t demonstrate that you’ve been actively contributing to the company, and this is far less of an incentive for open source projects, where it’s more about contributing because you want something particular to improve.

To me, it seems that having the one organization housing different projects is sufficient. It would of course be great to refactor the codebases to use common libraries, and to convert all the projects to use dune, but the cost of having an intimidating massive codebase seems not worth it.


#17

There is a bit of confusion here, so let me explain: the idea would still be to have individual repositories for the various projects, but in addition to have a duniverse repository that aggregates the various repositories into one, to simplify maintenance.

Here is a concrete example of why this is desirable: let’s assume the organization has 100 projects and 5 maintainers. Then a change such as -safe-string comes along that requires adjustment in many projects. Each of the 5 maintainers then has to go through the 20 projects they maintain, check whether the project is affected and fix it if it is. Additionally, there might be dependencies between the projects of the various maintainers, so upgrading all these projects requires coordination. You can easily see how such a process could be time and energy consuming.

On the other hand, if you have a duniverse you can do the following: clone the duniverse, run one build command, fix all the build errors at once and then push back with one command to all the original repositories. The whole process can take less than a day and be done by a single person.
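
For readers who have not hit the -safe-string change mentioned above, here is a minimal sketch of the kind of adjustment it tends to require. The function name `upcase_first` is hypothetical and not taken from any project discussed in this thread; it only illustrates the common pattern of going through `Bytes` once `string` is immutable.

```ocaml
(* Before -safe-string, strings could be mutated in place, so code
   along these lines used to compile:

     let upcase_first s = s.[0] <- Char.uppercase s.[0]; s

   With -safe-string the string type is immutable and the in-place
   update is rejected by the type checker. *)

(* A possible fix: copy into a mutable Bytes buffer, edit the copy,
   then convert back to an immutable string. *)
let upcase_first (s : string) : string =
  if s = "" then s
  else begin
    let b = Bytes.of_string s in
    Bytes.set b 0 (Char.uppercase_ascii (Bytes.get b 0));
    Bytes.to_string b
  end
```

Each individual fix is usually this small; the cost described above comes from having to find and apply such fixes across many repositories.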


#18

That makes sense. It’s an interesting compromise between monolithic repositories and multi-repo. Does JS use something like that internally?


#19

Nope, we mainly have a single huge monorepo. For publicly releasing our code we do the opposite: we split out parts of this big monorepo into individual projects that we then push to github. The idea of duniverses is more recent.

Another reason companies often like monorepos is that they completely avoid versioning problems: a commit hash gives you a consistent state of the universe and that’s all the versioning you need. This part doesn’t work so well in an open-source context though.


#20

I’m afraid I’m not convinced of the advantage of the monorepo approach for this purpose.

Wouldn’t you just check the CI system? And fixing is no simpler in a monorepo either. I don’t think the repository structure really changes anything here.


#21

You can check the CI system, but the latency is very unpleasant. You basically end up doing the following:

  • edit the code
  • push
  • wait N minutes for the CI to kick in and report
  • scan the log to find the next error
  • go back to first step

This takes a lot of time. With the duniverse you get all the errors for all the projects in real time just as if you were working on a single project.

BTW, I’m talking from experience here :slight_smile: I’ve been working in both open source and industrial settings for a long time and I’ve tried all the various methods (relying on the CI, using many opam pins, …). When you want to do bulk changes, the monorepo approach is by far the most productive.