Curating contribution ideas and organizing hacking sessions

There was an ongoing discussion on the use of the ocamllabs compiler hacking sessions wiki in two related threads:

Today I went ahead and reorganized the wiki. I described this work and the result on a GitHub issue (#12); please participate there if you have opinions about it. My personal conclusion is that the wiki currently hosts two kinds of content that could be separated: one about "hackathon/contribution coordination" and one about "stuff on compiler-related implementation projects".

I am creating this topic to start a discussion on how we, as a community, could make it easy for beginners to find things to work on. Can we convince project organizers to collect entry-level tasks and document them, so that hackathon organizers can point attendees to them? Do we have a curated list of projects that could serve as inspiration for a Summer of Code or a similar event?

On the day of the MIT event, @zozozo started thinking about how to present “entry-level tasks” for his mSAT project. @Drup thought of suggesting the (currently down) crowbar blog post for attendees. These are great suggestions, but they should have been made well in advance and recorded somewhere!

If you are a frequent contributor to an open-source / free-software OCaml project, and you do have documentation for entry-level contribution and a future-proof way to search for such tasks on your project, please mention it here! (We should integrate this stuff in the wiki.)

If you are a frequent contributor to a project that does not have such contribution documentation or does not curate a list of entry-level tasks, please start doing this now, and then report here.


A relevant self-quote from the compiler-hacking wiki organization issue (#12):

The things to work on page points to project ideas across the OCaml ecosystem; the previous content of the page (ambitious language changes or compiler hacking ideas) moved to Compiler or Language projects to work on.

This page should remain short if possible. In particular, we should push projects to move the content currently on this page into their own space, in a document or a wiki page of their own. Ideally, all OCaml projects interested in participating in these events would tag/label issues on their issue tracker, and we would just point to that (and to their landing page for contributors). Currently this is not the case (for Lwt and mSAT we point to a single issue, which follows the somewhat odd practice of having a meta-issue about contributions), but I think we should pressure projects into adopting this more standardized structure (I'll start complaining very soon), and push back against attempts to add more project-specific cruft to this page. Otherwise it will soon become an unmaintainable mess.

Lwt has:

  • An easy issues label, already taken advantage of by several new contributors, some of whom have gone on to harder issues, and others to other places in OCaml. We hope to keep adding to this, and one way to contribute is to open issues. Some will be easy :)
  • A project-list wiki page, including cool, open-ended projects such as a uniform I/O API for Unix, Windows, and Node.js, to run on both Reason and OCaml. We should probably expand the projects with details and discussion. Those interested are welcome to come discuss the projects with us, plan them alone or with us, edit the wiki, etc.
  • An introduction to contributing in the README.
  • The beginnings of a code-contribution overview in CONTRIBUTING.md.
  • A bunch of ways to contact us.

We’re pretty committed to making Lwt friendly to contributors, including Lwt beginners and OCaml beginners. If you want to test out new ways to help and work with contributors, we’d love to help test them in Lwt.

EDIT: I feel I should link the Lwt community friendliness "manifesto" here :)

Great! The wiki currently points to the Lwt refactoring issue (that was a suggestion from IRC, I think); I'll replace that with your link.

If I may make a comment on the form: it's ironic of me to say this given how verbose I am myself, but I found this section a bit too long. I had to read through three paragraphs of feel-good but generally obvious stuff (sure, I can ask questions) before I got to the "Lwt maintains a list of easy issues" part that I, the hackathon participant, was looking for.

I would suggest putting this mention much earlier in the section. In fact, there are many things in this section that you might consider converting into such issues (it's just an idea, I'm not sure I like it that much): "writing test cases" could be a meta-issue containing information on how to do this (currently the link points to the source tree, which is a bit meh). You could also consider having a meta-issue on "improving the documentation".


If I may make a comment on the form

Thanks, I'll change it. And I agree. It was one of those things I rewrote so many times that my brain got tired and couldn't come up with good ideas for it anymore, so I just posted it for some benefit, to be revisited later. So the fresh perspective is quite welcome :)

(To be clear, I like the feel-good stuff myself, and I agree it is important to say loud and clear that documentation or mere questions are important and valuable contributions. Do keep something about this, just keep it short :-)


I got interested in compiler hacking around two years ago. The first impression I got after reading a few PRs was that the people who were able to submit PRs had acquired their domain-specific knowledge out of nowhere. I had this feeling most strongly with PRs on the type-checker. While Oleg's post and the recent addition of HACKING.adoc files have been helpful, the most critical parts of the type-checker are still obscure. More importantly, it is quite hard to write good documentation for the type-checker. Even after reading some resources on the Internet, I still feel pretty lost when exploring the type-checker (the -dparsetree, -dtypedtree and -dlambda flags, along with grep, help somewhat).

And here are some of my more specific comments:

  1. typing/ is the most important module in the type-checker, and I remember some maintainers saying that it has never been cleaned up during the past 20-or-so years of development. How about we write some documentation for it? I really appreciate the documentation in parsing/parsetree.mli, and I think we should do the same for at least typing/ (but I am neither able nor qualified to do that; it would be better to ask, say, Jacques Garrigue to write some comments for it).

  2. GADTs. It is hard to find resources for understanding the implementation of GADT type-checking. The best one I found is Yann Régis-Gianas's MPRI 2-4 lecture notes, but they are not so easy to digest. I have also realized that it might be impossible for anyone to write good documentation for it. This point also applies to other parts of the type-checker, for example classes and objects.

  3. Recently I was working on an extension of the module language. I spent some time on typing/ and found Xavier Leroy's "A modular module system" really helpful. So we should point to every relevant paper that may help people working on the compiler in each HACKING.adoc. For example, put a reference to the ZINC abstract machine in bytecomp/'s HACKING.adoc. However, one problem could be that people get scared away because we put references to POPL papers there.

  4. The MPRI 2-4 final projects are really good and helpful, but I couldn't find an archive of them. The course notes of Advanced Functional Programming (L28) are also great.

I think I could say more about my compiler-hacking experience as a non-expert, but hopefully my random, incoherent rants above provide some suggestions for making compiler hacking friendlier to non-experts. Also, this Guts of Camel is really beautiful to me. Let us make it better known.

I also have no idea what most of the type-checker codebase is doing, and I suspect that Jacques Garrigue, Alain Frisch, Jeremy Yallop and Leo White have super-human secret skills, although the miracle/epidemic may be spreading, as Frédéric Bour and Gabriel Radanne are already showing symptoms.

I would not encourage people to try hacking on the type-checker as their first contribution project. I don't think any issue tagged junior_job requires understanding its internals, and that is part of my reason for pointing out that the old "Things to work on" page in the compiler-internals wiki contained a lot of actually quite difficult projects. (I guess this was in the context of Jeremy and Leo being physically present to help.)

The disease is propagated through pair programming. The key moment for the transmission is when the carrier of the disease, also called senior programmer, tells the junior programmer "You are using the wrong function here, you should be using Ctype.expand_head_opt, not Ctype.expand_head". The first symptom is usually a shocked expression and the utterance of the word “wat”. It then evolves rapidly into a lot of ranting about the state of the codebase.

Slightly more seriously: @objmagic, please add all that to the HACKING.adoc files; I added them precisely for this purpose. Although I try to be inclusive, if someone is scared off by the mention of POPL papers, maybe that person is not completely ready to explore the guts of the OCaml compiler (and XL's papers are very well written). I also agree with @gasche: the type-checker is really a bad entry point for compiler hacking (although an attractive one).