[BLOG] OCaml linting tools and techniques

Recently, I started wondering about linting tools for OCaml, so I went looking. This turned into quite an extensive survey, so I decided to publish my findings in a blog post: OCaml linting tools and techniques.

In particular, I focused on linting with dune and Ppxlib because there are many variations of that approach out there. In the post, I describe the technical choices that go into such linters and give an overview of which ones work and how well. While experimenting, I tried them out myself and published the results as demos on GitHub: sim642/dune-lint-demo.

Feel free to let me know if I missed any tools or if you have any questions or comments. There isn’t much information about this out there (and each existing tool does it slightly differently), so I hope this overview benefits others as well.


You forgot semgrep …

As I say in the opening paragraph of the blog post, the unreliability of Semgrep is what sent me on this quest.

Specifically, for that use case its regex matching failed on a trivial pattern: `metavariable-pattern` with `pattern-regex` doesn't match `_` · Issue #10193 · semgrep/semgrep · GitHub. I only discovered the working alternative linked in the issue after writing the blog post and opening the issue. The fact that a superficial change to the rule syntax changes how regex matching works doesn’t make me confident that it is free of false negatives.

By coincidence, I realized that I had tried the exact same thing with Semgrep three years ago and ran into it simply ignoring functors altogether: OCaml pattern not found in functor · Issue #3821 · semgrep/semgrep · GitHub. (There is also another Semgrep issue I opened: OCaml non-regex pattern not found if unrelated module type include present · Issue #3822 · semgrep/semgrep · GitHub.) If I want to forbid a certain code pattern in a codebase, then such false negatives are unacceptable to me. If I cannot be sure that the tool works, then I still have to check for such things manually in every PR I review.

All the tools I look at in the blog post are OCaml-specific (rather than having “experimental” OCaml support) and, as far as I can tell, use the reference OCaml parser and AST. This makes me significantly more confident that they parse and traverse all of the code correctly.
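To illustrate why complete traversal matters, here is a toy sketch. It is not the API of any real tool: the miniature AST types, the `find_violations` function and the choice of `Obj.magic` as the forbidden identifier are all hypothetical. A real Ppxlib-based linter would instead walk the compiler's Parsetree, for example with Ppxlib's `Ast_traverse` classes. The key point is the `Functor` case: a checker that skipped functor bodies would silently report nothing for the example below, which is exactly the kind of false negative that makes a lint rule untrustworthy.

```ocaml
(* Hypothetical miniature AST, only for illustration. *)
type expr =
  | Ident of string              (* possibly qualified identifier, e.g. "Obj.magic" *)
  | Apply of expr * expr list    (* function application *)
  | Lambda of string * expr      (* fun x -> e *)

type module_expr =
  | Structure of structure_item list
  | Functor of string * module_expr   (* functor (X : S) -> body; signature omitted *)
  | Module_ident of string            (* reference to another module *)

and structure_item =
  | Let of string * expr
  | Module of string * module_expr

(* Collect the path of every binding that mentions the forbidden identifier. *)
let rec check_expr forbidden path acc = function
  | Ident name -> if String.equal name forbidden then path :: acc else acc
  | Apply (f, args) ->
      let acc = check_expr forbidden path acc f in
      List.fold_left (check_expr forbidden path) acc args
  | Lambda (_, body) -> check_expr forbidden path acc body

let rec check_module forbidden path acc = function
  | Structure items -> List.fold_left (check_item forbidden path) acc items
  | Functor (_, body) ->
      (* The crucial case: skipping functor bodies causes false negatives. *)
      check_module forbidden path acc body
  | Module_ident _ -> acc (* nothing to inspect in a plain module reference *)

and check_item forbidden path acc = function
  | Let (name, e) -> check_expr forbidden (path ^ name) acc e
  | Module (name, me) -> check_module forbidden (path ^ name ^ ".") acc me

(* Entry point: returns violation paths in source order. *)
let find_violations forbidden (top : structure_item list) =
  List.rev (List.fold_left (check_item forbidden "") [] top)

(* A tiny program: the forbidden call only occurs inside a functor body. *)
let example =
  [ Let ("ok", Apply (Ident "print_endline", [ Ident "msg" ]));
    Module ("Make",
      Functor ("X",
        Structure
          [ Let ("cast", Lambda ("v", Apply (Ident "Obj.magic", [ Ident "v" ]))) ])) ]

let () =
  List.iter
    (fun p -> Printf.printf "forbidden identifier in %s\n" p)
    (find_violations "Obj.magic" example)
```

Running it prints `forbidden identifier in Make.cast`. If the `Functor` case returned `acc` instead of recursing, the output would be empty, which mirrors the symptom described in issue #3821.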