Are there any tools like go vet for finding suspicious code in an OCaml codebase, including third-party dependencies? For example to identify calls to unsafe Stdlib functions, uses of deprecated APIs, calls to Unix functions with different behaviour on Windows, C stubs, common API misuses, etc. I don’t mind if I need to write my own checks.
I’ve found some lists of existing projects, but so far all I’ve seen are long abandoned, deprecated or only deal with coding style.
That is called a linter, and no, there is no good linter for OCaml.
You have compiler warnings, but that’s it. A linter is slightly less needed than in other languages because of the type safety and generally good design (for instance, in Python a common use case for a linter is catching an empty list used as a default argument value; you do not have issues that big in OCaml).
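To illustrate how far compiler warnings can go on their own: OCaml 4.08 and later let a library author attach an alert to a value, and the compiler then reports every use of it. The sketch below is only an illustration. The `Legacy` module, `parse_int`, and the alert message are hypothetical names, and the built-in `[@@ocaml.deprecated "..."]` attribute works along the same lines for deprecated APIs.

```ocaml
(* Hypothetical wrapper module; all names here are made up for illustration. *)
module Legacy : sig
  val parse_int : string -> int
  [@@ocaml.alert unsafe "Raises Failure on malformed input; prefer int_of_string_opt."]
end = struct
  let parse_int = int_of_string
end

(* The compiler reports the "unsafe" alert at this call site.
   With default settings the program still compiles and runs. *)
let n = Legacy.parse_int "42"

let () = Printf.printf "%d\n" n
```

This only helps when the author of the code has added the annotation, so it does not cover reviewing unannotated third-party dependencies.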
I didn’t say linter outright because different language ecosystems have very different expectations from that word, often leaning towards real-time code style warnings rather than project-wide code analysis. But ideally I’m looking for a tool that provides a report of all hazardous code locations for review, even from my dependencies.
And yeah, OCaml the language does have fewer pitfalls than other languages, but I don’t think the same is true of OCaml the ecosystem. I don’t think I’ve got the resources to write my own, but I’ll mull it over if I don’t find an alternative.
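For a sense of what a home-grown check of this kind might involve, here is a rough sketch built on the compiler’s own parser. It assumes OCaml 4.14 or later with the `compiler-libs.common` library linked. The `flagged` list and `scan_source` are hypothetical names, not part of any existing tool. The match is purely syntactic, so identifiers reached through `open` or a module alias are missed, and `Parse.implementation` raises on source that does not parse.

```ocaml
(* Build (assumption): ocamlfind ocamlopt -package compiler-libs.common -linkpkg scan.ml *)

(* Hypothetical deny-list of identifiers to flag for manual review. *)
let flagged = [ "Obj.magic"; "Array.unsafe_get"; "Bytes.unsafe_to_string" ]

(* Parse [source] as an implementation file and return the (line, name)
   pairs of every flagged identifier, in source order. *)
let scan_source source =
  let hits = ref [] in
  let expr self (e : Parsetree.expression) =
    (match e.Parsetree.pexp_desc with
     | Parsetree.Pexp_ident { Location.txt; loc } ->
       let name = String.concat "." (Longident.flatten txt) in
       if List.mem name flagged then
         hits := (loc.Location.loc_start.Lexing.pos_lnum, name) :: !hits
     | _ -> ());
    (* Keep walking into sub-expressions with the default traversal. *)
    Ast_iterator.default_iterator.Ast_iterator.expr self e
  in
  let iterator =
    { Ast_iterator.default_iterator with Ast_iterator.expr = expr }
  in
  let ast = Parse.implementation (Lexing.from_string source) in
  iterator.Ast_iterator.structure iterator ast;
  List.rev !hits

(* Example run on an in-memory snippet; prints "line 2: Obj.magic". *)
let () =
  scan_source "let x = 1\nlet y : int = Obj.magic x\n"
  |> List.iter (fun (line, name) -> Printf.printf "line %d: %s\n" line name)
```

Running something like this over dependencies would still need the plumbing discussed in this thread: locating each package’s sources, handling preprocessed files, and resolving names properly, which likely means working on the typed tree instead.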
Thanks all for making me aware of semgrep. OCaml support is indeed broken, going by the repo issues (e.g. open modules aren’t considered for path resolution). Still, I generally prefer investing in multi-language tools[1], so I’ll check it out.
Other ecosystems have successfully built tools for scanning full projects or packages using semgrep’s engine, so maybe improving language support and some plumbing would be enough to integrate it with opam and dune.
(Rant) I don’t think tree-sitter parsers are the right foundation to support multiple languages in these tools though, just the least bad available right now. We get the downsides of both table-driven parsers (worse, with unsound parse trees and “lexer hacks” handwritten in C) and shipping precompiled libraries, while also leaving name/type resolution as a language-specific exercise to the consumer.
But I think users just want a uniform query API like tree-sitter’s, so we should agree on such an API and ship actual language services as loadable libraries instead.
Feel free to open issues on the semgrep GitHub repo! The more complaints we get about OCaml support, the more time we will spend addressing them! Also, semgrep is written in OCaml and is OSS, so feel free to contribute.
Thanks, I had seen Zanuda and it’s probably the closest to my requirements that already works, but I couldn’t find any documentation on adding custom rules or whether it can be done at all without forking it. It also seems to only work on dune projects, which is fine for linting my own code but not so much for reviewing my dependencies.
The mentions of it being an experiment and inherently tied to 4.14’s data structures also concerned me a little, but the feeling went away when I realised the overall state of these tools for OCaml (and I’m still using 4.14 anyway).
In the context of Ahrefs, we were looking for linters. Zanuda was mentioned, but we ended up using something else because we need to be able to define rules that are specific to us. We have our style, our naming conventions, things related to the libs we are using, … So depending on how you want Zanuda to be used, that could be a useful addition. Some of Zanuda’s checks could already be useful, but we try to stick to one linter at a time. Btw, I’m not saying that we would switch immediately (or at all) if the feature appeared, so please do not feel forced to work on that.