[ANN] perf demangling of OCaml symbols (& a short introduction to perf)

You could contribute by testing or reviewing the code and responding on the mailing list with a Tested-by or Reviewed-by tag. I’m not sure how much weight is given to first-time reviewers, but it certainly wouldn’t harm. See “Submitting patches: the essential guide to getting your code into the kernel” in the Linux kernel documentation.

The existing Rust demangler makes a similar assumption, so I opted to write the OCaml demangler in the same style. I’m not sure about the strncmp case, but sym[i] == '_' && sym[i + 1] == '_' is certainly common in C codebases I’ve worked on.

This is a good point. I was under the impression that such cases were mostly unambiguous, but ambiguous ones seem to be more common than I thought. I’ll post a v2 of the patch on the mailing list that preserves the trailing identifier id.

I’m not sure about the (2245) pattern, rather than just leaving _2245 in place. It makes it harder to correlate with cmm or disassembly output, e.g. when using grep. It also breaks the property that the demangled symbol is strictly smaller than the mangled symbol (or makes it harder to verify this property).

Replacing __ by . is indeed a bit of a heuristic, but I think it results in more readable names in the end. Even in your example, the include Functor.F (M) line causes bar from functor.ml to be visible in the scope of the module Foo (where it is later shadowed). Foo.bar is not a terrible name, especially in the presence of (cross-module) inlining. An interesting question is whether the compiler could generate more descriptive symbol names in such situations, e.g. by appending [inlined from …], but that is unrelated to these changes.

To clarify, the patch currently modifies symbols generated by the OCaml compiler as follows:

  1. Remove the caml prefix
  2. Replace __ by .
  3. Unescape $xx sequences, e.g. $2b becomes +
  4. Remove trailing _\d+ (will be removed in the next version)
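For readers who want to see the four steps in one place, here is a minimal sketch in C. It is not the code from the patch: the function name ocaml_demangle_sketch, the helper hex_value, and the check that an uppercase letter follows the caml prefix are all assumptions made for illustration.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if not hex. */
static int hex_value(char c)
{
	if (c >= '0' && c <= '9')
		return c - '0';
	if (c >= 'a' && c <= 'f')
		return c - 'a' + 10;
	if (c >= 'A' && c <= 'F')
		return c - 'A' + 10;
	return -1;
}

/*
 * Illustrative sketch only (hypothetical name, not the patch itself).
 * Returns a malloc'ed demangled copy of sym, or NULL if sym does not
 * look like an OCaml symbol or if allocation fails.
 */
char *ocaml_demangle_sketch(const char *sym)
{
	size_t len, i, j = 0;
	char *out;

	/* Step 1: require and remove the "caml" prefix. Assumption: the
	 * prefix is followed by a capitalised module name. */
	if (strncmp(sym, "caml", 4) != 0 || !isupper((unsigned char)sym[4]))
		return NULL;
	sym += 4;
	len = strlen(sym);

	/* Step 4: drop a trailing _<digits> suffix, if there is one. */
	i = len;
	while (i > 0 && isdigit((unsigned char)sym[i - 1]))
		i--;
	if (i < len && i > 0 && sym[i - 1] == '_')
		len = i - 1;

	/* The output is never longer than the input, so len + 1 suffices. */
	out = malloc(len + 1);
	if (!out)
		return NULL;

	for (i = 0; i < len; ) {
		if (sym[i] == '_' && i + 1 < len && sym[i + 1] == '_') {
			/* Step 2: "__" becomes "." */
			out[j++] = '.';
			i += 2;
		} else if (sym[i] == '$' && i + 2 < len &&
			   hex_value(sym[i + 1]) >= 0 &&
			   hex_value(sym[i + 2]) >= 0) {
			/* Step 3: "$xx" becomes the byte with hex value xx */
			out[j++] = (char)(hex_value(sym[i + 1]) * 16 +
					  hex_value(sym[i + 2]));
			i += 3;
		} else {
			out[j++] = sym[i++];
		}
	}
	out[j] = '\0';
	return out;
}
```

With this sketch, camlStdlib__List__map_123 would come out as Stdlib.List.map. Every step removes or shrinks text, so the result always fits in a buffer the size of the input; that is the length property a (2245)-style suffix would give up.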