Two thoughts: (1) yes, if your program actually needs more than one core running OCaml code, it’ll run slower with the current runtime design.
(2) Is your problem amenable to an explicit-parallelism solution? E.g., MPI, or perhaps the libraries for explicit parallelism that @XVilka (I think it was) has, which manage all the copying, process creation, etc.
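
To make "explicit parallelism" concrete, here is a rough sketch of what such a library has to do under the hood: fork one process per work item, compute in the child, and copy the result back to the parent over a pipe. This is not the API of MPI or of any particular library; `parallel_map` is a hypothetical name of mine. It assumes a Unix-like system, the `unix` library, OCaml 4.12 or later (for `Unix._exit`), and results that `Marshal` can serialize (so no closures). Error handling is left out.

```ocaml
(* Sketch only: hand-rolled process-based parallel map.
   [parallel_map] is a hypothetical helper, not part of any library. *)
let parallel_map (f : 'a -> 'b) (inputs : 'a list) : 'b list =
  (* Fork one child per input; each child gets its own pipe to the parent. *)
  let spawn x =
    let rd, wr = Unix.pipe () in
    match Unix.fork () with
    | 0 ->
      (* Child: compute, serialize the result into the pipe, then leave
         without running the parent's at_exit handlers. *)
      Unix.close rd;
      let oc = Unix.out_channel_of_descr wr in
      Marshal.to_channel oc (f x) [];
      close_out oc;
      Unix._exit 0
    | pid ->
      (* Parent: keep only the read end and remember the child's pid. *)
      Unix.close wr;
      (pid, Unix.in_channel_of_descr rd)
  in
  let workers = List.map spawn inputs in
  (* Collect results in input order, reaping each child afterwards. *)
  List.map
    (fun (pid, ic) ->
      let result : 'b = Marshal.from_channel ic in
      close_in ic;
      ignore (Unix.waitpid [] pid);
      result)
    workers

let () =
  let squares = parallel_map (fun n -> n * n) [1; 2; 3; 4] in
  List.iter (Printf.printf "%d ") squares;
  print_newline ()
```

The catch is that every input and result gets copied between processes, so this only pays off when the per-item computation is expensive relative to the data being moved. A real library would also cap the number of workers and deal with children that crash.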