Scaling factors when compiling mutually recursive definitions

I have a 4.59MB module DkUberRestApis/src/DkUberRestApis_Std/Stripe.ml at b7015e0dcbb939ba211baff47befe6a230efb50e · diskuv/DkUberRestApis · GitHub with a long type _ = _ and _ = _ (* ... *) type expression and a long let rec a () = _ and _ () = _ (* ... *) value expression.

It is the generated product of Stripe’s OpenAPI spec (5.93MB) at openapi/openapi/spec3.json at 5ac97c4e18b334feb8f0ed49e002c1343f0ecc4d · stripe/openapi · GitHub.

At the moment compiling Stripe.ml takes forever (more than 20 minutes on a Mac M1 mini with a Mach-O 64-bit executable arm64 ocamlc executable; I killed it because I didn’t want to wait). The type expression by itself takes a few seconds. The value expression takes forever so I’m guessing type inference is the bottleneck.

Questions:

  1. Are there quadratic or exponential time factors in type inference for let rec expressions?
  2. Are there any hard limits in the number of terms in let rec and expressions? (Ditto for type ... and expressions) Currently it has approximately 10,000 terms.

I’d like to understand the constraints first before trying some solutions. (Maybe manually annotate type information, or do a graph cut, or just wait a very long time and distribute a pre-compiled .cma, or whatever suggestion someone has).

Could you check which version of the compiler you’re using ? Using ocamlc version 5.2, this compiles in about 20 seconds on my laptop, with most of the time during type-checking. ocamlopt is much slower, but still finishes in a bit more than 3 minutes (with flambda on, I expect it would be even faster without flambda).

It’s possible that you’re hitting Compiling nest of mutually recursive functions exhibits nonlinear behavior · Issue #12207 · ocaml/ocaml · GitHub, which should have been fixed by Linear-time closure computation by lthls · Pull Request #12222 · ocaml/ocaml · GitHub but is only included in 5.2

1 Like

It is 4.14.2. That patch is good to know but a few minutes on M1 is going to suck tremendously for anyone doing an opam install of the package when I release it. EDIT 1: I get your reported speed with ocamlc on 5.2.0, but I also get that same ~20 seconds with a vanilla ocamlopt (not a few minutes).

I think you have answered one of the implicit fundamental questions though … it is the typechecker that is taking time.

EDIT 1: Scrolling through that patch it seems that the expectation is close to linear speeds. And mutual recursion is capable of handling many tens of thousands of terms. So, while today this is running too slow, I can begin to apply optimizations without worrying about OCaml not fundamentally being able to handle large OpenAPI specs.

Thanks @vlaviron .

Here is summary:

  1. Are there quadratic or exponential time factors in type inference for let rec expressions? ANS: The expectation is close to linear time factor as of OCaml 5.2.0. Earlier versions had quadratic bugs.
  2. Are there any hard limits in the number of terms in let rec and expressions? (Ditto for type ... and expressions) Currently it has approximately 10,000 terms. ANS: The posted issue Compiling nest of mutually recursive functions exhibits nonlinear behavior · Issue #12207 · ocaml/ocaml · GitHub describes a problem and a fix for 40,000 terms.

I’ve made a backport of the linear fix to OCaml 4.14: Backport linear closures bugfix by jonahbeckford · Pull Request #13204 · ocaml/ocaml · GitHub.