Using GADTs to track properties of a state type. Help and alternative approaches?

I have a compiler that features multiple compilation phases. Consider the following simplified example:

module M = Map.Make(String)

type state = {
  phase1: string M.t;
  phase2: int M.t;
  phase3: string M.t;
}

let step1 = M.map String.length
let step2 = M.map Int.to_string

let init source =
  {
    phase1 = source;
    phase2 = M.empty;
    phase3 = M.empty
  }

let run_phase1
    : state -> state
  = fun state ->
    {
      state with
      phase2 = step1 state.phase1
    }

let run_phase2
    : state -> state
  = fun state ->
    {
      state with
      phase3 = step2 state.phase2
    }

Why do you need a state type? Can’t you just compose the functions step1 and
step2?

The answer is yes, if we only want to consider batch compilation. The
state type arose from the development of a language server that needs access to
all compilation phases. I also want to cleverly update the state upon granular
changes (for example, when a source changes only recompile the stuff that
depends on it, although the example is too simplified to capture that)

This gets to the type of thing I would like to capture.

Can I use GADTs to prevent phase2 from being run before phase1 completed?

I am imagining that state could be a GADT such that the types of run_phase*
reflect that a phase transition is happening. I have had success with GADTs in
the past and it feels to me like this is a potential usecase for them, but I
can’t seemt to actually write down what I want.

I realize that I might be trying to be too clever. I could just design an
interface that ensures that everything is being used correctly, but using GADTs
is more slick.

Any pointers, suggestions or alternative approaches are appreciated!

Related:

I’ll suggest some non-gadt alternatives that still let you statically ensure phases are applied in order.

  1. Nest each phase inside the subsequent phase:
type state1 = {
  phase1: string M.t;
}

type phase2 = {
  state1: state1;
  phase2: int M.t;
}

let run_phase1 state1 = { state1; phase2 = step1 state1.phase1 }
  1. Add type params to the state type and use unit for uncomputed phases:
type (p1, p2, p3) state = {
  phase1: p1,
  phase2: p2,
  phase3: p3,
}

type state1 = (string M.t, unit, unit) state
type state2 = (string M.t, int M.t, unit) state
type state3 = (string M.t, int M.t, string M.t) state
1 Like

One thing I had not considered is the following limitation:

I am using Algaeff.State throughout. I am quite happy with how it all works. I can use State.get () and look for the information I want. The thing that I would like to constrain is not the state type, but the composition of functions that act on the state.

One way to make your suggestion work is to create a variant with a constructor for each possible state configuration, but that does not seem like an elegant improvement…

All in all, this topic is not really about solving a real problem. I think it would be cool to find a type-driven solution to the nitpicks I currently have, but the code works as is.

type phase1
type phase2
type phase3
type 'a phase =
  | Phase1 : phase1 phase
  | Phase2 : phase2 phase
  | Phase3 : phase3 phase

let phase_1_to_2 : phase1 phase -> phase2 phase =
  function
  | Phase1 -> Phase2
  | _ -> .

I have not tested this, but Im pretty confident it works. Of course you would have to put the phase inside the state record but thats a small change.