Monadic type checker and top level declarations

I am currently writing a compiler in Ocaml for a Haskell-ish language using monads (I have my own implementation) for learning purposes. My question is focused around on the semantics behind monads and monad transformers, specifically on the interchangeable usage of reader and state monads.

Currently, the structure of a program in my language resembles the structure of a Haskell program quite closely: a program is an ordered sequence (in this case, a list) of type declarations (type aliases), variable/function declarations/definitions, compiled top to bottom. I initially wrote the type checker plain and simple:

  • a top level function tc_top : (term_env * type_env) -> declaration -> declaration * (term_env * type_env), which is used to perform a fold operation over the program. This function is responsible for evaluating each top declaration, potentially adding variable bindings (in case of function and variable assignments), or type bindings/aliases (in case of type declarations). There are thus two environments: one for variable/function(or term) bindings termEnv, and another one for type bindings typeEnv. This function needs to read from both environments (for function/variable signatures, and to ensure a type alias does not already exist), and to write to both environments. The return of this function is a mapped top-level declaration, since this function may perform some modifications to the structure of the tree (such as add return types to function terms). This function calls the next function, tc_term to evaluate a term to be bound to a variable;
  • a second level function tc_term : term_env -> type_env -> term -> term * ty, which performs a case analysis on the input term. This function also makes use of the two aforementioned environments, but in this case, I cannot write to the type environment, and write operations to the term environment are scoped, similar to how the parameter of a function is bound only on the body of the function. The type environment is used to perform operations such as a custom type equality function. This function returns a pair of a mapped term and its computed type;

This previous iteration worked. However, as an additional challenge, I attempted to integrate monads in the type checking process. For the remaining points, I’ll use Haskell’s monad syntax for simplicity sake.

First approach: State Monads

To start with, I completely removed the environment parameters from the tc_top and tc_term functions, replacing them with a state monad stack StateT termEnv (State typeEnv). This allowed both functions tc_top and tc_term to read and write to both environments which brings slightly more power than what I wanted:

  1. Any function using typeEnv called within tc_term can also write to the environment;
  2. Writing to termEnv from within tc_term is not scoped: for example, when type checking a function term Function (arg, t_arg, body), adding the binding (arg, t_arg) to termEnv permanently changes termEnv to include that binding, even after evaluating body. As a result, the state returned by this case evaluation would contain that binding, which would subsequentially be used by successor assignments type checked by tc_top;

Second approach: Reader Monads

A reader monad would be the most direct translation from the original code for tc_term: simply invert the order of its parameters, making it tc_term : term -> term_env -> type_env -> term * ty; then, applying it to some term would yield a function term_env -> type_env -> term * ty, which can be encoded with a Reader monad stack, ReaderT termEnv (Reader typeEnv). This resolves the two previous issues. When it comes to tc_top, there are two possibilities:

  1. keep it as a it is, changing the call places of tc_term to use run_reader functions, i.e something akin to
let tc_top (term_env, type_env) = function 
  ...
  | Assign (..., term) ->
    let term, ty = run_reader (run_reader (tc_term term) type_env) term_env
  ...
  1. change it to also use the Reader monad stack. However, this introduces the issue that this function can no longer “globally” write to both environments. I can work around this by transforming this function into an expanded fold over a list of declarations, and using local to add bindings when performing the recursive call. This not only seems extremely hacky but also goes against the original semantics of the Reader monad, since I would be modifying an environment that is meant to be globally static;

Third approach: State monad with custom local function

This approach is similar to approach #1, but I add a new local function that simulates Reader’s local function as such:

let local m f =
  let* original_state = get in
    let* _ = update m in
      let* v = f in
        let* _ = update original_state in
          return v

However, once more, this seems somewhat tacky, since we are simulating the behavior of Reader.


As you can see, I have wide fan of possibilities here, each with their own advantages and disadvantages. Out of them, I feel that using a Reader stack and expanding tc_top to a fold over a list of declarations seems the simplest and most direct solution. However, I’m not sure that combining monadic with pure functions is the way to go here, and so I’m more inclined to make tc_top monadic. Thus, I’d like to use the state monad stack StateT termEnv (State typeEnv) in tc_top and the reader monad stack ReaderT termEnv (Reader typeEnv) in tc_term. In this scenario, I’m not entirely sure how to correctly transform each corresponding monad into the other. Maybe a combination of run_reader, lift and gets operations? What would be the most idiomatic way of writing these two functions, preferably keeping the monadic nature of at least tc_term? Is there any other way I’m failing to see?

Didn’t you forget term_env * type_env in the return type of tc_top?

I don’t understand what modifications you would like to do to declarations, maybe it’s better left to another phase.

Ad. #3. It might be useful to compare with imperative OCaml: you would use Hashtbl.add and then Hashtbl.remove. Hashtbl keeps a stack of bindings (it is made for writing compilers after all). Or write a helper for doing that. So it seems #3 is doing a similar thing except purely-functionally, and also the state is more coarse-grained (popping one element in a Hashtbl slot vs replacing the whole state).

I’d like to use the state monad stack StateT termEnv (State typeEnv) in tc_top and the reader monad stack ReaderT termEnv (Reader typeEnv) in tc_term.

This looks like the correct way, just looking at the types. Except I don’t see why you wouldn’t just use one transformer with a pair state, having two transformers means double the painful lifting.

By the way, since you’re using OCaml you can make use of type equalities instead of dealing with Haskell-like wrappers (runReader etc.). A function a -> b -> c -> d is also a function of type ('a, b -> c -> d) reader, where ('a, 'b) reader = 'a -> 'b.

Hello!

Yes I did. Fixed it now, thank you for bringing it to my attention : )

Yes, I am aware about the similarities with Hashtbl. However, since I am writing this compiler out of pure curiosity and imho functional programming is much more fun and interesting than imperative programming, I want to avoid as much as possible such constructs, even if it comes at a performance cost.

I can do that, but because the type checker depends on other systems that only use the typeEnv for reading, and since I have already changed those environments to use Reader monads, it feels natural to use a stack of monads instead of using a single pair. Besides that, I do agree that this approach brings a great deal of painful lifting, especially when mapping from a monad stack onto another. Now that I think about it, I guess I could change the state to a record of all the environments needed, and get only the typeEnv state out of that record when the need to run the remaining monadic systems come. That could actually be a pretty solid solution to be honest, since running other systems depending on a subset of the environments could be as simple as running the system inside a gets/asks function. This seems very well like the solution I was looking for! Thank you!

Yes, I am aware of that. However, I feel like using Haskell-like wrappers in this scenario makes the code slightly more readable than just calling monadic functions like pure functions.