I am currently writing a compiler in Ocaml for a Haskell-ish language using monads (I have my own implementation) for learning purposes. My question is focused around on the semantics behind monads and monad transformers, specifically on the interchangeable usage of reader and state monads.
Currently, the structure of a program in my language resembles the structure of a Haskell program quite closely: a program is an ordered sequence (in this case, a list) of type declarations (type aliases), variable/function declarations/definitions, compiled top to bottom. I initially wrote the type checker plain and simple:
- a top level function
tc_top : (term_env * type_env) -> declaration -> declaration * (term_env * type_env)
, which is used to perform afold
operation over the program. This function is responsible for evaluating each top declaration, potentially adding variable bindings (in case of function and variable assignments), or type bindings/aliases (in case of type declarations). There are thus two environments: one for variable/function(or term) bindingstermEnv
, and another one for type bindingstypeEnv
. This function needs to read from both environments (for function/variable signatures, and to ensure a type alias does not already exist), and to write to both environments. The return of this function is a mapped top-level declaration, since this function may perform some modifications to the structure of the tree (such as add return types to function terms). This function calls the next function,tc_term
to evaluate a term to be bound to a variable; - a second level function
tc_term : term_env -> type_env -> term -> term * ty
, which performs a case analysis on the input term. This function also makes use of the two aforementioned environments, but in this case, I cannot write to the type environment, and write operations to the term environment are scoped, similar to how the parameter of a function is bound only on the body of the function. The type environment is used to perform operations such as a custom type equality function. This function returns a pair of a mapped term and its computed type;
This previous iteration worked. However, as an additional challenge, I attempted to integrate monads in the type checking process. For the remaining points, I’ll use Haskell’s monad syntax for simplicity sake.
First approach: State Monads
To start with, I completely removed the environment parameters from the tc_top
and tc_term
functions, replacing them with a state monad stack StateT termEnv (State typeEnv)
. This allowed both functions tc_top
and tc_term
to read and write to both environments which brings slightly more power than what I wanted:
- Any function using
typeEnv
called withintc_term
can also write to the environment; - Writing to
termEnv
from withintc_term
is not scoped: for example, when type checking a function termFunction (arg, t_arg, body)
, adding the binding(arg, t_arg)
totermEnv
permanently changestermEnv
to include that binding, even after evaluatingbody
. As a result, the state returned by this case evaluation would contain that binding, which would subsequentially be used by successor assignments type checked bytc_top
;
Second approach: Reader Monads
A reader monad would be the most direct translation from the original code for tc_term
: simply invert the order of its parameters, making it tc_term : term -> term_env -> type_env -> term * ty
; then, applying it to some term would yield a function term_env -> type_env -> term * ty
, which can be encoded with a Reader
monad stack, ReaderT termEnv (Reader typeEnv)
. This resolves the two previous issues. When it comes to tc_top
, there are two possibilities:
- keep it as a it is, changing the call places of
tc_term
to userun_reader
functions, i.e something akin to
let tc_top (term_env, type_env) = function
...
| Assign (..., term) ->
let term, ty = run_reader (run_reader (tc_term term) type_env) term_env
...
- change it to also use the
Reader
monad stack. However, this introduces the issue that this function can no longer “globally” write to both environments. I can work around this by transforming this function into an expanded fold over a list of declarations, and usinglocal
to add bindings when performing the recursive call. This not only seems extremely hacky but also goes against the original semantics of theReader
monad, since I would be modifying an environment that is meant to be globally static;
Third approach: State monad with custom local function
This approach is similar to approach #1, but I add a new local function that simulates Reader
’s local
function as such:
let local m f =
let* original_state = get in
let* _ = update m in
let* v = f in
let* _ = update original_state in
return v
However, once more, this seems somewhat tacky, since we are simulating the behavior of Reader
.
As you can see, I have wide fan of possibilities here, each with their own advantages and disadvantages. Out of them, I feel that using a Reader
stack and expanding tc_top
to a fold over a list of declarations seems the simplest and most direct solution. However, I’m not sure that combining monadic with pure functions is the way to go here, and so I’m more inclined to make tc_top
monadic. Thus, I’d like to use the state monad stack StateT termEnv (State typeEnv)
in tc_top
and the reader monad stack ReaderT termEnv (Reader typeEnv)
in tc_term
. In this scenario, I’m not entirely sure how to correctly transform each corresponding monad into the other. Maybe a combination of run_reader
, lift
and gets
operations? What would be the most idiomatic way of writing these two functions, preferably keeping the monadic nature of at least tc_term
? Is there any other way I’m failing to see?