Explain like I'm 5 years old: Tagless-final pattern

Hi! It would be nice to have a friendly introduction to tagless-final somewhere on the Internet. What it is, what it isn’t, when it’s used, other alternatives for similar use-case.

Some questions:

  • Why is it called tagless?
  • Why is it called final?
  • What’s the difference between tagless-final and any tree-like ADT?
  • Can I use it both with and without monads?

Here’s one introduction: Modular Optimizations


I answer the following questions.

  • What it is?
  • Why is it called tagless?
  • What’s the difference between tagless-final and any tree-like ADT?
  • How does it relates to embedded DSL?
  • How does it differ from building up any kind of abstract syntax-tree?

It is about interpretation. Consider the code

g (f 1) (f 2)

What it can mean?

If f is the identity function and g is integer addition, we can rewrite it to

(+) 1 2

If f is int_to_string and g is string concatenation, then we can rewrite it to

(^) (int_to_string 1) (int_to_string 2)

If f is a more complex operation that pairs the number with a statement about whether the number is zero, and g is an intelligent addition function which ignores an argument if it knows it is a zero, then we get an interpreter that is optimized toward suppressing zeros in addition expressions.

Tagless final is a way to give different meaning to the same set of code.

Here is a language. It only has:

  • non-zero integers like 0, 1, 2, 3 …
  • the addition symbol +
  • parentheses ( and ).

There is also a set of syntax rules to distinguish valid and invalid sentences.

  • valid sentences:
    1 + 0
    (1 + 2) + 3
  • invalid sentences :
    +++ 1
    ) + 1 ) (
    ( + 1 1 ( +

The abstract syntax tree is given by the type definition

type expr = Int of int | Add of expr * expr

Any valid sentence like ( 1 + 2 ) + 3 can be parsed into an AST like Add (Add(Int 1, Int 2), Int 3). The AST is an alternative representation of the sentence. The Add and Int are so called tags. If we don’t use tags then we go tagless.

The tagless-final approach and the tagged AST approach are different ways to do the same thing. This “same thing” is to give meaning to the symbols in a sentence, and get a meaning for the entire sentence.

Meaning of 1, 2, 3:

  • numbers 1, 2, 3,… or
  • text string 1,2,3,…

Meaning of +:

  • arithmetic addition, or
  • string concatenation

The tagged AST approach

(* AST definition *)
type expr = Int of int | Add of expr * expr

(* Give meaning to the symbols and sentences *)
let rec eval_as_num : expr -> int  = function 
   | Int a -> a
   | Add x y -> (+) (eval_as_num x) (eval_as_num y)

let rec eval_as_string : expr -> string = function
   | Int a -> string_of_int a
   | Add x y ->  "(" ^ (eval_as_string x) ^ (eval_as_string y) ^ ")"

let my_ast = Add(Add(Int, 1, Int 2), Int 3)

let result_num = eval_as_num my_ast

let result_str = eval_as_string my_ast 

The tagless(-final) approach

(* This corresponds to the AST definition in the tagged AST approach *)
module type SYM = sig
      type 'a repr
      val int: int -> int repr
      val add: int repr -> int repr -> int repr
(* There is no tags, so tagless.  *)

(* This corresponds to the `eval_as_num/string` in the tagged AST approach. *)
module InterpretAsNum = struct
      type 'a repr = 'a
      let int x = x
      let add   = (+)

module InterpretAsString = struct
      type 'a repr = string
      let int = string_of_int
      let add x y = "(" ^ x ^ " + " ^ y ^ ")"

module Eval(I:SYM) = struct
      open I
      let my_ast = add (add (int 1) (int 2)) (int 3) 

let result_num' = Eval(InterpretAsNum).my_ast
let result_str' = Eval(InterpretAsString).my_ast

Our mini-langauge is called DSL because it is specific to the domain of natural number addition. It is called embedded because if we don’t install OCaml then we cannot run our little language — which is embedded in OCaml. Think about, say, Java, you don’t need to install Ocaml in order to run Java programs, therefore Java is not an embedded language with respect to Ocaml.


OK, thanks! But how does that relates to embedded DSL? And how does it differ from building up any kind of abstract syntax-tree? After all, all syntax trees can be evaluated in many different ways.

Thanks again! Is there a benefit for tagless above “tagfull”? Or what are the trade-offs?

First, thank you for this: I always wondered what that stuff was, and now I have some idea (I read the link to that page, which was very informative as it went thru the various stages of interpreters and optimizers).

Second though, from your description, isn’t this just a Church-numeral-style encoding of ADTs ? I’m not saying that that’s bad – just that … not so revolutionary, no?

Third: as I look at the examples (like the one for eliminating zeroes), I can’t help but notice that instead of being able to use pattern-matching, you have to synthesize attributes to carry upwards the information you need. Everything gets less-natural.

Fourth: I -do- get the flexibility – that perhaps some transformation carries you from one ADT to another that shares a lot of constructors, but not all of them. But couldn’t that already be achieved by using polymorphic variants (in OCaml) ?

Just asking. And again, thanks for this introduction!

I’ll try to add my explanation, but as I’m using OCaml as if I where 5, any other explanation is welcome :slight_smile:

As you said, there is no nothing new in the tagless final architecture. Everything that could be written with AST can be written with the same effect with tagless final. Where tagless final is interisting is when you need to extend your code.

Suppose you need to add a Multiply operation: with the classic adt, you have to extend the type type expr and update the existing code to handle the new case (the compiler will raise errors for each unhandled match).

In tagless final approach, you can simply extend the code:

module type SYM_MULT = sig
      include SYM
      val mult: int repr -> int repr -> int repr

module InterpretAsNumWithMult = struct
      include InterpretAsNum 
      let mult = ( * )

You are not limited anymore: you can extend the code by adding new functions on your type (like eval_as_num, to_string in the initial example) but you can also extend the type in a safe way (and without rewriting the whole code database). As the whole code is functorized, you have the control over what is given in each evaluation.

1 Like

As long as the function you’d write on the corresponding ADT is obviously structurally-recursive/fold-like, the TTF approach remains natural. Otherwise, it gets more complex indeed, see for instance negation downwards-pushing there.

But then it wouldn’t be tagless, i.e you’d have dynamic selection based upon the constructor. With TTF, we already know which case (add or int in the above example) will be run (of course, you still need to choose the interpretation).

[slightly edited for a number of typos and clarity]

Obviously we can differ on this, but … I don’t see how there’s a difference here. I’ll give the example of Church numerals: the type is:

churchnat = (A)(A -> A) -> (A -> A)

there are two inhabitants [oops, I’m lapsing into constructivist lingo – two basic typed terms from which all others can be constructed]: fun f x -> x and fun f x -> f x.

we abbreviate these two terms O and S. The ADT would be type nat = O | S of nat.

How do we compute over nat ? I’ll do it two ways, once with the ADT, and then with the Church-numeral formulation.

The ADT way

If it were an ADT, we would compute by match+letrec. But that’s not the primitive induction combinator which is defined from the inductively defined type (e.g. what Coq would derive automatically). That latter is something like [again, making this up from memory]

(B) (P: nat -> B) ((n: nat) (P n) -> (P (S n)) -> (P O) -> (n: nat) (P n)

And when you extract the realizability term in most constructive logics, you get the match+letrec, of the form:

induction_combinator = fun indstep basecase ->
let rec indrec n -> match n with
  O -> basecase
| S n -> indstep (indrec n)
in indrec

[hope I got that right]

The Church-encoding way

But that’s all assuming the datatype is defined as an ADT, yes? Let’s do it again with the Church-numeral formulation:

Recall that O = fun f x -> x and S = fun f x -> f x.

And the type was churchnat = (A) (A -> A) -> (A -> A).

How do we express the equivalent of above? We just take the expressions for the basecase and the indstep, and pass them as arguments to the churchnat, viz.

n indstep basecase

Notice how, if the churchnat is really O, then you get basecase and if it’s (let’s say) S (S (S O)), then you’ll get indstep (indstep (indstep basecase)). Notice how, just as with tagless-final (as it was described) you cannot do direct pattern-matching on the ADT (b/c there isn’t any – you can only access the result of the inductive computation on the subsidiary values in the tree (or well-founded order, generally).

But the two ways are equivalent

If it’s not obvious where I’m going, I’ll add one more little bit: the two ways of doing this are completely interchangeable: there’s a transformation called “defunctionalization” due to Reynolds (in his seminal paper “Definitional Interpreters for …” (not gonna look it up – that should be sufficient to get the search hit)) that explains how you can start with (computations over) the churchnat and get (computations over) the nat (he does it for a CPS semantics-based definitional interpreter of a language, progressively turning into an abstract machine (pretty much the CEK-machine, which is where Felleisen starts).

And of course, the function nat -> churchnat is trivial.

We can pretend that we’re doing “dynamic selection based upon the constructor”, but if that’s hidden away in the automatically-generated primitive-recursion induction combinator, then just what are we eschewing? And when we can see that using the primitive-recursion induction combinator is strictly less-expressive than just using match+letrec directly, again, what’s the gain?

And if we’re not doing “dynamic selection based upon the constructor” but it’s happening at runtime, b/c the closures are implemented as records, with tags and those tags are used to select which code paths to take, then how is that not the same thing?

Some benefits that I see with tagless final:

  • you can avoid constructing and transforming intermediate ASTs (which can take up extra memory, and cause extra work for the GC) and instead keep around just enough information for the transformation/operation you want to do. E.g. you can perform constant folding and only construct a data type when you have an operation where not all arguments are constants.
  • if you’re building an evaluator you don’t need to create and store any intermediate data structures beyond those needed to compute the arguments to the tagless final module function calls, and the stack used for recursion. I haven’t tried but I think this could be used to evaluate (an unbounded) stream of operations that contains side effects with bounded memory usage, as long as nesting of the operations in the stream is bounded. You can’t do that if you first have to build an AST: you might exhaust available memory.
  • if you need it you can still get back an (G)ADT, and you can ensure that this (G)ADT obeys some invariants (maybe a private type so you can only pattern match on it, but only construct it through this interface).
  • you can change the internal data type at any time, it is an implementation detail; i.e. you can optimize it, or improve it without having to change your callers. Your interface is the tagless final module signature.

Some disadvantages:

  • as mentioned before it is less “natural” to work in the tagless final style, constructing an ADT+pattern matching is a lot simpler
  • you end up with lots of nested functors sometimes (e.g. if you want to construct an optimization pipeline as described in the initial post). I prefer |> pipelines of simple function calls instead

Indeed here’s a code base that uses it to full extent in which you can spot that interesting functor application.

While tagless truly composes in every direction in wonderful manners (i.e. somehow solves the famous expression problem), I don’t find it particularly readable and ergonomic.


From my perspective, tagless final resembles object-oriented decomposition, while tagged ASTs correspond to functional decomposition.

In OO decomposition, the operations are specified as interfaces, and each implementation (interpreter in tagless final) provides a way of responding to the operation. Adding a new operation involves editing both the interface specification and the interpreters.

The advantages are:

  1. The internal representation of data can be hidden.
  2. The behaviour can be changed easily by changing interpreters.

The downsides are:

  1. Updating the interface specification may require cascading updates to multiple clients that don’t care about the new operation.
  2. You lose the exhaustive checks of pattern matching and fall back to exceptions such as unimplemented_operation.

In functional decomposition, operations are specified as tagged ASTs. Adding a new operation involves adding a new tag and updating the evaluator.

The advantage are:

  1. Clients of the evaluator can opt into using the new tag if they want, they don’t need to change.
  2. You have the nice exhaustiveness checks of pattern matching.

The downsides are:

  1. The internal representation cannot be hidden.
  2. Changing the representation can break all clients.

This comparison makes sense to me. Haskell’s type classes are for ad-hoc polymorphism as OO decomposition, and they’re quite like tagless final. If you replace their requirement for only one implementation of a type class per type with Ocaml’s or Scala’s modular implicits, you get something close to the examples shared.


Thanks for all answers! One question is left, though: What’s the difference between tagless-not-final an tagless-final? What’s so final about it?

So the opposite to “final” is apparently “initial”. Some info here: Initial and Final Encodings and here: Tagless-Final Style and here: Final tagless encodings have little to do with typeclasses – Foxhound Systems