Does type inference matter?

Does type inference matter? I mean, what is the benefit of type inference beyond the ability to write less code by skipping type annotations, and what are its downsides beyond, possibly, a slower compiler?

P.S. It would be really awesome if you could provide some real world examples from your practice where type inference had a massive good or bad impact on development and quality.

4 Likes

tl;dr This is a long-winded way of saying "I'm interested in an answer to this question too!"


I am inexpert, but my studies up to this point have led me to understand that "fully decidable type inference" is one of the core values in ML type systems (see, e.g., http://pauillac.inria.fr/~fpottier/publis/emlti-final.pdf). Arguably, then, a language which dropped full decidability in its type system would be opting to diverge from one core principle of the ML tradition.

However, I think this aspect of the tradition deserves careful questioning. I believe Conor McBride quipped something along the lines of "I don't want the compiler to infer the types of the programs I write, I want it to infer programs from the types I write". I am inclined to agree! While type inference can be quite convenient, the recent success of more cumbersome languages which require fully annotated functions and values (e.g. Rust) seems to support the notion that type inference is not essential for productivity or user adoption. Similarly, iiuc, Haskell seems to have mostly sidelined inference, making full annotation a norm, and trading in decidability of type inference for a more expressive type system.

imo, this is the main reason to question the necessity of type inference: it seems to place significant constraints on language features. iirc, in recent discussions of the difficulties surrounding the implementation of modular implicits, preservation of type inference was mentioned as one of the principal challenges. Similarly, Rossberg went to pains to construct 1ML in a way that would preserve type inference. I think I would easily trade in type inference for features like truly first-class modules or full support for higher-rank polymorphism. But maybe I don't know what I'd really be giving up?

Again, I am not an expert, nor do I (yet!) have the privilege of working with OCaml professionally, so it is likely that I suffer from some misconceptions on this topic. I do, however, think this is an interesting question, and I too would be interested in responses (including just links or reading recommendations) weighing in on this matter.

2 Likes

I think types add a lot of noise when you can tell what arguments a function will take simply from its name or its implementation. I like that OCaml uses .mli files rather than pub or private keywords to define the interface of a module because it keeps the implementation files clean for when you need to edit them, and the interface easily readable for when you need to look something up.

Probably the only “pragmatic” (i.e. not related to wanting a more expressive type system) downside to type inference is that it doesn’t always work the way you might want it to when it comes to more complicated types involving polymorphism, polymorphic variants, and so on. There’s a section in the manual about this. If you know about these issues you can easily avoid them by adding a type signature when needed, but if you don’t it might cause a lot of headaches.
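
A minimal sketch of one such surprise (weak polymorphism under the value restriction; the name cache is made up): the unannotated binding is not polymorphic, its first use silently pins the type, and the mismatch can surface far from the definition.

let cache = ref None
(* inferred as '_weak1 option ref: not polymorphic, the first use pins the type *)

let cache : string option ref = ref None
(* an explicit annotation states the intent up front, at the definition site *)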

5 Likes

I’ll bite.

For me, type inference is the way in which the compiler and the programmer agree what type a certain (function, module, etc.) should be. If the compiler disagrees with you, it means that you have something important in your program that you should look at more closely. I do not want to tell my compiler what types things are; I want the compiler to tell me what it has computed the types to be. In that way, the compiler checks what I have done and I check what it has done. It is just one more check to make sure that you have created the program you intended to.

I have found that, when I am programming, where the compiler and I disagree about what a type is or should be, the compiler is more often correct than I am. It keeps me honest and informed about the nature of my program.

3 Likes

Oh yes, but only to convince proponents of dynamically typed languages that while you have types you don’t have to write them down; so effectively they can’t complain.

More seriously I wouldn’t mind having to annotate more at the function header level (and as the type system becomes more powerful, we already need to do it more often). However I would find it painful not to have local type inference for the body of functions; having to annotate every let binding would be painful and hinder readability in my opinion.
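
For instance (a made-up sketch), the header carries the annotation and everything inside stays unannotated:

(* annotate the function header once, let the body be inferred *)
let mean (xs : float list) : float =
  let n = List.length xs in                  (* n : int, inferred *)
  let total = List.fold_left (+.) 0. xs in   (* total : float, inferred *)
  total /. float_of_int n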

6 Likes

Type checking is a form of program analysis that automatically proves certain correctness properties of your program. Like any other static analysis tool, it is worthwhile because it helps you rule out many programming errors.

Type inference and type checking look very similar in that sense, as they fill the same niche. In fact, they are quite different and even live in different complexity classes: type checking only verifies that a given answer to an equation is correct, while type inference (a) searches for a solution to the system of equations and (b), in the case of ML-style type inference, provides the most general answer.

Let's look at some concrete examples. What classical type checking does is compare two expressions for syntactic equality, e.g., a = a, or a -> b = a -> b. From the logic perspective, it checks tautologies. What ML-style type inference does is treat the comparison as an equation and try to find a solution for it, e.g., a = a, a tautology, is easily solvable. But the ML typechecker can do much more; for example a -> b = x -> y -> z has the solution a = x, b = y -> z.

And we have only scratched the surface of OCaml type inference, which is a unification-modulo-theory engine. The modulo-theory part is especially interesting, as OCaml is capable of going beyond syntactic equality: with a subtyping theory, syntactic equality is generalized to an equivalence relation over a partial ordering on the lattice-like structure of subtypes. In other words, the typechecker may find a solution that is a least upper bound; for example, the following equation

a -> (<quacks : unit> as b) = x -> (<walks : unit> as b)

has the solution

a = x
b = <quacks : unit; walks : unit>
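
A rough toplevel sketch of the same idea with object types (the method names are only illustrative): inference merges the two constraints into a single row.

# let check x = (x#quacks : unit), (x#walks : unit);;
val check : < quacks : unit; walks : unit; .. > -> unit * unit = <fun>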

Let's do a real-world but simplified example of this. Suppose we have an expression language where, as in C, some expressions can occur on the left-hand side of an assignment (aka lvalues) and some cannot: we can't assign to a+b, but we can assign to a or to *(a+b). Since we can build arbitrary expressions, we don't want to manually ascribe to each possible tree whether it is an lvalue or an rvalue. Instead, we can ascribe only our base constructors and let the OCaml typechecker compute for us to which class an arbitrary expression belongs. So let's define a simplified expression language that has two constants and one binary operation:

 type _ exp =
  | Int :[> `rhs] exp
  | Var :[> `lhs] exp
  | Cat : ('a exp * 'a exp) -> 'a exp;;

Here, the type of the Cat constructor should not be misread. By using the same type variable for all three parts, we're not postulating that all three must be given the same type; we're saying that all three should be unified, letting the typechecker find the least common denominator. For two equal types, this is simple:

# Cat (Var,Var);;
- : [> `lhs ] exp = Cat (Var, Var)

but the typechecker can also handle different types,

# Cat (Int,Var);;
- : [> `lhs | `rhs ] exp = Cat (Int, Var)

So, given this, we can define our type-safe assignment operator,

# let (:=) : [`lhs] exp -> [> `lhs | `rhs] exp -> string = fun x y -> "ok";;
val ( := ) : [ `lhs ] exp -> [> `lhs | `rhs ] exp -> string = <fun>

that will statically disallow ill-formed programs:

# Cat(Var,Int) := Int;;
Characters 8-11:
  Cat(Var,Int) := Int;;
          ^^^
Error: This expression has type [> `rhs ] exp
       but an expression was expected of type [ `lhs ] exp
       The second variant type does not allow tag(s) `rhs
# Cat(Var,Var) := Cat(Var,Int);;
- : string = "ok"

So again, we have only scratched the surface. The general idea is the Curry-Howard isomorphism between logic and programming, where our types are theorems, our values are proofs, and the typechecker is an automatic theorem prover. So, when you program in OCaml, you write the mli file, which states your theory; you write the implementation, which is a particular model of your theory; and the typechecker verifies that your implementation actually models your theory. And the OCaml type system is expressive enough to postulate quite complex theories.
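
A toy illustration of that division (the file names are made up):

(* rev.mli -- the theory: a theorem we postulate *)
val rev : 'a list -> 'a list

(* rev.ml -- the model: a term the typechecker verifies against the theory *)
let rev l = List.fold_left (fun acc x -> x :: acc) [] l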

So why do we need this? The typechecker is an easy-to-use and fast static analysis tool that finds bugs for us.

16 Likes

This is a very good point, and it helps focus the discussion. I would completely agree that some type inference is virtually a practical necessity. I am not quite sure how to articulate the difference we have in mind between this kind of type inference (which, I think, is virtually taken for granted) and the sort of decidable inference that lets us omit annotations on our functions and base values at the expense of severely limited polymorphism…

@ivg, that’s very illuminating, vis-à-vis variant types and subtyping!

However, the (well deserved) praise of type-checking makes me think I should clarify that I have no doubt about the utility and importance of type checking (I suspect the same can be said for @egoholic). In fact, I am skeptical of the importance of inference precisely because I want to maximize the power of checking!

iiuc, you are suggesting that one peculiar virtue of OCaml's type inference is the ability to find a supertype unifying two instances of a subtype. However, it seems that the main advantages you highlight come from the power of subtyping and type checking, and that the type inference is hardly needed. In fact, in the example you give

you use explicit type annotations to make the subtyping come into play in a useful way. iiuc, the only thing the type inference is saving us here is having to add an additional declaration along the lines of (in made-up syntax)

type `rhs, `lhs <: [> `lhs | `rhs]

(where <: would mean "is a subtype of"). This would seem to point back to @egoholic's suspicion that the main utility of type inference just boils down to

In other words (unless I misunderstand), your example seems to show how explicit type annotations can be used to leverage subtyping to enforce interesting and useful constraints in the type checker. But I still don't know why we should care about inference beyond saving a few keystrokes.


Regarding my own remark a moment ago

I suspected that the latter sort of inference was related to Curry typing and that the former is fully consistent with Church typing, and poking around in Nederpelt and Geuvers' "Type Theory and Formal Proof" seems to support that suspicion:

…in a programming language, one wants to write as few types as possible and let the compiler do type inference: i.e. do the check whether the term we have is typable, and if so, to compute a 'most general type' for us. This amounts to the implicit (à la Curry) typing. (81)

I’ll provide an excerpt recounting the difference between Church and Curry typing; for I think this is the right dichotomy for understanding the difference between the kind of indispensable “inference” @dbuenzli mentions and the kind of (arguably) dispensable inference @egoholic and I are questioning.

Typing of a λ-term starts with typing its variables. There are two ways to give types to variables:

(1) Prescribe a (unique) type for each variable upon its introduction. This is called typing à la Church or explicit typing, since the types of variables are explicitly written down … The types of more complex terms now follow in an obvious manner, if one takes the restriction on typability of applications into account.

(2) Another way is not to give the types of variables, but to leave them open ('implicit') to some extent; this is called typing à la Curry or implicit typing. In this case, the typable terms are found by a search process, which may contain 'guesses' for the types of variables. (36)

With explicit typing, the decidability of typing is almost immediate: the free variables have a type in the context and the bound variables have a type in the lambda abstraction (we write λx : σ . M instead of just λx . M). From that information, one straightforwardly computes the (unique) type of the whole term, if it exists. (65)

I.e., giving up type inference (in the relevant sense under discussion) would not require annotating every occurrence of a term; it only requires annotations at the first introduction of basic terms. Thereafter, the types of all more complex terms are computed for us.
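
A small OCaml sketch of the contrast (the function names are arbitrary):

(* à la Church: variables are annotated where they are introduced, and the
   type of the compound term then follows mechanically *)
let compose (f : int -> string) (g : bool -> int) (b : bool) = f (g b)

(* à la Curry: no annotations; inference searches for a most general type,
   here ('a -> 'b) -> ('c -> 'a) -> 'c -> 'b *)
let compose' f g b = f (g b)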

In either case – whether we're using implicit typing (with type inference to search for the most general types for our terms) or explicit typing – type checking will compute the types of compound terms. I think this also indicates a reply to @cyocum's point: ime, only type checking is required to give the dialogue between programmer and compiler that you describe. If we are constrained to explicit typing, the compiler will still tell us when we've chosen an impossible type for a term (or stuck a term of the wrong type in the wrong place); it just won't try to guess the types of terms on its own.

(Sorry for the long winded rambling! In my defense, I usually only lurk :slight_smile: ).

Seriously, I chose OCaml because I wanted the uncluttered code and much of the convenience of a dynamically typed language with compile-time type checking.

4 Likes

iiuc, you are suggesting that one peculiar virtue of OCaml's type inference is the ability to find a supertype unifying two instances of a subtype. However, it seems that the main advantages you highlight come from the power of subtyping and type checking, and that the type inference is hardly needed. In fact, in the example you give you use explicit type annotations to make the subtyping come into play in a useful way.

Let's be accurate with the terminology. I'm not talking about whether we should or shouldn't write type annotations; that is a completely different topic. What I'm saying is more general: it is about the difference between programming languages that perform type inference in the ML style and languages that do classical type checking in the Pascal/C++ style. The former include a theorem prover inside: ML-style type inference is a powerful inference system with which you can encode complex behaviors of your program, using a quite sophisticated type language, and use the inference system to prove that those properties hold under all possible evaluations. This is far different from merely checking that the type identifiers of values match.

So, the example that I'm showing is not verifiable by a type-checking system. The fact that I'm adding a type annotation doesn't mean that I'm forfeiting type inference and using type checking. Instead,

val ( := ) : [ `lhs ] exp -> [> `lhs | `rhs ] exp -> string = <fun>

is a theorem that states lvalue(x) -> x := y, i.e., that x := y is well-formed only if x is an lvalue.

So in order to construct this term from some value x we first need to obtain a proof that x is an lvalue. And this proof is made for us by the type system, and it will work for values of arbitrary size, e.g.,

Cat(Cat(Var,Cat(Var,Var)),Var)

and in the case of real-world examples, where we have dozens of constructors, it is much more impressive.
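
For instance, the toplevel silently derives the proof for a larger term:

# Cat(Cat(Var,Var),Cat(Var,Var)) := Cat(Int,Var);;
- : string = "ok"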

2 Likes

Thanks for clarifying, @ivg. I’m still having trouble tracking down a very clear, concise, and illuminating exposition of type inference which emphasizes its importance for theorem proving. Most things I’ve come across so far discuss type inference in terms similar to the one I’ve indicated above: namely, they mention it as a way of inferring the type of terms from their context, without requiring that the type be explicitly specified in advance. I think this is why I may be hung up on this narrow view (from which vantage it looks like type inference is all about making annotations unnecessary).

However, I believe your elaboration gets to the heart of the question @egoholic posed for this thread, “Does type inference matter?”. (Of course, the convenience of not needing to explicitly annotate does obviously matter to a lot of people, as has been demonstrated by some replies here, and that makes it matter to any community in which those people participate. Yet your point seems to touch on a “deeper” importance.)

iiuc, the aspect of type inference you're indicating here is related to this remark by Hindley:

Subject: Typed versus untyped (Albert's April 10 note).

From: Roger Hindley (majrh%pyramid.swansea.ac.uk@rl.earn, but temporarily c/o Mariangiola Dezani, address as header)

Henk's slogan "Church vs Curry" sounds a good approximation, though "Typed terms vs Type-assignment" gives the flavour of the distinction better.

One important advantage of type-assignment (TA) systems is that in TA language, one can ask (and answer) questions of a kind that cannot be expressed at all in typed-term language.

For example (1) "If we assumed that this part of a term X has a certain type, what type would the whole of X have?" Also the question mentioned in Albert's note: (2) "Is a given term, for example S(KK)K, an erasure of a typed one?"

It is this extra expressive power that makes TA-systems interesting; they are, roughly speaking, like Meta-Languages for languages of typed terms. (cf. Milner's choice of name "ML".)

I’ll keep digging to try to understand the significance of this expressive power (if you happen to have any suggestions on papers to read on the topic, I’d be very interested).

Thanks again for your input and for the detailed explanations and clarifications!

I think that you're confusing type inference, or the lack of it, with the requirement to write type annotations. Type inference is not just the thing that allows you to omit types. Of course you still need to write types with type inference: what is the purpose of a theorem-proving engine if you don't write theorems? So, of course, we need to write types, the more the better. Types say what we intend, and terms say what we did. Type inference checks that both match.

Concerning a concrete example, let's write a module that will let us prove that all our secrets are passed over the network only through a TLS/SSL channel:

type secure  (** a proof that the channel is secure  *)
type unsafe  (** a proof that a channel is open      *)
type normal  (** a proof that data is not classified *)
type secret  (** a proof that data is classified     *)
type 'a data  (** data labeled with the security tag  *)

module type Channel = sig

  (** abstract channel type labeled with the security level.

      Note, a fresh new channel type is generated for each new
      communication, that guarantees the linearity of the type. *)
  type 'a t

  (** [auth channel] raises the security level of the communication
      [channel]; returns [None] if the endpoint is not authenticated. *)
  val auth : unsafe t -> secure t option


  (** [send channel data] sends unclassified information via an
      insecure channel *)
  val send : unsafe t -> normal data -> unit

  (** [send_secret channel data] sends classified information via a
      secured channel.  *)
  val send_secret : secure t -> secret data -> unit
end


module type Example = sig
  module Channel : sig

    (** [start ~endpoint process] starts communication with the
        specified endpoint.

        The [process] function accepts the connection module that
        implements the [Channel] interface. *)
    val start : endpoint:string -> ((module Channel) -> unit) -> unit
  end
end

The above is just a theory; we don't have any terms yet. The implementation will most likely rely on phantom types, so the proof that the Channel module implements this interface would just be admitted. So far, there is no big difference between type checking and type inference. But the fun part begins when a user provides the process function. The type system will prove that, whatever the user does with the channel in the process function, all the theorems of Channel hold, i.e., secret information is passed only via secure channels and vice versa.

Interesting. It looks like there’s a subset of the ReasonML community (therefore, a subset of the OCaml community) that feels that it’s a generally good practice to annotate functions at their definitions:

And here, scroll down to the "type annotations" section:
http://reasonmlhub.com/exploring-reasonml/ch_functions.html#defining-functions

I understand why one might want to specify types explicitly–for clarity of source code and to check that you and the compiler have compatible understandings–but for this purpose I prefer the practice of putting type signatures in a separate .mli file, so the information is there but I don’t have to look at it when I don’t care about it.

However, people have different sensibilities. I can understand that someone might like seeing type info with the definition. Fortunately, OCaml is flexible enough to support that too.

I like having the mli's separate, so I can see at a glance what all the types are. That said, I would also not mind a Haskell-like system where top-level type annotations precede bindings. Annotating types in OCaml the obvious way seems much too busy, e.g.:

let f (a : int) (b : int) : int = a + b

versus something like

val f : int -> int -> int
let f a b = a + b

2 Likes

I've settled into a personal style for OCaml code that writes most top-level functions like this:

let f : int -> int -> int =
  fun a b -> a + b

or

let tail : 'a list -> 'a list option = function
  | []      -> None
  | x :: xs -> Some xs

This causes redundancy when I make functions available for export by adding their type declarations to the module signature, but, personally, I find the value of explicit type annotations available at the point of declaration is worth a bit of copy pasta when I’m ready to finalize a module. (Namely, I like having the types visible to reason about and to document the code).

I am very open to the possibility that you have in mind a different concept of type inference than the one I am familiar with, and I readily admit that I am no expert on this topic. However, your claim here seems to directly contradict nearly everything I've read on the topic of type inference. Here are four examples:

1.

Type inference refers to the automatic detection of the data type of an expression in a programming language. – (Wikipedia)

2.

automatic type inference: An algorithm that calculates types for a module without requiring manual type annotations [extra emphasis is mine] – (Real World Ocaml, 1st ed, p. 434)

3.

… in a programming language, one wants to write as few types as possible and let the compiler do type inference: i.e. do the check whether the term we have is typable, and if so, to compute a 'most general type' for us.

In modern functional languages, various 'weak' versions of polymorphism are used that allow parametricity while preserving the decidability of type inference. These type inference algorithms build on the work of R. Milner (Milner, 1978), who was the first to develop typing algorithms for polymorphic languages (see also Damas & Milner, 1982). (Type Theory and Formal Proof, p. 81)

4.

The word "infer" occurs exactly once in Robin Milner's 1978 paper "A Theory of Type Polymorphism in Programming", and is exclusively associated with the ability to infer types from context, rather than relying on explicit declarations:

many nontrivial programs can avoid mentioning types entirely, since they be inferred from context [sic]. … Although it can be argued convincingly that to demand type specifications for declared variables, including the formal parameters of procedures, leads to more intelligible problems [sic: I suspect this should read “programs”, tho “more intelligible problems” are also desirable, if it is the intelligibility and not the problems which are being increased ;)], it is also convenient—particularly in on-line programming—to be able to leave out these specifications. In any case, the type checker which we present is quite simple and could not be made much simpler even if the types of variables were always specified in declarations.

Your characterization

seems to fit with what I understand as type checking. In Type Theory and Formal Proof, they characterize type checking as determining whether it is true that

context ⊢ term : type

where 'context', 'term' and 'type' are given. The task is hence merely to check that a certain term has a certain type (relative to a certain context). (46)

If the aim is to get clear on basic terminology, your example here complicates matters rather than simplifying them (for me, at least). The module language introduces an additional level of complexity, and first-class modules even more so. If your example was meant to indicate that type inference becomes especially important when dealing with first-class modules, I'd be interested in seeing this demonstrated in a minimal example.

Again, if you can point me to any articles or resources that discuss type inference in ways that align with your remarks here, I’d be interested in looking at them.

Regardless, I think I understand fairly well where you are coming from now and I appreciate your input: I learned quite a bit reading over your replies. Thanks for spending the time :slight_smile:

3 Likes

Well, there are no contradictions between what I'm saying and what you're referring to. However, there are many ways to reason about the same entity. The point of view that I was presenting is rather unusual, as I was explaining it not from the point of view of a user of the system, but rather from the point of view of the implementor or designer of the type system. What I'm saying is that the type inference algorithm, unlike type checking, is a powerful device that is capable of solving complex logical equations (modulo theories). Type checking is a far simpler algorithm that just performs a syntactic derivation (verification). If we compare it with algebra, then type inference is able to provide a solution for a system of equations:

 x + y = 7
 x + 2y = 11

and the reply of the type inference engine would be x=3, y=4, while the type checker can only check an answer, e.g., verify that in the context of x=3, y=4, it is true that x + y = 7 and x + 2y = 11.
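
A loose OCaml parallel of the same contrast:

(* inference: the typechecker solves for the unknown types of x and y;
   the solution is val f : int -> string -> int * string *)
let f x y = (x + 1, y ^ "!")

(* with the answer written down there is nothing left to search for, only to
   verify (though in OCaml the annotation really becomes one more equation,
   as explained below) *)
let g : int -> string -> int * string = fun x y -> (x + 1, y ^ "!")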

So, for me, this is the main underlying difference. Thus, comparing those two things is like comparing a calculator with a computer algebra system. (Both have their uses, by the way.)

Now, concerning the tip of the iceberg, or the syntactic difference between the two: with type inference, you're not required to write types, i.e., to provide answers. However, you can write type annotations. Care must be taken here: when we write a type annotation in HM-style type inference, we're not actually setting the context environment (aka gamma) but rather adding extra equations to the system. So, for example, you can provide extra equations to the system:

     x + y = 7
     x + 2y = 11
     x = 3
     y = 4

and the inference engine will solve those four equations and provide the answer.

So, if you know the solution, or a partial solution, it is good practice to provide it as an extra constraint. This at least indicates to the reader what you're expecting to see as a solution. Providing a constraint usually helps limit the scope of an error, so it is (sometimes) easier to understand the error message. Sometimes it is not. Since type inference is a solving engine, if your system is overconstrained it will report the answer as a contradiction. It will not treat your constraints differently from the constraints specified by the term that it is typing. So often the error message will be something like "expected x = 4, but got x = 3", where x = 3 was actually your own type annotation. Not only is this frustrating (why are you telling me back what I just told you?), it also hides the actual source of the failure, i.e., the equation that doesn't fit the constraint. This is one of the reasons why some people consider too many type annotations more of a bane than a boon.
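
A tiny made-up illustration of both points (the second definition is deliberately ill-typed):

(* the annotation adds the equation "type of x = float" to the system *)
let scale (x : float) = x *. 2.0

(* overconstrain the system and the contradiction is reported in terms of our
   own annotation: the error at [x] complains that it has type float where
   int was expected, echoing the x = float equation back at us *)
let bad (x : float) = x * 2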

But the most interesting application of type inference as an equation solver is when you either don't know the solution to the equation (because it's too complex, or you're too lazy) or when the system is underconstrained and there is a set of solutions (represented as polymorphic types in OCaml). Also, do not forget that in real life the constraints can be much harder: they can involve recursive applications and derivations over complex data types. So guessing the answer can be very hard there.

Concerning my example, I was trying to showcase the power of the OCaml type system as applied to the real world, and to highlight once more the correspondence between mathematical proofs and computer programs. This correspondence still holds for type checking, of course, but with a type checker the responsibility to provide a proof lies with the programmer. With type inference, proofs are generated automatically by the theorem prover. Not everything can be proved automatically, of course. That's why OCaml has two separate sublanguages: the module language, which doesn't provide type inference but allows us to express things that can't be proved automatically, and the expression language, which only allows us to construct values whose properties can be proved automatically. (Note, this is a particular design decision of OCaml, not a fundamental separation; e.g., in Haskell and 1ML there is only one language for both decidable and undecidable problems.)

Let's go back to the example that I've provided. It includes both languages. I used a module of type Channel to provide a constraint (the set of equations) under which a user function process should be well-typed. As you may see, this constraint is (a) rather complex and (b) not inferred but superimposed as a constraint. The power of the type inference system is that when a user of the system typechecks their function against the type (module Channel) -> unit, the inference system will automatically prove that all uses of the module Channel are correct, and no secret data ever leaks onto an unsafe channel. Perhaps the problem is that I didn't finish the example, as I didn't show an actual implementation of the process function, which would actually show how type inference works.
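
To give the flavour, here is a rough, hypothetical sketch of such a process function. It assumes an extra val connect : unit -> unsafe t in Channel for obtaining the freshly opened channel, and two values report : normal data and password : secret data in scope; none of these appear in the signatures above.

(* hypothetical user code: [connect], [report] and [password] are assumed *)
let process (module C : Channel) =
  let chan = C.connect () in
  C.send chan report;                          (* ok: normal data, unsafe channel *)
  match C.auth chan with
  | None -> ()                                 (* endpoint not authenticated: stop *)
  | Some chan -> C.send_secret chan password   (* ok: the channel is now secure *)

(* Both of the following would be rejected at compile time, with no run-time
   checks involved:
     C.send_secret chan password    (* chan : unsafe C.t, but secure C.t expected *)
     C.send chan password           (* password : secret data, but normal data expected *)
*)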

3 Likes

I find inference most helpful while refactoring – if I change the type of a variable that gets passed through a number of functions, I don’t have to rewrite all the annotations for those arguments – the new type is just inferred, and I only have to change the places where it was used.
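
A toy sketch of what that looks like (the names are made up): only the producer and the final consumer mention the concrete type, so switching the id from an int to a string means touching just those two places.

let make_id () = 42                                  (* the producer *)
let wrap id = ("id", id)                             (* plumbing: id's type is inferred *)
let handle () = print_int (snd (wrap (make_id ())))  (* the consumer *)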

5 Likes

[1] Type inference --> you write programs faster (because you most often don't need to write type information).
[2] Strong static types: those (inferred) types are enforced at compile time --> your programs are more likely to be correct than in many other programming languages.
There is also no run-time overhead, because types are erased at some point during compilation.