Yes, this is a common cognition problem. Functional programming is rooted in mathematics and lambda calculus, which was originally developed as a foundation for mathematics itself. So it is the most fundamental notion, which you have to develop from scratch and then use as you mental basis to reason about the real world. Not vice versa. You can develop this intuition by a constant exercise in mathematics, in other words, it comes with time But let’s start right now.

The good news is that lambda calculus is extremely simple and has only two rules and three syntactic forms. So it is very easy to learn. But before we will learn it let me draw the lines connecting mathematics with functional and imperative programming. Both kinds of programming are pure mathematics, despite that functional programmers usually claim that their style is exclusively mathematics and everything else is a sin. The same is with purity and impurity. Or OOP vs procedural. High-level vs. assembler also doesn’t matter. As long as a program computable - it is mathematics. The main difference is how complex are the rules of the programming languages.

So what is the program, anyway? Why do we write programs at all? The main reason is that programs allow us to express concisely some parts of the real world as well as the processes that flow in the real world. Not everything is expressible, but if we can express some real-world process or an object as a program, then we have a formula. As soon as we get the formula, we can repeat this process, and we’re now in control of this process. It is like learning to build the fire vs waiting for the lighting to strike and give us the fire. This turns us into more powerful creatures. Well, as long as we have computers. The computers are the things that can run those formulas and repetitively obtain the same outcome. It was observed a long time ago, that we can build an automaton, that can read a formula and follow a simple set of mechanizable rules and this automaton could be used instead of a human to materialize our needs. Different automatons are capable of executing rules of different complexity and therefore perform tasks of different complexity. For example, there are state machines, which have a finite set of states and no memory. It is a rather stupid automaton, but still useful. We have a push-down automaton, which has some memory, but at one point in time can remember only one thing. Still quite a stupid creature, but a big step forward. Finally, we have the Turning Machine. This guy can remember as many things as fits into his memory and is quite capable. In fact, all our computers are Turing machines, so you can somehow estimate their capability. There are other fantasy guys such as non-deterministic Turing machines or quantum machines, which we yet to learn how to build. So far, we humans can definitely build Turing machines (and probably starting from 2019 we can really build quantum machines, but let’s put it aside).

Ok, so we can build machines that can follow our instructions and be our mechanical servants. So where is the place for mathematics here? The answer is simple - mathematics is the well-established process of expressing real-world phenomena in terms of a simple set of easily mechanizable rules. Ther same as ancient mathematicians were striving to find a formula for the volume of a pyramid, we’re striving today to find a formula that will send a message across the universe or even let the self-driving car to navigate our cities without killing anyone in the process. The tasks are a little bit more complex nowadays, but the underlying mathematics is roughly the same.

So when Archimedes was coding with a stick on sand it was using simple rules like addition. The multiplication was used to perform several identical additions, e.g., `x+x+x+...+x=nx`

, and the power function served the same role for multiplication, e.g., `x*x*x*...*x=x^n`

. These simple rules together with simple tooling of that time, like multiplication tables or abaci, let them develop quite complex programs, that were employing tools of that time, and that served well for builders, engineers, and traders of the time. Later they developed more complex apparati and evolved mathematics correspondingly. They implemented integrals and derivatives, positive and negative numbers, zero and infinity. They have discovered a lot of mathematical identities and learned how to program better so that more complex ideas can be expressed in a more concise and easier to compute nature. But since the tooling was mostly hand-driven, like still the same abaci and lots of tables, the entrance barrier was pretty high.

Finally, we reach the modern days, the time of Turing and Church, the Rise of Machines. The time when we learned how to build electrical machines, which can perform a couple of simple operations and have some memory. The time when we learned how to build the Turing Machines. This is the time when mathematicians started to refer themselves as programmers and ignore their heritage

Now, we’re finally ready to learn the rules of the programming. Well, we have the rules of integer arithmetic, we use modular arithmetic to mimic real-world integers and we use floating-point approximations to mimic real-world reals. But like all other programmers we can ignore it. At least for now. These rules only allow us to use a computer as a calculator. So we need somehow simple rules for doing repetitive tasks and for storing information in the computer memory, as well as for retrieving. Historically, those rules were encoded using primitive machine instructions, like `jmp`

for fast-forwarding/backwarding the input tape and `load/store/mov`

procedures for storing data in the computer memory. It was very hard to use those primitive instructions directly, so ~~mathem~~programmers started to develop a higher-level mechanism, the same as ancient geeks were developing their own mathematical language with multiplications (repetitive tasks - loops), functions (procedures), and so on. They started to develop more and more complex mathematical abstractions, so that they can build their formulas in a more concise way, like objects, classes, inheritance, polymorphism, agents, you name it.

Their theories grew into a large and unmaintainable mess which was really hard to reason about. It was still pure mathematics and it is, as those theories are definable in terms of simple underlying math of the assembly instructions. The problem is that it was very hard to reason in terms of those abstractions as they have very vague definitions and imprecise semantics mostly implementation-defined by the compilers. These were the dark times of programming when programmers forgot the name of their profession. Wait. It is our times, this is how we program today. Ouch.

At the same time, and even before Turing has built his machine, ~~progra~~mathematicians developed lambda calculus. At that time they didn’t even have an abacus for this calculus, so this was just a mechanism to describe a group of things which they think was better suited than the set theory and in general as the most primitive mechanism capable of bearing modern mathematics, including even such basic notions as natural numbers. It was extremely simple, had only three rules for building formulas

Syntactic Rules:

`x`

is a formula (as well as any other variable);
`fun x -> <body>`

is a formula *binding* `x`

if `<body>`

is a formula;
`<f> <x>`

is a formula if `<f>`

and `<x>`

are formulae.

So we now are able to build programs, e.g., `hello`

is a program, `hello world`

is a program. They don’t have any meaning yet, i.e., if we will give those programs to a machine it won’t know how what to do with it. So we need to give it computation rules, which will tell the computer what to do with these formulas.

Semantic Rules:

`(fun x -> <b>) <e>`

is `<b>`

with `x`

substituted by `<e>`

if `x`

is not bound in `<b>`

;
`(fun x -> <a>)`

is `(fun y -> <b>)`

where `<b>`

is `<a>`

with all bound `x`

substituted with `y`

.

So it is all about introducing variables and making something abstract over a notion that this variable representing, and eliminating a variable that is applying an abstraction to a concrete instance. We should be careful and not confuse variables, so we have an additional rule (the second one) which basically tells, that the name doesn’t matter and we can rename variables using a simple rule.

Surprisingly, this language is powerful enough to express the same computations as a Turing Machine can express or as x86 or ARM machine can compute. Most importantly, it is capable of expressing repetitiveness via the fixed point combinator, i.e., if we pass to an expression `fun x -> <body>`

itself (i.e., apply it to itself `(fun x -> <body>) (fun x -> <body>)`

), then `<body>`

can express itself it terms of itself and express repetitive, recurrent, and otherwise self-referential structures of the real world.

This language is very simple and is very easy to reason about, but at that time it didn’t get enough traction because they didn’t have the abacus for this calculus. However, very soon it was reincarnated in the form of functional programming languages, like Lisp, Haskell, and ML (including OCaml and SML, and other members of the family).

Unlike other languages of the time, which were residing on a very complex mathematics of Algol, these had much cleaner semantics and were very easy to program. The only issue was performance, as computers were built using the Turning Machine idea in mind and there was an impedance mismatch between lambda calculus and primitive operations of the computers of that time. Unfortunately, the laws of the market were stronger than the desire for the beauty and functional programs were late for the train and left mostly in academia, only because they couldn’t compete with imperative programs of that time. The machine time at that time was much more expensive than the programmer time, so it was much more important how fast the program runs, rather than how fast the program programs.

But, with time, machines got faster and cheaper and programmers started to value their time more than the machine time. The compilators also advanced and were even able to compile functional programs into faster code, than their imperative counterparts. So functional programming started to slowly return to the arsenal of a modern mathematician (or should we say, programmer?). The reason for that is purely pragmatic, it is much easier to write functional programs than imperative programs, because you have to reason about only functions and their applications. No objects, no classes, no nothing (okay, okay, we have classes in OCaml, but don’t tell anyone, it was trendy in those times).

Let’s be honest, OCaml is a little bit more complex than lambda calculus. Mostly because it is a strictly typed language. If you want pure lambda calculus with no Mamba Jamba, then try Scheme. In any case, you shall learn Scheme and read SICP to master functional programming. It is an easy and entertaining reading and is a must-read.

But at some point of time, you will find out that types are an extremely powerful mathematical tool that makes you a very efficient programmer, since it lets the type checker to find bugs in your programs and verify them without running them. The rich type system of OCaml lets you minimize the debugging stage of the program development, which usually takes most of the time when you develop programs using less advanced languages, such as C/Rust/C++/Java or much less when you develop programs in dynamically typed languages such as Python/Ruby or Scheme/Lisp. But this is a completely different story.