INTRODUCTION

I want to motivate you to start caring about functional programming! In this article series I will do that by identifying certain areas of traditional object-oriented languages where the code often does not fully satisfy us. Using functional code for the same purposes allows us to better express our intent.

I want to show you how switching to a functional paradigm – thinking in functional style – will make whole classes of problems disappear, especially when it comes to multi-threaded programming.

First off: Immutability

An immutable object is an object whose state cannot be modified after it is created. In multi-threaded applications that is a great thing. It allows a thread to act on data represented by immutable objects without worrying about what other threads are up to. In short: immutable objects are more thread-safe than mutable objects.

Unlike traditional object-oriented languages, functional programming, by its very nature, encourages us to write thread-safe code. In this article I am going to demonstrate this by using something as trivial as variable assignment.

Disclaimer

Although I will be using F# in my examples, the point is not to address specific aspects of F# or .NET. Quite the opposite: the choice of functional language is not important here. It could just as well have been Scala, OCaml or Haskell. Nor is the point to give a comprehensive introduction to language features; there are plenty of books that do that better.

WHAT CAN BE WRONG WITH A SIMPLE LOOP?

Let us start with the basics. Here is a code block familiar to everyone who has experience with any traditional object-oriented language, whether it is C++, C# or Java:

Java
int s = 0;
for (int index = 0; index < data.length; index++) {
  s += data[index];
}

«Well, there is nothing wrong with it. It compiles, it works, it does the job.» But… Let us have a closer look at some observations:

The code mirrors CPU instruction execution

This code looks pretty much like what will be executed by a CPU. «But this is a good thing! It means it performs well.» Sure, but think of how many times we have written such code in our professional life. Think of how similar this code is to what our fathers wrote three decades ago. Isn't writing such loops again and again a waste of human effort? Couldn't this become a part of the language, so that we can express our intentions in one line? Like this:

F#
data |> List.sum

Low level of detail – explicit control of everything

This is related to our previous point, but I would like to formulate it differently: we no longer program single-user, single-threaded tasks. We architect large systems with many layers and challenging functional and non-functional requirements. If we want to relieve our brains of complex architectural puzzles, we have to raise the level of abstraction. If we want to compute a sum of elements, it should be sufficient to say what we want to achieve, not how.
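As a minimal sketch of saying what rather than how, here are a few one-line aggregations in F#. The sample values and the names `data`, `total`, `evenTotal` and `largest` are illustrative assumptions, not taken from any real code base.

```fsharp
// Hypothetical sample data; any list of integers would do.
let data = [ 3; 1; 4; 1; 5 ]

// Each line states the intent; the traversal itself is left to the library.
let total = data |> List.sum            // sum of all elements

let evenTotal =                         // sum of the even elements only
    data
    |> List.filter (fun x -> x % 2 = 0)
    |> List.sum

let largest = data |> List.max          // largest element
```

None of these lines mentions an index, a loop counter or an accumulator, so there is nothing low-level left to get wrong.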

Single responsibility …x 1000

We all respect the SOLID principles, and the first principle is about single responsibility. Many developers will put code blocks like this one in a separate method to separate concerns. Yes, this is a good thing, but not one without consequences:

The price we, masters of object-oriented programming, pay for writing clean code is that we end up with a bunch of classes, modules, assemblies and packages. Every single unit in these collections is a pure beauty. But a million beauties can form quite the scary beast…

Wait! Things can actually go wrong here

Look one more time at the body of the loop:

Java
s += data[index];

What if «data» is changed while it is being traversed?

Hmm, then this code is no longer trivial, is it? Enter concurrency issues. Just a few minutes ago we stated that being in full control of every detail brings us a performance advantage, but now we encounter a serious potential drawback of that low-level independence of details. It is called mutability.

«You’re all individuals! You’re all different!» – proclaimed Monty Python’s Brian to the crowd. A programmer would say «You’re all mutable!»

And while mutability guarantees the independence of every tiny data field, it conflicts with consistency whenever multiple changes can arrive at the same time.
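The problem can be sketched in a few lines of F#. To keep the example deterministic, the interfering write is performed inline in the middle of the loop; in a real program it would come from another thread at an unpredictable moment. The function names and sample values are hypothetical.

```fsharp
// Sums a mutable array while a write lands in the middle of the traversal.
let sumWithInterference () =
    let data = [| 1; 2; 3 |]
    let mutable s = 0
    for index in 0 .. data.Length - 1 do
        // Stands in for a write from another thread.
        if index = 1 then data.[2] <- 100
        s <- s + data.[index]
    s

// The same sum over an immutable F# list: no operation exists
// that could alter the list in place while it is being traversed.
let sumImmutable () =
    let data = [ 1; 2; 3 ]
    List.sum data
```

With these values the first function returns 103 instead of the expected 6, while the second one always returns 6.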

Not so simple anymore

So we need to resolve the concurrency issue, and suddenly our simple loop is not so simple any more. Should we lock it here? Or should we identify a higher-level atomic operation and lock it there? How can we relax locking for read-only clients when no one is changing data? Although concurrency problems are as old as the Stone Age, they still consume a large part of our development effort, and that part is neither the most creative nor painless.
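One of the options above, locking around every access, might look roughly like this. It is only a sketch: `GuardedData` and its members are hypothetical names, and the built-in F# `lock` function is used for the critical sections.

```fsharp
// A hypothetical wrapper that guards a mutable collection with a lock.
type GuardedData() =
    let gate = obj ()                          // private lock object
    let data = ResizeArray<int>([ 1; 2; 3 ])   // mutable .NET list

    // Every writer has to remember to take the lock...
    member this.Add(x: int) = lock gate (fun () -> data.Add x)

    // ...and so does every reader, even when nobody is writing.
    member this.Sum() = lock gate (fun () -> Seq.sum data)
```

Even this small class already raises the questions from the paragraph above: readers block each other although they change nothing, and a caller who needs `Add` and `Sum` to happen as one atomic step is not helped by the two separate locks.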

WHY DOES NO ONE ASSUME MULTIPLE THREADS?

When we interview candidates at Miles, among other questions we ask them to solve an algorithmic puzzle. It is not anything extremely difficult. Candidates are not expected to write runnable code within 15 minutes. The task is meant more to expose the candidate’s ability to reason under stress and of course to demonstrate a fair knowledge of data structures and their efficient management.

But what we never specify when we are presenting the task is whether the algorithm should be thread-safe or not. And nobody asks if it should. Everyone just assumes that the task should be solved in a single-threaded environment.

But why? When given an algorithmic task, without the explicit instruction to solve it in a thread-safe manner, why do we not make an extra effort to ensure safe concurrent access?

Compare that to other scenarios:

  • When we are asked to write a class or a method, we make sure we release unmanaged resources, such as file handles and database connections
  • When we are asked to implement a database transaction, we make sure it is either committed or rolled back in the end
  • When we are asked to design a data structure, we make sure we choose the most efficient data types to represent data fields instead of storing everything as strings

As you can see, we are quite careful about crucial aspects of software design. We even know that we are at risk of being disqualified if we do not show that we care. However, when we are asked to quickly write a skeleton of a method or a class, we typically write it without any concurrency concerns. And we have good reasons to do so:

It is not expected of us!

OO THREADING IS HARD!

Let us admit it: threading is difficult in object-oriented languages. Everything is mutable by default. There is nothing in C# and Java that makes it difficult to declare mutable data structures and thread-unsafe algorithms. In fact the opposite is true: no matter how much you invest in making your class thread-safe, another developer who is unaware of your intentions can ruin your work just by extending your class with a new mutable member. And he will not need to mark it as mutable. He just needs to be careless.

On the other hand, being careless in the functional language world has different consequences. If you do not care about mutability, you will only get immutable data. You can then send the data between different threads, and it will not get corrupted. The reason is that whenever you attempt to change it, a new instance of the structure being changed is allocated, and the original stays as it was.
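A minimal sketch of what "changing" immutable data looks like in F#. The `Account` record and all values are hypothetical and exist only for illustration.

```fsharp
// A hypothetical record type; F# records are immutable by default.
type Account = { Owner: string; Balance: int }

let original = { Owner = "Alice"; Balance = 100 }

// "Changing" the balance builds a new record; `original` is untouched.
let updated = { original with Balance = 150 }

// The same holds for lists: prepending yields a new list value,
// and the existing list remains exactly as it was.
let numbers = [ 2; 3 ]
let extended = 1 :: numbers
```

A thread that holds `original` or `numbers` keeps seeing the same values no matter what other threads derive from them.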

Look at the assignment lines in C# or Java:

Java
a = b;
a = c;

Lightweight and simple, is it not? Just assign a new value, and the old memory buffer will be overwritten (or reallocated). Now look at the code in F#:

F#
let a = b
let a = c

You might think I have been wasting your time on moralizing about immutability in functional languages: here we see the same thing, we are just changing the variable value!

LOOKS CAN BE DECEIVING

Well, actually what happens in the F# code has nothing to do with its C#/Java counterpart. The «let» keyword is not a simple assignment: it is a declaration combined with an initial assignment. So a statement «let a = b» means «declare a variable a and then assign b to it». But are we not reassigning the value of «a» in the next statement? No, we simply declare a new variable that happens to have the same name and assign a value to it. So what happens in these two lines can be described in natural language as follows:

  1. Declare a variable «a» and assign «b» to it.
  2. Declare another variable «a» and assign «c» to it.

But what will happen with the first «a»? Well, we will never be able to access it. It will be shadowed. Technically it exists, but it is “lost in space”. There is another «a» that rules now (and that one can even have a different type).
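Shadowing can be made visible with a small sketch. The function name and the values are hypothetical. The bindings are placed inside a function body, because a compiled F# module reports a duplicate definition when the same name is bound twice at the top level, whereas nested scopes (and F# Interactive) accept it.

```fsharp
let shadowingDemo () =
    let a = 42                 // first `a`, an int
    let readFirst () = a       // a closure that captured the first `a`
    let a = "forty-two"        // second `a`, a string, shadows the first
    // The name `a` now refers to the string, but the first value
    // still exists and is reachable through the closure.
    (readFirst (), a)
```

Calling `shadowingDemo ()` yields the pair `(42, "forty-two")`: nothing was overwritten, and the two variables do not even share a type.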

GO WITH THE FLOW

So can you declare and use mutable data in functional languages? Yes, you can. But it is more difficult. You have to write more code, and you have to dislike what you are writing. You have to develop a bad conscience. Look at this:

F#
let mutable a = b
a <- c

Feel the difference: you need to remember to specify the «mutable» keyword, and you need to use a different (longer!) operator to assign a new value. This is how it should be to protect immutability: by not going through these extra steps you end up with immutable data that can fly between various threads in perfect shape!
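A slightly larger sketch of the same contrast, using hypothetical function names. The commented-out line shows what the compiler refuses when the «mutable» keyword is left out.

```fsharp
let countUp () =
    let mutable counter = 0      // mutability has to be requested explicitly
    counter <- counter + 1       // and every change uses the <- operator
    counter <- counter + 1
    counter

let immutableVersion () =
    let counter = 0
    // counter <- counter + 1    // does not compile: `counter` is not mutable
    let next = counter + 1       // the default path: bind a new value instead
    next
```

Here `countUp ()` returns 2 and `immutableVersion ()` returns 1. The careless version is the safe one; the mutable version only exists because someone asked for it.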

FUNCTIONAL PROGRAMMING IS ABOUT IMMUTABILITY

Programming languages are very much about encouragement. They are opinionated. They make it hard to do things that are not natural to the language. And they make it simple to perform operations that reflect the nature of the language. And you can see from the examples above that you are guided in the direction the language considers best for you. At least if you want to make your data and algorithms thread-safe.

UP NEXT…

In the next installment of this article we will look at how functional languages support transformations, which are a foundation of stateless data management: Why should we care about functional programming? Part 2: Transformations

Published 02.05.2013 by

Vagif Abilov