~ Save the tuple calls! ~

Erlang Tuple Calls – OOP principles in a functional language

Encapsulation

Refers to the bundling of data with the methods that operate on that data.
https://en.wikipedia.org/wiki/Encapsulation_(computer_programming)

Polymorphism

The provision of a single interface to entities of different types.
https://en.wikipedia.org/wiki/Polymorphism_(computer_science)

Introduction

Nowadays, more and more non-functional programming languages adopt functional elements because their designers realize that these principles bring great benefits in today’s computing environment. For someone who mainly uses a functional language, the question is inevitable:
If they can adapt some of our tools, can we do the same to make our life in functional paradise easier?

In this article I would like to demonstrate how to apply encapsulation and polymorphism, two core principles of object-oriented programming, in the functional language Erlang. I’ll try to build our example in steps as small as possible and include any relevant information no matter how basic it is, so even people without Erlang knowledge can make sense of it. Even so, I can’t explain everything (this is not a “how to program in Erlang” article), and there will be places where you will need to apply some common sense (or previous Erlang experience).
But if you think something is missing that is absolutely necessary to understand certain parts, feel free to contact me.

The main element of these principles is data, so let’s start with an introduction to some of the data types available in Erlang.

Erlang Data Types Used

The Atom

An atom is a literal, a constant with name.
http://erlang.org/doc/reference_manual/data_types.html#atom

Examples: name, age, location

The Tuple

A tuple is a compound data type with a fixed number of terms:
{Term1,…,TermN}
Each term Term in the tuple is called an element. The number of elements is said to be the size of the tuple.
http://erlang.org/doc/reference_manual/data_types.html#tuple

You can access elements of a tuple by their index using the element(N, T) function where N is the index and T is the tuple.

1> T = {alice, 33}.
{alice,33}

2> element(1, T).
alice

3> element(2, T).
33

The Record

A record is a data structure for storing a fixed number of elements.
It has named fields and is similar to a struct in C.
However, a record is not a true data type. Instead, record expressions are translated to tuple expressions during compilation.
http://erlang.org/doc/reference_manual/data_types.html#record

So a record is basically a tuple whose elements you can access by name instead of by index. It takes away the pain of always having to keep track of which index represents which element. It also makes it possible to modify the structure of a tuple without affecting functions that already operate on it.

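We need a module for this example because -record is a module attribute and cannot be typed into the shell (the shell has its own rd command for defining records, but we want the definition to live in a module).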

person.erl:

-module(person).
-export([new/2]).
     
-record(person, {name, age}).
     
new(Name, Age) ->
    #person{name=Name, age=Age}.

Erlang shell:

1> R = person:new(alice, 33).
{person,alice,33}

2> element(2, R).
alice

As you can see, without the record definition the shell sees the record as a plain tuple. The first position holds the record’s name, followed by the field values in the order of the definition. As a record is really just a special tuple, you can use any function that works with tuples on a record too.

We can use the rr shell command (short for “read records”) to load record definitions from a file.

3> rr("person.erl").
[person]

Now that the shell loaded the record definition for the person record it recognizes R as such.

4> R.
#person{name = alice,age = 33}

You can use the record definition to access R’s fields by name.

5> R#person.name.
alice

6> R#person.age.
33

Omitting the variable name and using just the field accessor gives you back the index of the field in the tuple, so element(#person.age, R) is equivalent to R#person.age.

7> #person.age.
3

8> element(#person.age, R).
33
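The earlier claim that a record lets you change the structure of the tuple without touching existing functions can be sketched as follows. This is only an illustration: the module name record_demo and the extra email field are assumptions, not part of the article’s source code.

```erlang
-module(record_demo).
-export([new/2, name/1, raw_size/1]).

%% Hypothetical: the 'email' field was added after name/1 was written.
%% Fields without an explicit default start out as 'undefined'.
-record(person, {name, age, email}).

new(Name, Age) ->
    #person{name = Name, age = Age}.

%% Matches by field name, so it does not care how many fields exist
%% or where 'name' sits inside the underlying tuple.
name(#person{name = Name}) ->
    Name.

%% The underlying tuple grew by one element (record name + 3 fields).
raw_size(P) when is_record(P, person) ->
    tuple_size(P).
```

A function written with element(2, P) would keep working here only by luck; name/1 keeps working by construction, because the compiler recomputes the index from the record definition.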

Encapsulation

The rr command is a special command that is only available in the shell; you can’t use it in your modules. A record definition is private to the module that defines it (no other module sees it), which comes in very handy for our encapsulation principle. We define the record in one module and provide public functions for other modules to operate on the record fields. These kinds of functions are called accessors and mutators, or getters and setters.

Let’s provide some accessor and mutator functions for our previous person record!

-module(person).
-export([new/2,
         name/1, age/1,
         name/2, age/2
        ]).

-record(person, {name, age}).

%% construct
new(Name, Age) ->
    #person{name = Name, age = Age}.

%% name accessor (getter)
name(#person{name = Name}) ->
    Name.

%% age accessor (getter)
age(#person{age = Age}) ->
    Age.

%% name mutator (setter)
name(Name, P) when is_record(P, person) ->
    P#person{name = Name}.

%% age mutator (setter)
age(Age, P) when is_integer(Age), is_record(P, person) ->
    P#person{age = Age}.

Take a closer look at our last function, the age mutator. The is_integer(Age) expression is called a guard, and it makes sure that our function can only run when the type of the Age parameter is integer.

Let’s try out our new functions:

2> P = person:new(alice, 33).
#person{name = alice,age = 33}

3> person:name(P).
alice

4> person:age(P).
33

5> person:name(bob, P).
#person{name = bob,age = 33}

6> person:age(34, P).
#person{name = alice,age = 34}

7> person:age(twelve, P).
* exception error: no function clause matching person:age(twelve,#person{name = alice,age = 33}) (person.erl, line 21)

By using the is_integer guard we guarantee that the age field can only be updated with an integer value; trying to use any other type raises an error. Applying the encapsulation principle this way, we can enforce rules on how our data behaves, and the rest of the world is safe to assume that those rules always apply. If someone tries to break the rules (e.g. by setting age to something other than an integer), we instantly get an error at the place where the rules were broken. Had we allowed direct access to the age field, we would only learn that it’s not an integer the next time we try to use its value, and it can take a long time to pinpoint where the bogus value was set.

The Great Divide

Most often you run into the problem of having multiple types of records that need to work with a common set of functions. In object-oriented languages such a set of functions is called an interface.

In object-oriented languages, the term interface is often used to define an abstract type that contains no data or code but defines behaviours as method signatures.
A class having code and data for all the methods corresponding to that interface is said to implement that interface.
Furthermore, a class can implement multiple interfaces, and hence can be of different types at the same time.
https://en.wikipedia.org/wiki/Interface_(computing)#Software_interfaces_in_object-oriented_languages

For us that roughly translates to a set of functions with the same signatures operating on different types of records.
To demonstrate this, let’s break down our person record into a male and a female one with similar fields. Furthermore, as it is impolite to ask a woman’s age but you still have to buy birthday gifts, we will replace the age field of the female record with a birthday one. We will only store the year of birth to keep things simple, and it also fits better with our interface.

-record(male,   {name, age}).
-record(female, {name, birth_year}).

What should our interface look like?
Exactly, the new function and our accessors and mutators.
Let’s start with the new function. We could create separate new_male/2 and new_female/2 functions, but that doesn’t fit well with the very definition of an interface, because each function would only work with one record. Instead, let’s extend new/2 with a gender parameter that tells us which record we need to create.

new(Name, male, Age) ->
    #male{name = Name, age = Age};
new(Name, female, Age) ->
    #female{name = Name, birth_year = age_to_birth_year(Age)}.

As you can see, the second parameter decides which function clause matches, and thus what kind of record is created. We compute the birth year from the Age parameter to keep the same signature. To keep it simple, age_to_birth_year/1 just subtracts Age from the current year (the function is not included here; you can find it in the actual source code).
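For readers who want to follow along without the original source, here is a minimal sketch of what the two conversion helpers might look like. It is not the author’s code: the module name age_conv, the current_year/0 helper and the choice of calendar:local_time/0 as the clock are all assumptions made for illustration. In the article these would be private functions inside the module that owns the record.

```erlang
-module(age_conv).
-export([age_to_birth_year/1, birth_year_to_age/1]).

%% Hypothetical helper: the current year according to the local clock.
current_year() ->
    {{Year, _Month, _Day}, _Time} = calendar:local_time(),
    Year.

%% Age in whole years -> approximate birth year.
age_to_birth_year(Age) when is_integer(Age), Age >= 0 ->
    current_year() - Age.

%% Birth year -> approximate age in whole years.
birth_year_to_age(BirthYear) when is_integer(BirthYear) ->
    current_year() - BirthYear.
```

Because only the year is stored, the round trip from age to birth year and back gives the same age as long as both calls happen in the same calendar year.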
After this the rest is straightforward. Let’s modify the remaining functions and put it all together:

-module(person).
-export([new/3,
         name/1, age/1,
         name/2, age/2
        ]).

-record(male, {name, age}).
-record(female, {name, birth_year}).

%% construct of male
new(Name, male, Age) ->
    #male{name = Name, age = Age};
%% construct of female
new(Name, female, Age) ->
    #female{name = Name, birth_year = age_to_birth_year(Age)}.

%% male name accessor (getter)
name(#male{name = Name}) ->
    Name;
%% female name accessor (getter)
name(#female{name = Name}) ->
    Name.

%% male age accessor (getter)
age(#male{age = Age}) ->
    Age;
%% female age accessor (getter)
age(#female{birth_year = BirthYear}) ->
    birth_year_to_age(BirthYear).

%% male name mutator (setter)
name(Name, M) when is_record(M, male) ->
    M#male{name = Name};
%% female name mutator (setter)
name(Name, F) when is_record(F, female) ->
    F#female{name = Name}.

%% male age mutator (setter)
age(Age, M) when is_integer(Age), is_record(M, male) ->
    M#male{age = Age};
%% female age mutator (setter)
age(Age, F) when is_integer(Age), is_record(F, female) ->
    F#female{birth_year = age_to_birth_year(Age)}.

As you can see, every function has two clauses now: one that works with the male record and one that works with the female record. Similar to the new/3 function, the female clause of age/2 (the age mutator) needs to compute the birth year from the age (age_to_birth_year/1), while the female clause of age/1 (the age accessor) needs to compute the age from the birth year (birth_year_to_age/1).

Let’s test our new module:

2> A = person:new(alice, female, 20).
#female{name = alice,birth_year = 1998}

3> person:name(A).
alice

4> person:age(A).
20

5> person:name(barbara, A).
#female{name = barbara,birth_year = 1998}

6> person:age(21, A).
#female{name = alice,birth_year = 1997}
 
7> B = person:new(bob, male, 20).
#male{name = bob,age = 20}

8> person:name(B).
bob

9> person:age(B).
20

10> person:name(charlie, B).
#male{name = charlie,age = 20}

11> person:age(21, B).
#male{name = bob,age = 21}

To summarize: we created two records (male, female) and implemented an interface consisting of new/3, which creates a record; the name/1 and age/1 accessors, which return the name and age respectively; and the name/2 and age/2 mutators, which change the name and age.

What’s the problem with this design?

Everything is in one module.
What if you want to implement the same interface for another record?
For example you want to further divide our records into adult and child. Then you’ll have 4 records instead of 2 (male_child, male_adult, female_child, female_adult).

You will need to have the definitions of the new records in this module too (person in this example) because, remember, record definitions are private, and you need those definitions to extend your current interface functions with new clauses that work on the new records.
In addition, there are usually other functions that are specific to one record only. For example, you might want to introduce a manliness function that works on the male record but not on the female one. You have to put that function in the same module as your record definition too.
Code organization is a keystone of easy code maintenance, but the private nature of record definitions greatly restricts your options.
You MUST somehow make the record definitions available to both the interface and non-interface functions while organizing the code in a way that’s easy to read and maintain in the long term.
    
So what’s the solution?
Obviously not having multiple definitions of the same record in different modules, because that’s just asking for trouble. So putting interface functions in one module and non-interface ones in another while having the same record defined in both modules is not an option.
Another solution is to put every interface and non-interface function in the same module. That means you’ll have multiple records and all of their functions sharing the same module. What you end up with is called a god object or, in our case, a god module. Not a good idea either.

In object-oriented programming, a God object is an object that knows too much or does too much. The God object is an example of an anti-pattern.
https://en.wikipedia.org/wiki/God_object

   
What’s left is to put your record definition in a header file and include it in multiple modules, so you can separate your interface and non-interface functions. Might be viable.
    
But what if you want to implement multiple interfaces for the same record?
Then you will end up with functions operating on the same record spread across multiple modules, mixed with functions operating on other records. The more records, interfaces and functions you have, the bigger the mess. And it’s usually you who will have to maintain it. Or worse, the guy next to you who moonlights in his spare time as an ax murderer 🙂
    
Also, if you still remember what our encapsulation principle was, you can see that this is definitely not it.
    
So how can we apply the encapsulation principle and get rid of the mess?
The same way object-oriented programmers have done since the beginning of time: put the record definition and all functions operating on it in a separate module, and nothing else.

male.erl:

-module(male).
-export([new/2,
         name/1, age/1,
         name/2, age/2
        ]).

-record(male, {name, age}).

%% male constructor
new(Name, Age) ->
    #male{name = Name, age = Age}.

%% name accessor
name(#male{name = Name}) ->
    Name.

%% age accessor
age(#male{age = Age}) ->
    Age.

%% name mutator
name(Name, M) when is_record(M, male) ->
    M#male{name = Name}.

%% age mutator
age(Age, M) when is_integer(Age), is_record(M, male) ->
    M#male{age = Age}.

female.erl:

-module(female).
-export([new/2,
         name/1, age/1,
         name/2, age/2
        ]).

-record(female, {name, birth_year}).

%% female constructor
new(Name, Age) ->
    #female{name = Name, birth_year = age_to_birth_year(Age)}.

%% name accessor
name(#female{name = Name}) ->
    Name.

%% age accessor
age(#female{birth_year = BirthYear}) ->
    birth_year_to_age(BirthYear).

%% name mutator
name(Name, F) when is_record(F, female) ->
    F#female{name = Name}.

%% age mutator
age(Age, F) when is_integer(Age), is_record(F, female) ->
    F#female{birth_year = age_to_birth_year(Age)}.

And with this the person module is emptied out.
One thing I need to mention is that the person:new/3 function was separated into male:new/2 and female:new/2 so instead of person:new(alice, female, 30) you just simply use female:new(alice, 30).

Why did we need this?
Because we moved the record definitions, and as you know, record definitions are private. With that, the new function is no longer part of our interface, which is a good thing. In real life your records won’t be so similar that you can use a uniform constructor function, just like you don’t share constructors in object-oriented languages either.

So now we have fulfilled our encapsulation principle. Everything is neat and pretty. How the heck are we supposed to use this?
    
The problem we are facing is that if we have a record P, we need to know whether it’s a male or a female record, because otherwise we can’t call the appropriate functions. Keeping track of all the different record types is a hassle, not to mention what happens when you have a list (let’s call it People) that can contain both male and female records. You can’t keep track of those. Also, by encapsulating the records in their own modules we no longer have anything central (like a person module) that can tell us whether a record is of the male or female type.
    
Can the record itself help us out somehow?
YES! With…

Tuple Calls

You still remember tuples? Good.
There is a neat, not-very-documented (i.e. undocumented) feature regarding tuples.

If you have a tuple where the first element is an atom, you can do the following “magic”:

T = {dict, ...}
T:size()

Look closely at the second line: T:size(), which is {dict, ...}:size() if you substitute the variable.

This is called a tuple call and you are basically using your data structure to make a function call. The first element becomes the module name (this is why it needs to be an atom) and the tuple itself becomes the LAST argument of your function call.
    
So in this example T:size() will be dict:size(T).
If you have a function with more parameters like T:is_key(K) then it translates to dict:is_key(K, T).
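To make the translation rule concrete, here is a rough sketch of the same dispatch written by hand with apply/3. It is an illustration of the rule described above, not how the compiler or runtime actually implements tuple calls; the module name tuple_dispatch and the function names call/2 and call/3 are made up for this example.

```erlang
-module(tuple_dispatch).
-export([call/2, call/3]).

%% call(T, size) mirrors T:size(), i.e. Mod:size(T).
call(T, Fun) ->
    call(T, Fun, []).

%% call(T, is_key, [K]) mirrors T:is_key(K), i.e. Mod:is_key(K, T).
%% The first element of the tuple is used as the module name and the
%% tuple itself is appended as the LAST argument.
call(T, Fun, Args) when is_tuple(T), is_atom(element(1, T)), is_list(Args) ->
    Mod = element(1, T),
    apply(Mod, Fun, Args ++ [T]).
```

For instance, tuple_dispatch:call({erlang, a, b}, tuple_size) ends up calling erlang:tuple_size({erlang, a, b}), and tuple_dispatch:call({erlang, a, b}, element, [2]) ends up calling erlang:element(2, {erlang, a, b}).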
    
How can we use this?
Remember what a record is?
Exactly, a tuple where the first element is the record name (which is an atom).
    
Our module names are the same as our record names, and in every accessor and mutator function the record itself is the last argument (no, it’s not a coincidence), so we have everything we need to do the following:

2> A = female:new(alice, 30).
#female{name = alice,birth_year = 1988}

3> A:name().
alice

4> A:age().
30

5> A:name(barbara).
#female{name = barbara,birth_year = 1988}

6> A:age(21).
#female{name = alice,birth_year = 1997}

Of course this works with the male record just the same, but instead of repeating those examples let’s do something more exciting.
Let’s write a function that calculates the average age of a list of people (male and female records). We will use the person module for this; it is empty anyway.

-module(person).
-export([avg_age/1]).

avg_age(People) when is_list(People) ->
    lists:foldl(fun(P, Acc) -> Acc + P:age() end, 0, People) / length(People).

Yes, it’s this simple.
I will include the official documentation of the lists:foldl/3 function here for those who are not familiar with it.

foldl(Fun, Acc0, List) -> Acc1

    Fun = fun((Elem :: T, AccIn) -> AccOut)
    Acc0 = Acc1 = AccIn = AccOut = term()
    List = [T]
    T = term()


Calls Fun(Elem, AccIn) on successive elements A of List, starting with AccIn == Acc0.
Fun/2 must return a new accumulator, which is passed to the next call.
The function returns the final value of the accumulator. Acc0 is returned if the list is empty.
http://erlang.org/doc/man/lists.html#foldl-3

More plainly: it iterates through the list (from left to right) and calls a function with an element and an accumulator as parameters. What the function returns becomes the accumulator for the next element. When it reaches the end of the list, it returns the last accumulator.
What we end up with is effectively a fancy sum function; we then divide its result by the length of the list to get the average.
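Stripped of the records, the fold-then-divide idea can be sketched on a plain list of numbers. The module fold_demo and its functions are hypothetical names for this illustration, and the empty-list clause is an extra assumption: the article’s avg_age/1 does not guard against an empty list, where the division by length would fail.

```erlang
-module(fold_demo).
-export([sum/1, average/1]).

%% Walks the list left to right; Acc starts at 0 and each element
%% is added to it, so the final accumulator is the sum.
sum(Numbers) when is_list(Numbers) ->
    lists:foldl(fun(N, Acc) -> Acc + N end, 0, Numbers).

%% Assumption for this sketch: an empty list has no average.
average([]) ->
    undefined;
average(Numbers) when is_list(Numbers) ->
    sum(Numbers) / length(Numbers).
```

With the ages 20, 30, 40 and 50 the accumulator goes 0, 20, 50, 90, 140, and dividing 140 by 4 gives 35.0 (the / operator always returns a float).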

Let’s test it:

3> A = female:new(alice, 20).
#female{name = alice,birth_year = 1998}

4> B = male:new(bob, 30).
#male{name = bob,age = 30}

5> C = female:new(carol, 40).
#female{name = carol,birth_year = 1978}

6> D = male:new(dan, 50).
#male{name = dan,age = 50}

7> People = [A, B, C, D].
[#female{name = alice,birth_year = 1998},
 #male{name = bob,age = 30},
 #female{name = carol,birth_year = 1978},
 #male{name = dan,age = 50}]

8> person:avg_age(People).
35.0

Erlang record and tuple call interface in action - TupleCaller

Works like a charm!
And with that, ladies and gentlemen, we have fulfilled our polymorphism principle!

Afterword aka The Rant

I left the bad news for last. With the latest release of Erlang (OTP 21) tuple calls are opt-in. That means you have to enable them with a compiler option (tuple_calls) to make them usable.
I also found discussions about removing the feature altogether. There is a lot of hate for this particular feature, and I haven’t found any good reason why.
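As a rough sketch of what opting in looks like, the module below enables the tuple_calls compiler option with a -compile attribute and then makes a tuple call on its own data. The module name tc_person and its functions are invented for this example; the option has to be set in the module that contains the tuple call (here greet/1), not in the module being called.

```erlang
-module(tc_person).
-compile(tuple_calls).
-export([new/1, name/1, greet/1]).

%% A hand-made "record": the first element is this module's name.
new(Name) ->
    {tc_person, Name}.

name({tc_person, Name}) ->
    Name.

%% P:name() is a tuple call. With tuple_calls enabled it resolves
%% to tc_person:name(P); without the option it fails at run time.
greet(P) ->
    {hello, P:name()}.
```

The same option can also be passed on the command line or in a build tool’s compiler options instead of using the attribute.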

What I found is the following “reasons”:

1. It’s weird. It’s confusing.
This is programming. The world of logic. There is no place for emotions. It’s just a tool. You don’t get emotional about a hammer either.

1.5 It’s confusing for newbies.
This is very similar to how member functions are called in an object-oriented language. As object-oriented languages are among the most popular ones, we can safely assume that most people will have some kind of OOP background, so this should be very familiar to them. On the other hand, immutable variables, pattern matching and tail recursion will be much more confusing because newcomers won’t have any point of reference for them.

2. It’s not functional.
And? Who cares? If non-functional languages can use functional elements to enhance their language why can’t we?

3. Too much maintenance overhead for the Erlang team.
That’s actually the only valid argument I’ve managed to find. But not without numbers. Can you provide them? How much time did the team spend on maintaining this feature in each release cycle? If you have the answer, then we can argue whether it’s worth the cost or not. If not, then it’s just a statement without any proof.

4. Nobody uses it.
Actually they do. That’s how I learned about them: by checking out the code of a very popular HTTP library and wondering how come I’d never heard of this.

4.5. Only people who can’t bear to part with their OOP background use it.
There isn’t much difference between T:is_key(K) and dict:is_key(K, T).

The difference comes when you have to do something a little more complex. Then it does matter if you can have a cleaner design and, as a consequence, easier code maintenance later.

5. I have to keep track of which variables I can use for tuple calls.
Erlang has dynamic typing. You have to keep track of the types of all your variables. That’s why you have guards: to make sure your functions are called with the proper types. You also have built-in tools to specify your function signatures.
Even without tuple calls, if you are not using these tools you are doing it wrong. What’s the difference between calling a function with parameters of the wrong type and trying a tuple call on something that’s not a tuple?
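The “built-in tools” mentioned above can be read as guards plus type specifications. The sketch below combines the two on a cut-down person module; the module name typed_person, the person() type and the choice of non_neg_integer() for ages are assumptions for illustration. The -spec lines are not enforced at run time (a static analysis tool such as Dialyzer reads them), while the guards are.

```erlang
-module(typed_person).
-export([new/2, age/1, age/2]).
-export_type([person/0]).

-record(person, {name :: atom(), age :: non_neg_integer()}).

%% Opaque: other modules may pass the value around but should not
%% look inside it, which matches the encapsulation idea.
-opaque person() :: #person{}.

-spec new(atom(), non_neg_integer()) -> person().
new(Name, Age) when is_atom(Name), is_integer(Age), Age >= 0 ->
    #person{name = Name, age = Age}.

-spec age(person()) -> non_neg_integer().
age(#person{age = Age}) ->
    Age.

-spec age(non_neg_integer(), person()) -> person().
age(Age, P) when is_integer(Age), Age >= 0, is_record(P, person) ->
    P#person{age = Age}.
```

Whether a value is then used through a regular call or a tuple call, the same guards and specs describe what is allowed.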

I don’t really care if they don’t like it or don’t want to use it, but when campaigning to remove something that’s actually useful, maybe have a better reason than personal opinion without any proof.

I am TupleCaller.
And this is my design.