Pointers, pointers everywhere
5 min read

Pointers, pointers everywhere

In most high-level languages the variables refer always to references, never to an actual value in memory.
Pointers, pointers everywhere

A pointer is an object that stores a memory address. Yet, instead of storing the value, it stores the location of that value.

When we develop with low-level programming languages like C, we can work directly with pointers. But, if we work with high-level languages, we work indirectly with pointers.

We don’t manipulate or even create pointers on demand, nor manage memory ourselves. Yet, that doesn’t mean we don’t use pointers. They are everywhere.

Pointers vs. references

When I said that developers of high-level languages use pointers, I was not entirely exact. These languages use references rather than pointers. The difference is subtle.

References are immutable signposts; we cannot manipulate or manage them. On the other hand, pointers have a value, and that value can also be used.

Let me show an example of a reference in python:

>>> word = ‘hello’
>>> id(word)
4308007024

id is a built-in function that returns the memory address for the ’hello’ string object. That means that the word variable is a reference to location “4308007024”. Yet, I cannot directly access the memory address “4308007024 + 1” with Python.

Note: Actually, we can, but we need to use a special library that gives extra functionality: ctypes.

Now, let’s take a look at a pointer in C:

#include <stdio.h>
#include <inttypes.h>

int main() {
  
  int  numbers[] = {10, 100, 200};
  // create a pointer variable with “*”
  int *ptr;

  // assign location of `numbers` to `ptr`
  ptr = numbers;

  // print the value inside the variable `ptr`
  printf("Address of numbers: %p\n", ptr);
  // Address of numbers: 0x7ffeeb1b379c
 
  // `ptr` holds a number and I can use it
  printf("Address of third item: %p\n", ptr + 2);
  // Address of third item: 0x7ffeeb1b37a4

  // `ptr` is pointing to the first item
  printf("First item: %d\n", *ptr);
  // First item: 10
  
  // we can access “location ptr” + 2
  printf("Third item: %d\n", *(ptr + 2)); // Third item: 200
  
  return 0;
}

Notice the difference between both addresses (in hexadecimal): “7ffeeb1b379chex” and “7ffeeb1b37a4hex”. If we subtract “7ffeeb1b37a4hex - 7ffeeb1b379chex“, then we get eight.

Locations are referenced using bytes (eight bits); therefore, a difference of eight means eight bytes of memory between one location and the next.

In C, the type int has a size of four bytes. Therefore, the difference between the first address and the second is two integers or eight bytes.

In C, we were able to get the address and read the memory directly starting from the address (ptr + 2). This flexibility is the big difference between pointers and references.

References, references everywhere

Now that we learned about pointers and references, we might want to change the title to “References, references everywhere.” Since most developers work with high-level languages and high-level languages use references.

Not sure whether Woody is happy with this change.

You probably have heard the saying: “everything is an object” in Javascript, Python, or almost any object-oriented programming language. What that is saying, is that everything is a reference to an object.

The variables do not store only the actual value, not even integer types. Instead, they store references to objects that store that value.

Let me give an example in Python.

# create an integer
>>> num = 42
# find location (or reference)
>>> id(num)
4396081168
# num is actually an object
>>> num.__dir__()
['__repr__', '__hash__', '__getattribute__', …]
# __dir__ is a method that lists all the methods of that instance

This idea of the references is very powerful, because now instead of having a simple integer, we have a whole object with plenty of functionality. Additionally, this functionality is not duplicated with each integer; instead, it’s also referenced.

# create another integer
>>> num2 = 14
>>> id(num2)
4531346576
# check the location of a method of num
>>> id(num.__getattribute__)
4398310736
# check the location of a method of num2
>>> id(num2.__getattribute__)
4398310736
# location of method `__getattribute__` is the same for both integers

Methods of different instances point to the same location. This way, when we create a new instance, we are not duplicating everything, just what differentiates the instances.

Variables hold references, not values

When high-level languages say that everything is an object, instead, it means that everything is a reference to a location in memory. So we never have just the value, rather a reference to an object, which in turn has references to shared functionality.

If everything is a reference, couldn’t we change the value inside it? For example, if we create an instance of the class `int` in python with the value 42, couldn’t we change it to 14?

Primitive vs. non-primitive

These languages differentiate between primitive and non-primitive types. That means that even though everything is a reference to an object, some of these objects are read-only. Therefore, they cannot be changed; instead, we get a new one when trying to mutate them.

>>> num = 42
>>> id(num)
4396081168
>>> num += 1
>>> id(num)
4396081200
# different location

As we can see above, the variable num is now pointing to a different location. Whereas when we manipulate a mutable object, this is not the case:

>>> nums = [1, 2]
>>> id(nums)
4398612096
>>> nums.append(3)
>>> nums
[1, 2, 3]
# the object mutated
>>> id(nums)
4398612096
# same location

A List is a mutable object in Python; therefore, it keeps the reference to the same object when changing it.

Mutation or no mutation, that is the question

There are at least two topics, where understanding whether we are mutating or creating a new object is critical.

One of them is performance issues, many times the problem is that the program is creating too many objects which slows down the performance.

The other problem is state, mutation, and immutability; many design patterns and programs rely on shared state (or immutability). Therefore, knowing when an object is created or mutated becomes crucial.

The key question here is always: am I mutating this object or creating a new one?

If you like this post, consider sharing it with your friends on twitter or forwarding this email to them 🙈

Don't hesitate to reach out to me if you have any questions or see an error. I highly appreciate it.

Thanks for reading, don't be a stranger 👋

Thanks for subscribing! A confirmation email has been sent.

Check the SPAM folder if you don't receive it shortly.

Sorry, there was an error 🤫.

Try again and contact me at llorenc[at]gimtec.io if it doesn't work. Thanks!