To understand why "0.1 + 0.2" is not "0.3", we will create a new Floating-point format. One much simpler than the Double-precision; yet, enough to understand the problem that lies at the bottom of this statement. This new format we are creating is:
The Almost-No-Precision Floating-point format.
With the new format, we will perform the addition "0.1 + 0.2" and compare it to the representation of 0.3.
Reminder: Floating-Point Format
The Floating-point representation is based on the normalized scientific notation.
4.9 x 1011
The Floating-point representation converts the number to the scientific notation first and then stores the parts of that notation: the exponent, the sign (positive or negative), and the number.
All the numbers are converted to binary before being stored in memory. Therefore, the number stored looks more something like this:
1.01011 x 21011
Let's take a look at how the Double-precision Floating-point format.
(-1) * sign * 1.f * 2e - 1023
- "sign" is a bit that represents whether the number is negative or positive.
- "e" is the exponent with a bias—no need to go into the details for now.
- "f" is the fraction (or mantissa): the content of the number. The format of Double-precision defines fifty-two bits for this part.
The Double-precision floating-point stores fifty-two bits for the mantissa, a sign bit, and eleven bits for the exponent. But, we can simplify the format if we agree to store only numbers smaller than two and greater than zero.
That means that we can skip storing the exponent and sign bits. Regarding the number of digits we want to store, let's agree to five bits for the fractional part and one for the integer part.
That's all we need for our Almost-No-Precision Floating-point representation.
0.1 + 0.2 in Almost-no-Precision Floating-Point
Do not forget our goal: we want to add 0.1 and 0.2.
First, we need to convert 0.1 and 0.2 to the format Almost-no-Precision and this format, like the Double-precision, also stores binary numbers.
Converting to Almost-no-Precision Floating-Point
To convert a decimal number to binary, we convert the integer part and the fractional part separately.
Both integer parts are 0.
For the fractional part, there are plenty of good tutorials on the internet. Yet, here is the gist of a simple way to do it
- Multiply the fractional part by two.
- For example: “0.1 * 2 = 0.2”
- Keep track of the integer of the result: "0"
- If the result is less than one, go back to step 1.
- If the result is more than one, subtract one and go back to step 1.
- If the result is zero, we have finished. Go to step 5.
- The fractional part in binary is the integers of the result of step 1
- For example: If the integers were "0 1 0", the fractional part in binary would be: ".010"
Let's do a simple example converting "0.25" to binary.
Almost-no-Precision Floating-Point: 0.1
If we calculate it for 0.1, we have the following:
The result of converting 0.1 to binary is a number with recurring digits (0.00011001100...). However, in our representation, we can only store five digits.
Almost-no-Precision Floating-Point: 0.2
Trust me when I tell you that 0.2 in binary is 0.00110011… It also has a recurring pattern of digits. Or don't trust me and calculate it ;)
Addition of Almost-no-Precision Floating-Point
We now have our numbers stored in memory. So, let's add those two numbers, but remember that our computer works with the stored binary data, not real numbers.
That means that the computer performs a binary addition of 0.00011 and 0.00110.
After adding both numbers, the result is 0.01001. Therefore that is what gets stored in memory as the result of the addition of 0.1 and 0.2.
0.1 + 0.2 compared to 0.3 in Almost-no-Precision Floating-Point
Is this correct? Is this how 0.3 is represented in the Almost-No-Precision Floating-point format?
0.3 in binary is 0.0100110011… Also, with a recurring pattern. So, how do we store it in our memory?
In this case, the first digit that is not stored, the sixth, is 1. That means we need to round up the last digit, which is also one, and propagate the rounding to the other digits until we find a zero.
The representation of 0.3 in our memory is 0.01010, while the addition was 0.01001.
That means that with Almost-no-Precision Floating-point, we also have that “0.1 + 0.2 != 0.3”.
Where is the problem?
The problem is in trying to represent infinite information in limited amounts of space. If we had infinite memories, we wouldn't lose any precision.
Unfortunately, our computers are finite. Computer scientists dealt with the limitations of computers the best they could. The Floating-point Representation is effective but not perfect.
Back to real life
Now that we have seen simple examples, you might want to check the real thing. I recommend you this great step-by-step calculator on Floating-point addition with both single or double-precision.
In this website/calculator, you can see how the limitations we learned in our simple example also apply in the complex format of Double-precision Floating-point.
Thanks for reading, don't be a stranger 👋
GIMTEC is the newsletter I wish I had earlier in my software engineering career.
Every other Wednesday, I share an article on a topic that you won't learn at work.
Join more than 3,000 subscribers below.