Project Estimation with the Monte Carlo Method
What is the best way to set a deadline? I’d like to propose a method based on confidence intervals and simulations.
If you want to go directly to the real sh*t, go to the notebook I developed here.
Project management and estimation.
One of the hardest things to do in software engineering is to estimate the time that a project takes to complete. I would argue that the best estimation is no estimation at all. Yet, sometimes there are deadlines to meet, and the team is forced to commit to them.
What is the best way to set a deadline or understand what can be done by a given deadline?
I’d like to propose a method based on confidence intervals and simulations instead of the common way of estimating tasks and adding them up. When developers estimate a task, they are implicitly describing a probability, not a certainty, and adding up single numbers hides that uncertainty.
It’s like watching the sky and guessing whether it’s going to rain or not. Sometimes it’s easy to estimate; others not so much. Yet, we all understand that our estimation of rain is a probability. If we say “it’s going to rain,” we are really saying: “It’s more likely that it rains than it doesn't.”
We should consider task estimations the same way. It doesn’t matter whether the estimation is in days, sprint points, or hours. It’s just a probability.
I am not the only one to think this. Just google “project estimation with Monte Carlo.” Or visit this article or this other one. Those are the ones that inspired me.
Monte Carlo Method
The Monte Carlo method is a way to compute the probability of a scenario by simulating it many times instead of calculating the probability with some formula.
For example, suppose we want to find the probability of throwing two dice and getting the combination 1-1. With Monte Carlo, we write a program that throws two dice, and we execute this program many times while storing the results. Let’s say that by the end of a hundred executions, the combination 1-1 appeared ten times. Then, according to that Monte Carlo simulation, the probability of getting 1-1 is 10/100, or 10%. A hundred executions are too few to trust this number, though: the exact probability is 1/36, or about 2.8%.
The more simulations we run, the more accurate the result.
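To make the dice example concrete, here is a minimal sketch in Python. The function name `estimate_double_one` and the seed are my own choices for illustration, not something taken from the notebook. It throws two dice many times and compares the simulated frequency of 1-1 with the exact value of 1/36.

```python
import random


def estimate_double_one(n_throws, seed=None):
    """Estimate the probability of rolling 1-1 with two dice by simulation."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_throws):
        first = rng.randint(1, 6)   # first die
        second = rng.randint(1, 6)  # second die
        if first == 1 and second == 1:
            hits += 1
    return hits / n_throws


exact = 1 / 36  # one favourable outcome out of 36 equally likely ones

# The estimate should move closer to the exact value as the number of throws grows.
for n in (100, 10_000, 200_000):
    estimate = estimate_double_one(n, seed=42)
    print(f"{n:>7} throws: {estimate:.4f} (exact: {exact:.4f})")
```

With only a hundred throws the estimate can be far off; with a few hundred thousand it should land very close to 0.0278.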
Project estimation with the Monte Carlo method
Estimating the time needed to finish a project is similar to throwing dice. Yet, the program is slightly more complicated.
With the dice, we had to create a program that randomly selected a number among the possible outcomes: 1, 2, 3, 4, 5, and 6.
Since we had two dice, we had to do this twice.
In project estimation, we want to get an actual time for each task of the project. For example, a task estimated by the developer in 3 days might be 2 days, or 2.5 days, or even 4 days. The idea is to get a number close to the estimation done by the developer. In the bonus section, I explain how I get this number.
The number we get for a task is considered its actual time in that specific simulation. We sample one number for each task and add them up to get the total time spent in that simulation.
Let’s take a look at a simple project with four tasks. The tasks are to render a list, edit, delete and create a user. The developers estimate each task, and then we run four simulations.
Project tasks       Estimation   Simulation 1   Simulation 2   Simulation 3   Simulation 4
Render users list   3 days       2              2.5            4              4
Edit profile        5 days       5              5              4.5            7
Delete user         1 day        0.5            1.5            1              2
Create user         4 days       3              3.5            4.5            5
Total Project       13 days      10.5           12.5           14             18
The sum of the estimation is 13 days (second column). The average of the simulations is 13.75 days (from the third until the last column). Two of the simulations were longer, and two were shorter than the total estimated time. According to the simulations, half the time we finish the project later than the estimation.
If we set the deadline considering the estimation of thirteen days, then we have a 50% chance of not meeting the deadline. If we set the deadline at fourteen days, we have a 75% chance to meet it because three out of four simulations took fourteen days or less.
Setting a deadline is always risky business. We know that there is a possibility that we won’t make it. With this method, we can see how much we are risking.
If we want to be conservative, we put a deadline where 80% of the simulations finish on time. If our deadline is too tight and only 40% of the simulations finish on time, we might want to take extra actions or remove functionalities to reduce the number of tasks.
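The reasoning above boils down to counting how many simulations finish on or before a candidate deadline. A small sketch, using the four simulated totals from the table (the helper name `chance_of_meeting` is hypothetical):

```python
# Totals of the four simulations from the table, in days.
simulated_totals = [10.5, 12.5, 14, 18]


def chance_of_meeting(deadline, totals):
    """Fraction of simulations that finish on or before the deadline."""
    on_time = sum(1 for total in totals if total <= deadline)
    return on_time / len(totals)


for deadline in (13, 14, 18):
    chance = chance_of_meeting(deadline, simulated_totals)
    print(f"Deadline at {deadline} days: {chance:.0%} of simulations finish on time")
```

For these four totals it reports 50% for thirteen days, 75% for fourteen days, and 100% for eighteen days. Four simulations are only enough to illustrate the idea; a real run would use thousands.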
If you are ready to give it a try, go to the notebook.
The instructions are in the notebook. In a nutshell, upload a CSV with the list of tasks with one column: “estimation,” which is the developer’s estimation in days, sprint points, etc.
The results appear at the bottom of the notebook.
“Sum of estimations” is the sum of the column “estimation,” that is, the sum of the developers’ input.
“50% Percentile” is the duration under which 50% of the simulations finish. For instance, if it reads 43.72 days, then 50% of the time the project finishes in less than 43.72 days.
“80% Percentile” is the same as the “50% Percentile” but for 80% of the time.
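As a rough illustration of how these three numbers can be produced, here is a sketch that is not the notebook’s code. It assumes every task shares the same spread (`sigma=0.5`, an arbitrary value), samples each task’s actual time from a log-normal distribution as described in the bonus section, and uses a simple nearest-rank percentile. The names `percentile` and `summarize` are hypothetical.

```python
import math
import random


def percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of values at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(estimations, n_simulations=10_000, sigma=0.5, seed=0):
    """Simulate the project many times and report the three summary numbers."""
    rng = random.Random(seed)
    totals = [
        # Each task's actual time = estimation * random factor with median 1.
        sum(est * rng.lognormvariate(0, sigma) for est in estimations)
        for _ in range(n_simulations)
    ]
    return {
        "Sum of estimations": sum(estimations),
        "50% Percentile": percentile(totals, 50),
        "80% Percentile": percentile(totals, 80),
    }


# Estimations of the four-task example project, in days.
report = summarize([3, 5, 1, 4])
for name, value in report.items():
    print(f"{name}: {value:.2f}")
```

The exact figures depend on the assumed `sigma` and on the seed, so treat the output as an illustration of the report’s shape, not as a reference result.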
If you like the idea, do not hesitate to use it and let me know how it goes. I am very interested in learning how different teams manage estimations and deadlines.
In the future, if this is something people want, I could build a web application that non-developers can use to estimate their projects with the Monte Carlo method.
Why did I develop this
First of all, because I can. I am a developer, and it’s important to understand that our capabilities are not limited to the project we are currently building.
I know a little bit of statistics and math, and I have been in management positions where I had to do estimations even though I was not comfortable with them. This is my preferred way to estimate and set deadlines.
You might be into health and productivity or process automation. Use your coding skills to build something useful for you and your team. Not everything we develop needs to be public or needs to be ready for millions of users. Nor does it need to be ready for all edge cases.
If you have already built something like this, I’d love to hear from you. It could be interesting to write an article putting these examples together and share the results.
BONUS: How the simulation works
Here is where I get technical about how the simulation is built. I also added comments to the code in the notebook, in case you want to check that out.
The following is the code to execute one simulation:
# Sample a time for each task and add times together.
def sample_total_time(tasks):
    total_time = 0
    for size, std_text in tasks:
        std = std_dict.get(std_text, 1)
        # Sample a blow-up factor
        blow_up = blow_up_factor(std)
        # "blow-up factor" * "estimated" -> "actual"
        total_time += blow_up * size
    return total_time
I iterate through each task, and for each task I get a blow-up factor. This blow_up is used as a multiplier on the estimation of the task. A blow-up factor of one means that the estimation is the actual duration. A blow-up factor lower than one means that the actual time is less than estimated. With a factor larger than one, the actual time is more than estimated.
Statistics appear in the sampling of this blow-up factor. I used the analysis done in this article.
There are two important findings in the article above.
Developers estimate the median
One finding is that developers are good at estimating the median of the duration, but the median is not the mean. The median is the value separating the higher half from the lower one. In contrast, the average is the sum of all elements divided by the number of elements.
For example, in the list 4.5, 5, 5, 7, the median is 5, yet the average is 5.375.
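The two numbers are easy to check with Python’s standard `statistics` module:

```python
from statistics import mean, median

durations = [4.5, 5, 5, 7]

# The median looks only at the middle of the sorted list;
# the mean is pulled up by the single long duration (7).
print("median:", median(durations))
print("mean:", mean(durations))
```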
Because durations are bounded from below (a task cannot be done in minus one day) but not from above, the higher half weighs more, which makes the average higher than the median.
That means that the median of the blow-up factor is 1, but not the average.
The other finding is that the log-normal distribution is a good fit for how actual durations deviate from estimations.
I use the log-normal distribution to sample the “blow_up_factor.” The standard deviation changes based on the complexity of the task: “low” complexity has a lower deviation than “high.”
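Here is a sketch of what such a sampler might look like. The values in `std_dict` are placeholders I made up for illustration, and the notebook’s own `blow_up_factor` may be implemented differently. The key assumption is that the underlying normal distribution has mean 0, which gives the log-normal a median of exactly 1 and a mean of exp(std² / 2), which is larger than 1.

```python
import math
import random
from statistics import mean, median

# Hypothetical standard deviations per complexity label (not the notebook's values).
std_dict = {"low": 0.25, "medium": 0.5, "high": 1.0}


def blow_up_factor(std, rng=random):
    """Sample an actual/estimated ratio from a log-normal with median 1 (mu = 0)."""
    return rng.lognormvariate(0, std)


rng = random.Random(7)  # fixed seed so the run is reproducible
for label, std in std_dict.items():
    samples = [blow_up_factor(std, rng) for _ in range(100_000)]
    print(
        f"{label:>6}: median={median(samples):.3f} "
        f"mean={mean(samples):.3f} "
        f"theoretical mean={math.exp(std ** 2 / 2):.3f}"
    )
```

In every case the sampled median should stay close to 1, while the mean grows with the standard deviation. That is the median-versus-mean gap described above, expressed in code.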
The log-normal distribution fits very well with human behavior. From Wikipedia:
“In reliability analysis, the log-normal distribution is often used to model times to repair a maintainable system.”
The tricky part of this Monte Carlo simulation was finding the correct probability distribution and parameters to sample the task durations. Once we have a way to sample what we are simulating, Monte Carlo simulations give plenty of useful insights for making decisions.
Don’t hesitate to check out the code in the notebook and reach out to me if you have any questions.
If you like this post, consider sharing it with your friends on twitter or forwarding this email to them 🙈
Don't hesitate to reach out to me if you have any questions or see an error. I highly appreciate it.
And thanks to Sebastià for reviewing this article 🙏