Part 1 of a multipart series (Part 2). Imagine you're planning a road trip across the country. You’ve got your maps, your snacks, and your Spotify playlist. But here's the twist: instead of using a GPS that tells you the exact time you'll arrive at your destination, all you have is a Magic 8-Ball, constantly giving you answers like "Outlook not so good" or "Reply hazy, try again."
Sounds chaotic, right? Well, that’s often what forecasting feels like without the right tools. Organizations rely on the right tools to make decisions about uncertain futures. This blog series focuses on one such tool, Monte Carlo simulation, which can help organizations plan and prepare for unpredictable events such as disasters, where they cannot rely on hope or a hunch when deploying resources. Monte Carlo simulations and similar tools are like a forecasting GPS: they cut through the noise by using cold, hard data to simulate thousands of different scenarios, producing a range of possible outcomes that supports data-driven decisions.
In short, Monte Carlo simulation lets you build and understand data-driven what-if scenarios. Who can use this? If you manage a team, plan financial resources, or perform operational or strategic planning, this sort of tool gives insight into the average- and worst-case outcomes of your plan. This blog series shows how Monte Carlo simulation can be used to estimate how many people are deployed each week for disasters, using FEMA Open Data.
Monte Carlo Simulations
How does this clever simulation tool actually work? Referring to our road-trip example, picture this: instead of calculating one possible route for your road trip, Monte Carlo generates thousands, even millions. Some of these routes will be smooth sailing—clear highways, no pit stops. Others might involve unexpected detours, flat tires, or long lines at your favorite roadside diner. By modeling all these potential routes, Monte Carlo doesn’t just give you one expected arrival time; it shows you the full spectrum of possibilities, from best-case to worst-case scenarios.
In the data world, Monte Carlo does the same, using probability distributions to account for all the uncertainties—whether that's how long a FEMA deployment might take, how severe a natural disaster could be, or how many resources need to be deployed. Monte Carlo simulation runs hundreds, thousands, possibly millions of potential simulations, varying the path each time, so we can see not just what’s likely to happen, but also the odds of outliers—the equivalent of that sudden blizzard in July.
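To make the idea concrete, here is a minimal sketch of the road-trip version in Python with NumPy. Everything in it is an assumption made for illustration: the function name `simulate_trip_hours`, the choice of distributions, and all of the numbers (a 40-hour typical drive, two delays per trip on average, 1.5 hours per delay). It is not the deployment model this series builds.

```python
import numpy as np


def simulate_trip_hours(n_trips=10_000, seed=0):
    """Simulate total travel time (hours) for n_trips hypothetical road trips."""
    rng = np.random.default_rng(seed)

    # Assumption: base driving time is right-skewed with a median of 40 hours.
    driving = rng.lognormal(mean=np.log(40.0), sigma=0.10, size=n_trips)

    # Assumption: each trip hits a Poisson number of delays (2 on average).
    n_delays = rng.poisson(lam=2.0, size=n_trips)

    # Assumption: each delay lasts an exponential time with a 1.5-hour mean.
    # Draw a full grid of delays, then keep only as many as each trip needs.
    max_delays = int(n_delays.max())
    delay_draws = rng.exponential(scale=1.5, size=(n_trips, max_delays))
    used = np.arange(max_delays) < n_delays[:, None]
    delay_hours = (delay_draws * used).sum(axis=1)

    return driving + delay_hours


trips = simulate_trip_hours()
p5, p50, p95 = np.percentile(trips, [5, 50, 95])
prob_late = (trips > 48.0).mean()
print(f"5th / 50th / 95th percentile (hours): {p5:.1f} / {p50:.1f} / {p95:.1f}")
print(f"Share of simulated trips longer than 48 hours: {prob_late:.1%}")
```

The point of the sketch is the shape of the output: instead of a single arrival time, you get percentiles and the odds of a bad outcome, which is the kind of summary a planner can act on.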
Part 2 of our blog series will jump into how we can set up the simulation. Before then, we need to figure out a few things. What data do we have? Is it useful? What tech tools are we going to use to crunch the numbers? The first step in any good estimation framework is understanding the resources at hand, kind of like picking the right snacks for your road trip.
Where can we start?
One great resource for looking at how FEMA deploys people is FEMA Open Data (you can find more details on the data in this PDF). The folks at FEMA catalog a lot of useful information.
For our basic start, we have:
Event ID - What FEMA categorizes as an "event" or disaster
Deployment ID - A unique ID for each deployment. This is the "granularity" of the data: each row corresponds to one person deployed for a specific event.
Event start date - The first date a deployment could be made in support of an event
Date on site - When a person was initially deployed to support an event
Departure date - When a person is no longer at an event
There are a lot of other features included in this data that we'll cover in a future post, but a key idea in modeling is to start simple (because it can always become more complicated!).
This data can help us estimate the total number of people deployed in a given week over a time horizon. Below we see what that might look like in practice, with counts for each deployment cohort for each event from 2016 through 2022, each cohort getting an alternating color. Notice how most are pretty small, but some are quite large! We want to capture that sort of variability better than just taking an average, because most events won't be a huge hurricane.
Figure 1. Count of weekly deployed people per cohort for events starting 2016 to 2022. Cohorts are aligned at the week-of-event level. Censoring occurs for a few hundred individuals who were still deployed at the end of the data, September 2023.
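As a rough sketch of how the fields above turn into weekly counts, the pandas snippet below counts how many people are on site in each week of each event. The rows are made up for illustration and are not FEMA data, and the column names (`event_id`, `deployment_id`, `date_on_site`, `departure_date`), the helper names, and the Monday-based weeks are all assumptions rather than the dataset's actual schema. A missing departure date is treated as "still deployed when the data ends", which is one simple way to handle the censoring mentioned in the caption.

```python
import pandas as pd

# Made-up example rows (NOT FEMA data); column names are hypothetical.
deployments = pd.DataFrame({
    "event_id": [1, 1, 1, 2],
    "deployment_id": [101, 102, 103, 201],
    "date_on_site": pd.to_datetime(
        ["2020-01-06", "2020-01-08", "2020-01-20", "2020-01-13"]),
    "departure_date": pd.to_datetime(
        ["2020-01-17", "2020-01-10", None, "2020-01-24"]),
})
data_end = pd.Timestamp("2020-01-31")  # assumed last date covered by the data


def week_start(dates):
    """Snap each date to the Monday of its week."""
    return dates - pd.to_timedelta(dates.dt.dayofweek, unit="D")


def weekly_headcount(df, data_end):
    """Count distinct people on site per event and week."""
    df = df.copy()
    # Censored rows: no departure yet, so count them through the data end.
    df["departure_date"] = df["departure_date"].fillna(data_end)
    first = week_start(df["date_on_site"])
    last = week_start(df["departure_date"])
    # One list of week-start dates per deployment, then one row per week.
    df["week"] = [list(pd.date_range(f, l, freq="7D"))
                  for f, l in zip(first, last)]
    weeks = df.explode("week")
    weeks["week"] = pd.to_datetime(weeks["week"])
    return (weeks.groupby(["event_id", "week"])["deployment_id"]
                 .nunique()
                 .rename("people_deployed")
                 .reset_index())


per_event = weekly_headcount(deployments, data_end)
print(per_event)
# Total people deployed per week across all events.
print(per_event.groupby("week")["people_deployed"].sum())
```

Counting a person in every week they touch, even for a single day, is a design choice; depending on the question, you might instead count person-days or only full weeks.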
Now that we have identified our data resources, we need a technical stack to build a Monte Carlo model; we don't want to calculate these by hand for thousands or millions of draws! For this simple approach we use Google's Colab product, whose basic compute scales pretty well for our simple starting point. For more complicated problems, you should consider scaling to a GPU or to a larger platform such as a PySpark platform (e.g., Snowflake, Databricks), Kubernetes, or other scaling platforms. The additional libraries were chosen for their familiarity to the industry; in the wonderful world of open source data science software, many more options certainly exist!
Technology | Version | Why used |
Google Colab | | Affordable, accessible, and scales for small- to mid-sized problems |
Python | 3.10.2 | Version with Colab. Python chosen as modeling language due to abundance of libraries |
Matplotlib | 3.7.1 | Basic static plotting |
Seaborn | 0.13.1 | More advanced static plotting |
Scikit-learn | 1.3.2 | Certain statistical metrics |
Scipy | 1.13.1 | Probability entropy transforms |
Numpy | 1.26.4 | Numerical processing and randomization |
Pandas | 2.1.4 | Data file handling, munging, and preparation. |
Table 1. Technologies used to build the Monte Carlo simulation, and why each was chosen.
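If you want to confirm that your own environment matches the table before following along, a small check like the one below can help. It uses only the standard library's `importlib.metadata`. The `PINNED` dictionary and the `check_versions` helper are hypothetical names for this sketch, and the Matplotlib pin is written here as 3.7.1, which is an assumption about the intended release.

```python
from importlib import metadata

# Versions taken from Table 1; the Matplotlib value is an assumed reading.
PINNED = {
    "matplotlib": "3.7.1",
    "seaborn": "0.13.1",
    "scikit-learn": "1.3.2",
    "scipy": "1.13.1",
    "numpy": "1.26.4",
    "pandas": "2.1.4",
}


def check_versions(pinned=PINNED):
    """Return {package: (wanted, installed, matches)} for each pinned package."""
    report = {}
    for name, wanted in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        report[name] = (wanted, installed, installed == wanted)
    return report


for name, (wanted, installed, ok) in check_versions().items():
    status = "ok" if ok else "differs"
    print(f"{name}: wanted {wanted}, installed {installed} ({status})")
```

A mismatch is not necessarily a problem; Colab updates its preinstalled libraries over time, so treat the table as the versions used when the post was written rather than a hard requirement.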
Where to next?
In the rest of the posts of this series, we’ll build an estimation framework for FEMA's emergency deployments; today, we started with the basics of setting the stage for what we want to predict, and making sure our data is ready for the simulation.
Stay tuned for the next post, where we roll up our sleeves and build our first Monte Carlo model. Spoiler: It’s more exciting than flipping a coin, and much more useful when you’re figuring out how many trucks to send to a disaster zone.
By Tom Roderick, PhD from Flamelit