From days to minutes: Making population estimates faster and smarter

Aerial view of building footprints in a nondescript area

The Office of Data and Innovation (ODI) helped improve how the Department of Finance (Finance) makes small-area population estimates, supporting statewide resource planning.

Project scope

Timeline: 3 months
Team: 2 data engineers

Partner

Department of Finance

Methods

Automation
Data engineering
Data visualization
Demographic data analysis
Geospatial data analysis
Journey mapping
Usability testing
User research

The opportunity

Finance makes small-area population estimates, which the state uses to distribute funds and plan infrastructure and services. A key dataset for this work is building footprints: outlines of structures made from aerial images. To make these estimates, analysts had to combine large, changing datasets with other complex data every time the data was updated.

The process was time-consuming: each estimate could take 3 days. Every update had to be processed manually and came with no notes about which footprints had changed. As a result, analysts spent valuable time on repetitive data cleaning tasks instead of making the estimates.

How we helped

ODI worked with Finance to design and build a streamlined data pipeline that automates many tasks analysts used to do by hand. These changes cut the time to make an estimate from 3 days to minutes. Finance now has access to more timely estimates, helping support data-driven decisions.

Analysts can quickly get up-to-date data to make estimates for areas as large as the state or as small as a neighborhood. They can now download the data for a smaller area instead of the whole state, which also saves time and computing power. The data pipeline helps ensure that a community's needs are met, no matter where Californians call home.

Cut time to make estimates from 3 days to minutes

20-90% reduction in download file size

What we built

We made a new dataset that combines the building footprint dataset with Census data. Our pipeline turns an unwieldy dataset into one that can be easily analyzed.

It automatically:

Joins data together
Removes duplicates that analysts have identified
Breaks up data by county
Saves data corrections
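
The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not ODI's actual pipeline: the field names (`footprint_id`, `block_id`, `county`), the sample records, and the `build_dataset` function are all hypothetical, and a plain key lookup stands in for whatever join the real pipeline performs between footprints and Census geography.

```python
from collections import defaultdict

# Hypothetical sample records; every field name here is an assumption.
footprints = [
    {"footprint_id": "f1", "block_id": "b1", "area_m2": 120.0},
    {"footprint_id": "f2", "block_id": "b1", "area_m2": 95.5},
    {"footprint_id": "f3", "block_id": "b2", "area_m2": 300.0},
    {"footprint_id": "f4", "block_id": "b2", "area_m2": 300.0},  # duplicate of f3
]
census_blocks = {
    "b1": {"county": "Alpine", "population": 40},
    "b2": {"county": "Yolo", "population": 85},
}
# Analyst-reviewed corrections, saved so they are reapplied on every update.
corrections = {"duplicates": {"f4"}, "misidentified": {"f2"}}


def build_dataset(footprints, census_blocks, corrections):
    """Join footprints to Census blocks, apply saved corrections, split by county."""
    excluded = corrections["duplicates"] | corrections["misidentified"]
    by_county = defaultdict(list)
    for fp in footprints:
        if fp["footprint_id"] in excluded:
            continue  # removed by a saved correction
        block = census_blocks.get(fp["block_id"])
        if block is None:
            continue  # no matching Census block; skipped in this sketch
        # Merge the footprint record with its Census block attributes.
        by_county[block["county"]].append({**fp, **block})
    return dict(by_county)


result = build_dataset(footprints, census_blocks, corrections)
```

Because the corrections live in their own saved structure rather than being edits to the raw data, rerunning the function on a refreshed footprint file reapplies them without repeating the manual review.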

The pipeline stores the data in the cloud. This storage is low-cost, reliable, fast, and available to the public. Analysts can get the data in several ways and formats, allowing them to pull only what they need.
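
To show why splitting the data by county lets analysts pull only what they need, here is a small sketch under stated assumptions. A temporary local directory stands in for cloud storage, and the `county=<name>/footprints.csv` layout, the CSV format, and both function names are hypothetical choices for illustration, not the layout or formats Finance actually uses.

```python
import csv
import tempfile
from pathlib import Path


def write_partitions(rows, root):
    """Write one CSV per county under root/county=<name>/footprints.csv."""
    by_county = {}
    for row in rows:
        by_county.setdefault(row["county"], []).append(row)
    for county, county_rows in by_county.items():
        out_dir = Path(root) / f"county={county}"
        out_dir.mkdir(parents=True, exist_ok=True)
        with open(out_dir / "footprints.csv", "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=list(county_rows[0]))
            writer.writeheader()
            writer.writerows(county_rows)


def read_county(root, county):
    """Read only the requested county's file; other counties are never opened."""
    path = Path(root) / f"county={county}" / "footprints.csv"
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


# Hypothetical records for two counties.
rows = [
    {"footprint_id": "f1", "county": "Alpine"},
    {"footprint_id": "f3", "county": "Yolo"},
    {"footprint_id": "f5", "county": "Yolo"},
]
with tempfile.TemporaryDirectory() as root:
    write_partitions(rows, root)
    yolo = read_county(root, "Yolo")
```

An analyst working on a single county reads one small file rather than the statewide dataset, which is the kind of access pattern that makes smaller downloads possible.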

Custom workflows let the data be used with other geographic data. Analysts can easily and permanently remove misidentified footprints. This is a major time- and pain-saver for Finance analysts.

We know a good algorithm for urban areas might not work well for rural areas. Now analysts can account for regional variations more seamlessly.

Modern Data Stack Accelerator

The Modern Data Stack Accelerator (MDSA) service improves departments’ data infrastructure. The MDSA trains departments in a new way of working with data called analytics engineering. All MDSA projects involve data challenges that impact operations. MDSA projects:

Move data workflows to the cloud
Emphasize testing, documentation, and continuous development
Train the trainer so departments can do it themselves going forward

Learn more about ODI’s work on this project in our technical paper.