CERBERUS: A Three-Headed Decoder for Vertical Cloud Profiles
A new AI model infers the vertical structure of clouds from 2D satellite imagery, which could help sharpen climate models that have struggled with cloud uncertainty for decades.

The Thesis
Weather and climate forecasting has a persistent blind spot: satellites mostly see the tops of clouds, not their full vertical structure, yet that structure drives rainfall, radiation, and storm behavior. CERBERUS is a machine learning framework that infers vertical radar reflectivity profiles (essentially, a slice through a cloud column showing where rain and ice exist at different altitudes) using only inputs that are widely available globally: geostationary satellite brightness temperatures, near-surface meteorological variables, and the time of day. The model uses a 'three-headed' encoder-decoder network to predict not just a single best-guess cloud profile but a full probability distribution over possible profiles. That distribution is zero-inflated: it explicitly models the common case where no cloud or radar echo is present at a given altitude. This matters because cloud uncertainty is widely recognized as one of the largest sources of error in climate projections. The catch: CERBERUS is trained and tested at a single research site in Oklahoma, so whether it generalizes to, say, tropical oceans or mountainous terrain remains unproven.
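To make the zero-inflated idea concrete, here is a minimal Python sketch of drawing cloud profiles from such a distribution. It is an illustration under stated assumptions, not the paper's implementation: the source does not say what each of the three heads emits, so the sketch assumes one head gives the probability of an echo at each altitude and the other two give the mean and spread of a Gaussian over reflectivity in dBZ. The function name, the Gaussian choice, and every number below are hypothetical.

```python
import random


def sample_profile(p_echo, mean_dbz, std_dbz, rng):
    """Draw one reflectivity profile; None marks 'no echo' at that level.

    Assumed (hypothetical) head outputs, one value per altitude level:
      p_echo   - probability that any radar echo is present
      mean_dbz - mean reflectivity given that an echo is present
      std_dbz  - spread of reflectivity given that an echo is present
    """
    profile = []
    for p, mu, sigma in zip(p_echo, mean_dbz, std_dbz):
        if rng.random() < p:
            profile.append(rng.gauss(mu, sigma))  # echo present: draw intensity
        else:
            profile.append(None)                  # the "zero" part: no echo
    return profile


# Hypothetical three-level column: likely rain low down, clear air aloft.
p_echo = [0.9, 0.5, 0.05]
mean_dbz = [10.0, 0.0, -20.0]
std_dbz = [3.0, 5.0, 4.0]

rng = random.Random(0)
samples = [sample_profile(p_echo, mean_dbz, std_dbz, rng) for _ in range(2000)]

# Fraction of draws with an echo at each level; should track p_echo.
echo_fraction = [
    sum(s[k] is not None for s in samples) / len(samples) for k in range(3)
]
print(echo_fraction)
```

Levels with a low echo probability come back empty in most draws, which is the behavior a single best-guess profile cannot express.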
Catalyst
The ARM (Atmospheric Radiation Measurement) program has spent decades building high-quality ground-based radar datasets at fixed sites, and geostationary satellites like GOES-16 now provide high-cadence, high-resolution brightness temperature imagery that didn't exist at the same quality a decade ago. Advances in probabilistic deep learning, specifically architectures that output full distributions rather than point estimates, have matured enough to handle the inherently ambiguous mapping from 2D satellite observations to 3D cloud structure. The convergence of these two threads makes the approach tractable now in a way it wasn't even five years ago.
What's New
Prior approaches to cloud profile retrieval fell into two camps: physical retrieval algorithms (which use radiative transfer equations to invert satellite signals, but are computationally expensive and assumption-heavy) and simpler statistical methods (which typically predict mean profiles without uncertainty). Recent deep learning efforts have tried direct regression from satellite inputs to radar variables, but these produce deterministic outputs that can't represent the genuine ambiguity when multiple cloud states are consistent with the same satellite observation. CERBERUS instead frames the problem as probabilistic inference, outputting a distribution over possible profiles using a zero-inflated model that separately handles the probability of cloud absence versus cloud intensity — a physically motivated design that prior neural approaches largely skipped.
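The split between cloud absence and cloud intensity can be written as a two-part likelihood. The sketch below is one common way to do that, assuming a Gaussian for the intensity part; the paper may use a different distribution or loss, and the function name zi_gaussian_nll and all example values are hypothetical.

```python
import math


def zi_gaussian_nll(observed, p_echo, mean_dbz, std_dbz, eps=1e-6):
    """Negative log-likelihood of one altitude level under a zero-inflated Gaussian.

    observed is None when the radar saw no echo, otherwise a reflectivity in dBZ.
    The Gaussian intensity model is an assumption made for illustration.
    """
    p = min(max(p_echo, eps), 1.0 - eps)  # keep the logs finite
    if observed is None:
        # Absence: only the echo-probability head is scored.
        return -math.log(1.0 - p)
    # Presence: pay for the echo probability and for the intensity fit.
    z = (observed - mean_dbz) / std_dbz
    log_density = -0.5 * z * z - math.log(std_dbz) - 0.5 * math.log(2.0 * math.pi)
    return -(math.log(p) + log_density)


# No echo observed: a model that expected clear air is penalized less.
clear_sky_confident = zi_gaussian_nll(None, p_echo=0.05, mean_dbz=0.0, std_dbz=5.0)
clear_sky_wrong = zi_gaussian_nll(None, p_echo=0.95, mean_dbz=0.0, std_dbz=5.0)

# Echo observed at 12 dBZ: a well-placed mean is penalized less.
echo_on_target = zi_gaussian_nll(12.0, p_echo=0.9, mean_dbz=12.0, std_dbz=3.0)
echo_off_target = zi_gaussian_nll(12.0, p_echo=0.9, mean_dbz=-5.0, std_dbz=3.0)

print(clear_sky_confident < clear_sky_wrong, echo_on_target < echo_off_target)
```

Because clear-air levels never touch the intensity terms, the model is not pushed to drag its reflectivity estimate toward an artificial "zero" value, which is the usual motivation for a zero-inflated design.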
The Counter
CERBERUS is trained and evaluated at a single location, the ARM Southern Great Plains site in Oklahoma, which is an unusually well-instrumented, flat, continental environment. The paper does not show whether the model works over tropical convection, ocean boundary layer clouds, or complex orographic (mountain-influenced) cloud systems, which are precisely the regimes that matter most for global climate uncertainty. Uncertainty estimates are only valuable if they are well calibrated (if the model says it is 80% confident, it should be right about 80% of the time), and the paper's calibration analysis is limited. The mapping from 2D satellite brightness temperatures to 3D cloud structure is genuinely ill-posed, since many different cloud columns can produce the same top-of-atmosphere signal, and a probabilistic model that spreads probability mass widely could score well on calibration metrics while providing little operational guidance. Finally, the path from a research-site proof-of-concept to assimilation into operational numerical weather prediction models involves years of validation, trust-building with operational centers, and software engineering that the paper does not address.
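The calibration concern can be checked with a simple coverage test, and the same test shows why calibration alone is not enough. The sketch below uses synthetic data and assumes Gaussian predictive distributions; it is not drawn from the paper, and the two "forecasters" and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def interval_coverage(observations, means, stds, level):
    """Fraction of observations inside the central `level` predictive interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = sum(abs(y - m) <= z * s for y, m, s in zip(observations, means, stds))
    return hits / len(observations)


def mean_interval_width(stds, level):
    """Average width of the central `level` interval (a sharpness measure)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return sum(2.0 * z * s for s in stds) / len(stds)


# Synthetic truth: a predictable signal plus irreducible noise.
rng = random.Random(42)
n = 5000
signal = [rng.gauss(0.0, 10.0) for _ in range(n)]
truth = [s + rng.gauss(0.0, 2.0) for s in signal]

# Forecaster A knows the signal and reports only the noise as uncertainty.
sharp_means, sharp_stds = signal, [2.0] * n
# Forecaster B ignores the signal and always reports the overall spread.
wide_means, wide_stds = [0.0] * n, [math.sqrt(10.0**2 + 2.0**2)] * n

sharp_cov = interval_coverage(truth, sharp_means, sharp_stds, 0.8)
wide_cov = interval_coverage(truth, wide_means, wide_stds, 0.8)
sharp_width = mean_interval_width(sharp_stds, 0.8)
wide_width = mean_interval_width(wide_stds, 0.8)

print(f"coverage: sharp={sharp_cov:.3f} wide={wide_cov:.3f}")
print(f"width:    sharp={sharp_width:.2f} wide={wide_width:.2f}")
```

Both forecasters cover the truth roughly 80% of the time, yet the second one's intervals are several times wider. A retrieval would need to be judged on sharpness as well as coverage before its uncertainty estimates could be called useful.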
Longs
- SPIRE — satellite weather data and analytics company with direct exposure to atmospheric sensing
- BGSF / DTN (private) — agricultural and energy weather intelligence consumers of improved cloud forecasts
- LDOS (Leidos) — defense and government atmospheric science contracts
- VSAT (Viasat) — satellite operator whose network planning depends on cloud and rain attenuation forecasts
- BOTZ (robotics/AI ETF) — indirect exposure to AI-for-science infrastructure
Shorts
- Traditional physical retrieval algorithm vendors (e.g., teams maintaining MERRA-2 or MODIS cloud products) — if learned retrievals prove faster and comparably accurate, the slow physics-based pipeline loses relevance
- CloudSat mission successors — if 2D geostationary inputs can approximate what an active radar satellite provides, the scientific justification for expensive radar-in-orbit missions weakens at the margin
Enablers (Picks & Shovels)
- ARM (Atmospheric Radiation Measurement) program — the ground-based Ka-band radar dataset CERBERUS is trained on
- GOES-16/17 satellite series (NOAA) — the geostationary brightness temperature inputs the model uses
- PyTorch probabilistic modeling ecosystem — zero-inflated distribution layers used in the decoder
- ERA5 reanalysis dataset (ECMWF) — likely source for near-surface meteorological context variables
- DOE Office of Science computing infrastructure — training and evaluation environment for ARM-based ML research
Private Watchlist
- The Weather Company (IBM spinout, private) — cloud data assimilation pipelines
- Atmos Financial (private) — climate risk quantification dependent on better cloud process models
- Tomorrow.io (private) — commercial weather intelligence that could integrate probabilistic cloud retrievals
- Salient Predictions (private) — subseasonal-to-seasonal forecasting where cloud uncertainty is a key driver
Resources
The Paper
Atmospheric clouds exhibit complex three-dimensional structure and microphysical details that are poorly constrained by the predominantly two-dimensional satellite observations available at global scales. This mismatch complicates data-driven learning and evaluation of cloud processes in weather and climate models, contributing to ongoing uncertainty in atmospheric physics. We introduce CERBERUS, a probabilistic inference framework for generating vertical radar reflectivity profiles from geostationary satellite brightness temperatures, near-surface meteorological variables, and temporal context. CERBERUS employs a three-headed encoder-decoder architecture to predict a zero-inflated (ZI) vertically-resolved distribution of radar reflectivity. Trained and evaluated using ground-based Ka-band radar observations at the ARM Southern Great Plains site, CERBERUS recovers coherent structures across cloud regimes, generalizes to withheld test periods, and provides uncertainty estimates that reflect physical ambiguity, particularly in multilayer and dynamically complex clouds. These results demonstrate the value of distribution-based learning targets for bridging observational scales, introducing a path toward model-relevant synthetic observations of clouds.