# Wildfire Risk Leading-Indicator Model — Findings
*Generated: 2026-05-22 | Scope: CA, OR, WA, CO, NM insured wildfire losses 1993-2023*
*Models: (A) CA-only HistGBM regression (n=31) + (B) Multi-state binary classifier (n=155)*
*All Spearman correlations residualized on year per standing rule*

---

## Sample Size Caveat — Read First

Usable insured wildfire loss history for the western US begins around 1990. The effective
sample is severely constrained: **CA has only ~9 genuine event-year observations**
(losses above the $150M baseline) in 31 years; the remaining 22 years sit at the $50M-$150M
floor. Because of this floor compression, individual-feature Spearman correlations against the CA
loss target are statistically indistinguishable from noise at n=31 (all p-values > 0.34).
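
A year-residualized Spearman correlation of the kind referenced in the header can be sketched as below. This is a minimal illustration, not the project's pipeline: the helper names and the toy series are hypothetical, a linear trend in year is assumed, and ties are ignored for brevity.

```python
def _mean(xs):
    return sum(xs) / len(xs)

def pearson(xs, ys):
    """Plain Pearson correlation; applied to ranks it gives Spearman's rho."""
    mx, my = _mean(xs), _mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def residualize_on_year(years, values):
    """Subtract the least-squares linear trend in year; return residuals."""
    my, mv = _mean(years), _mean(values)
    sxy = sum((y - my) * (v - mv) for y, v in zip(years, values))
    sxx = sum((y - my) ** 2 for y in years)
    slope = sxy / sxx
    return [(v - mv) - slope * (y - my) for y, v in zip(years, values)]

def ranks(values):
    """Ranks 1..n (assumes no ties, which holds for the toy data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        out[idx] = rank
    return out

def spearman_residualized(years, feature, target):
    """Spearman rho after detrending both series on year."""
    f_res = residualize_on_year(years, feature)
    t_res = residualize_on_year(years, target)
    return pearson(ranks(f_res), ranks(t_res))

# Toy data (hypothetical): both series trend upward but move in opposite
# directions around that trend, so the raw rho overstates the relationship.
years = [2000, 2001, 2002, 2003, 2004, 2005]
feature = [1.0, 2.5, 2.0, 4.5, 4.0, 6.5]
target = [10.0, 11.0, 13.5, 13.0, 15.5, 15.0]

raw_rho = pearson(ranks(feature), ranks(target))
res_rho = spearman_residualized(years, feature, target)
print(f"raw rho = {raw_rho:.2f}, residualized rho = {res_rho:.2f}")
```

In this contrived case the shared upward trend produces a positive raw correlation that reverses sign once the trend is removed, which is the failure mode the residualization rule guards against.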

**This model should be read as hypothesis-generating infrastructure, not a validated predictor.**

The multi-state binary classifier (Track B) gives the more honest read: it defines
"high-fire-year" as above the 75th percentile of insured losses *within each state*,
removing the cross-state scale difference. CV AUC = **0.658 ± 0.092** across 5 folds
at n=155: meaningfully above the 0.5 chance level, but noisy. The ±1 SD band across
folds spans approximately AUC 0.57-0.75; this is fold-to-fold spread, not a formal
confidence interval.
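
The within-state labeling rule can be illustrated with a short sketch. The function name, the toy loss figures, and the choice of the "inclusive" quantile method are assumptions for illustration; the report does not state which percentile interpolation was used.

```python
from statistics import quantiles

def label_high_fire_years(losses_by_state):
    """Return {state: {year: 0/1}}, 1 = loss above that state's own 75th percentile."""
    labels = {}
    for state, by_year in losses_by_state.items():
        # quantiles(..., n=4) returns the three quartile cut points; index 2 is Q3.
        q3 = quantiles(list(by_year.values()), n=4, method="inclusive")[2]
        labels[state] = {yr: int(loss > q3) for yr, loss in by_year.items()}
    return labels

# Toy insured losses in $M (hypothetical), on very different scales per state.
losses = {
    "CA": {2001: 50, 2002: 60, 2003: 900, 2004: 55,
           2005: 70, 2006: 65, 2007: 2000, 2008: 80},
    "NM": {2001: 1.0, 2002: 2.0, 2003: 1.5, 2004: 30.0,
           2005: 2.5, 2006: 3.0, 2007: 1.2, 2008: 12.0},
}

labels = label_high_fire_years(losses)
n_pos = sum(sum(by_year.values()) for by_year in labels.values())
n_all = sum(len(by_year) for by_year in labels.values())
base_rate = n_pos / n_all
print(labels["NM"], base_rate)
```

Because the threshold is computed per state, a $12M year in the toy NM series is labeled high while an $80M year in the toy CA series is not, and the pooled base rate sits near 25% by construction.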

---

## Actual Model Results

### Track A — CA-Only Regression (n=31)
- CV R² raw: -1.48 ± 2.48
- CV R² residualized on year: -1.08 ± 0.65
- **Interpretation:** Negative CV R² means the model generalizes worse than predicting
  the mean. At n=31 with only ~9 real event years, HistGBM cannot reliably learn
  from this data. The feature importance rankings from this track reflect in-sample
  fit patterns, not out-of-sample signal.
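
A small worked example shows how R² goes negative. The numbers are hypothetical and chosen to mimic a held-out fold with three floor years and one event year; they are not taken from the model runs.

```python
def r2_score(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical held-out fold, losses in $M: three floor years, one event year.
y_true = [50, 60, 55, 3000]

# Baseline: predict the fold mean for every year.
fold_mean = sum(y_true) / len(y_true)
baseline_pred = [fold_mean] * len(y_true)

# A model that predicts an event in the wrong year and misses the real one.
misplaced_pred = [2500, 60, 55, 100]

r2_baseline = r2_score(y_true, baseline_pred)
r2_misplaced = r2_score(y_true, misplaced_pred)
print(f"baseline R2 = {r2_baseline:.2f}, misplaced-event R2 = {r2_misplaced:.2f}")
```

With so few event years, a single misplaced event in a fold is enough to push the squared error above that of the constant-mean baseline, which is what a negative CV R² reports.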

### Track B — Multi-State Binary Classifier (n=155)
- CV AUC: **0.658 ± 0.092**
- High-fire-year base rate: ~25% (75th-percentile threshold within state)
- **Interpretation:** AUC 0.658 means the combined signal (prior acres + ENSO lags)
  ranks a randomly chosen high-fire year above a randomly chosen low-fire year about
  66% of the time, versus 50% for a random ranking. That is a real but modest lift
  above chance. This is the defensible finding.
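
The pairwise reading of AUC can be made concrete with a brute-force sketch. The labels and scores below are hypothetical; the function is a plain pairwise count rather than the routine used to produce the reported figure.

```python
def pairwise_auc(labels, scores):
    """Fraction of (high, low) pairs where the high-fire year gets the higher score.

    Ties count as half a win, which matches the usual AUC definition.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical state-years: 1 = high-fire year, score = predicted probability.
labels = [1, 0, 0, 1, 0, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.5, 0.2, 0.7, 0.4, 0.35]

auc = pairwise_auc(labels, scores)
chance = pairwise_auc(labels, [0.5] * len(labels))  # uninformative scorer
print(f"AUC = {auc:.3f}, uninformative scorer = {chance:.3f}")
```

An uninformative scorer lands at 0.5 whatever the class balance, so 0.5 (not the share of high-fire years) is the reference point for judging an AUC.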

---

## Feature Importance Rankings (Multi-State Binary, more reliable)

From permutation importance on the Track B binary classifier:

| Rank | Feature | Importance | Notes |
|------|---------|-----------|-------|
| 1 | prior_yr_acres | 0.094 | Prior-year state acres burned |
| 2 | prior2yr_acres | 0.058 | 2-year lag acres burned |
| 3 | oni_lag6m | 0.058 | ONI average Jan-Mar (6m before fire season) |
| 4 | mam_oni | 0.058 | Spring ONI (Mar-May) |
| 5 | oni_lag3m | 0.050 | ONI average Apr-Jun (3m before fire season) |
| 6 | state_nm | 0.039 | NM state indicator (high fire share) |
| 7 | oni_lag9m | 0.017 | ONI average Oct-Dec prior year (9m lag) |

The four ENSO features (ranks 3-5 and 7) together carry importance comparable to
the two prior-acres features (summed importance 0.183 vs 0.152). Spreading the
signal across several ONI windows makes it less dependent on any single window.
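
The ENSO versus prior-acres comparison can be reproduced by summing the table values by feature family. The grouping into "ENSO" and "prior acres" families is an illustrative choice, and adding permutation importances is only a rough guide when features are correlated.

```python
# Permutation importances copied from the Track B table above.
importance = {
    "prior_yr_acres": 0.094,
    "prior2yr_acres": 0.058,
    "oni_lag6m": 0.058,
    "mam_oni": 0.058,
    "oni_lag3m": 0.050,
    "state_nm": 0.039,
    "oni_lag9m": 0.017,
}

# Illustrative grouping by name: any feature mentioning "oni" is ENSO-derived.
enso_total = sum(v for k, v in importance.items() if "oni" in k)
acres_total = sum(v for k, v in importance.items() if k.endswith("_acres"))
n_enso = sum(1 for k in importance if "oni" in k)

print(f"{n_enso} ENSO features: {enso_total:.3f}; prior acres: {acres_total:.3f}")
```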

---

## Top 3 Predictors With Lead Times

### 1. Prior-Year Acres Burned (lead time: 12 months)
Strongest single predictor in both tracks. Mechanism: years with high burn acreage
deplete ground-level fuels and can reduce risk in immediately following years; years
with low burn accumulate fuels. The relationship with *insured loss* is noisier than
with *acres burned* because loss depends additionally on where fires occur relative to
the wildland-urban interface (WUI). Lead time: full prior calendar year.
**Classification: REDISCOVERY** — fuel accumulation predicting fire severity is
well-established fire ecology (Dennison et al. 2014; Miller et al. 2009).

### 2. Spring ENSO / ONI Lags at 3-6 Months (lead time: 3-6 months)
The ONI at 3m lag (Apr-Jun average) and 6m lag (Jan-Mar average) both rank in the
top 5 features in the binary classifier, with nearly equal importance (~0.05-0.06).
La Nina conditions (ONI < -0.5) in the pre-fire-season months tend to shift Pacific storm
tracks northward, leaving CA, NM, and southern CO drier entering fire season. OR and WA
often run wetter under La Nina, so the direction of the signal may differ by state.
Lead time: **3-6 months** before peak fire season (July-October).
**Classification: REDISCOVERY** — La Nina to western US drought to elevated fire risk
is decades-known (NOAA CPC; Gershunov and Barnett 1998).

### 3. 9-Month ENSO Lag / Prior-Year ONI (lead time: 9-12 months)
The prior-year annual average ONI and the Oct-Dec lag (9-month window) carry
lower but nonzero importance. This captures the multi-season drought persistence
effect: La Nina conditions in the prior fall/winter compound with spring dryness
to produce multi-year drought sequences. The 2020-2022 and 2011-2012 La Nina
sequences both preceded above-normal CA fire loss years.
Lead time: **9-12 months**.
**Classification: PARTIALLY NOVEL in insurance context** — individual La Nina to fire
is known; the compound multi-lag signal specifically mapped to *insured loss* at
the state level with this lag structure is less documented in actuarial literature.

---

## What La Nina to Western US Drought to Fire Actually Looks Like in This Data

Key CA event years in the insured loss record and their ENSO context:

| Year | Insured Loss ($M) | Notable Events | DJF ONI | Prior-yr Acres |
|------|------------------|----------------|---------|---------------|
| 2017 | 13,400 | Thomas Fire, Tubbs, Nuns | -0.22 (near-neutral) | 358k |
| 2018 | 12,500 | Camp Fire ($10.5B) | -0.78 (La Nina) | 652k |
| 2020 | 5,000 | Creek, CZU, LNU | 0.64 (El Nino) | 303k |
| 2021 | 3,300 | Dixie, Caldor | -0.93 (La Nina) | 658k |

**Important nuance visible in this table:** 2017 and 2020 were *not* La Nina years,
yet produced extreme losses. 2020 in particular (El Nino DJF) still produced the
third-largest insured loss in the table. ENSO alone is therefore insufficient: WUI
proximity, wind events (Diablo/Santa Ana), and land management interact with the
drought signal. The ENSO signal adds real but modest lift (AUC ~0.66), not a reliable alarm.

---

## How to Use for Insurance Rate Forecasting

**Proposed 3-9 month horizon protocol for rateauthority.org:**

1. **October-December:** Check ONI (NOAA CPC monthly release). La Nina (ONI < -0.5)
   active? Flag elevated wildfire risk for the following July-October fire season.
   Lead time: ~9 months. Confidence: low-moderate (AUC ~0.66 population-level;
   individual state signal weaker).

2. **January-March:** Confirm DJF ONI. Cross-check against the NRCS SNOTEL
   snowpack percentile for the CA Sierra Nevada / Cascades / CO Rockies.
   Combined La Nina DJF + below-50th-pct snowpack = stronger compound signal.
   Lead time: ~6 months.

3. **April-June:** Spring ONI (MAM) + finalized prior-year NIFC acres (released
   ~February). Compose: La Nina spring + above-median prior acres = elevated flag.
   Lead time: 1-3 months to fire season. Most actionable for consumer communication.
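
The three checkpoints above can be expressed as a simple rule sketch. The class, field, and function names are hypothetical, and the only thresholds used are the ones stated in the protocol (ONI < -0.5, snowpack below the 50th percentile, prior acres above the median); this is a directional screen, not a fitted model.

```python
from dataclasses import dataclass
from typing import Dict, Optional

LA_NINA_ONI = -0.5  # La Nina threshold stated in the protocol

@dataclass
class SeasonInputs:
    ond_oni: float                              # Oct-Dec ONI
    djf_oni: Optional[float] = None             # Dec-Feb ONI, known by Jan-Mar
    snowpack_pct: Optional[float] = None        # SNOTEL snowpack percentile (0-100)
    mam_oni: Optional[float] = None             # Mar-May ONI
    prior_acres: Optional[float] = None         # prior-year acres burned
    median_prior_acres: Optional[float] = None  # historical median for the state

def wildfire_flags(x: SeasonInputs) -> Dict[str, bool]:
    """Evaluate each checkpoint whose inputs are available so far."""
    flags = {"oct_dec": x.ond_oni < LA_NINA_ONI}
    if x.djf_oni is not None and x.snowpack_pct is not None:
        # Compound signal: La Nina DJF plus below-median snowpack.
        flags["jan_mar"] = x.djf_oni < LA_NINA_ONI and x.snowpack_pct < 50
    if None not in (x.mam_oni, x.prior_acres, x.median_prior_acres):
        # Compound signal: La Nina spring plus above-median prior acres.
        flags["apr_jun"] = (x.mam_oni < LA_NINA_ONI
                            and x.prior_acres > x.median_prior_acres)
    return flags

# Hypothetical season: La Nina fades by spring, so the last flag clears.
example = SeasonInputs(ond_oni=-0.8, djf_oni=-0.9, snowpack_pct=35.0,
                       mam_oni=-0.3, prior_acres=700_000,
                       median_prior_acres=400_000)
flags = wildfire_flags(example)
early = wildfire_flags(SeasonInputs(ond_oni=-0.2))
print(flags, early)
```

Returning only the flags whose inputs exist lets the same function be called at each checkpoint as data arrives through the year.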

**Rate-filing framing:** CA Prop 103 requires 60-day notice for rate changes.
A 6-9 month ENSO-based indicator would allow carriers to pre-position rate filings
in Q1 rather than reactively post-catastrophe. For consumers, the signal answers:
"should I shop now before rates rise, or wait?"

**Honest bound on this signal:** At AUC 0.66, this correctly signals approximately
2 in 3 high-fire-year / low-fire-year pairs. It is a *first-order screen*, not a
reliable annual alarm. It should be presented to consumers as a directional risk
indicator, not a rate forecast.

---

## What the Model Cannot Yet Do

- **County-level predictions:** Requires VHI spatial aggregation + Cal Fire incident
  data with structure-count. State-level is the current resolution ceiling.
- **Carrier non-renewal probability:** Requires CDI rate-filing scrape or NAIC StatBook
  access — see inventory.md for data access paths.
- **Live fuel moisture integration:** USFS WFAS station data provides 0-4 week leading
  signal; useful for operationalizing the spring signal window but requires pipeline build.
- **Point-estimate loss magnitude:** With ~9 real CA event years, magnitude calibration
  is not defensible. Directional above/below-normal is the correct output.

---

## Novel vs Rediscovered Summary

| Signal | Classification | Honest Confidence |
|--------|---------------|-------------------|
| La Nina to western US drought to fire severity | REDISCOVERED | Well-documented; combined-model AUC is ~0.16 above chance (0.50) |
| Prior-year acres burned to next-year loss | REDISCOVERED | Strongest single feature; fuel accumulation established |
| Consecutive / multi-lag ENSO x insured loss | PARTIALLY NOVEL | Less documented in actuarial context specifically |
| 6-9m ENSO signal as rate-filing lead indicator | DIRECTIONALLY NOVEL | Application frame; mechanism is known |
| AUC 0.66 on high-fire-year binary | DIRECTIONAL FINDING | Real but modest; n=155 |

---

## Next Steps to Strengthen the Model

1. **NOAA CDO token:** Pull state-level PDSI monthly 1990-2024 (free registration at
   ncdc.noaa.gov/cdo-web/token). PDSI is a more direct drought measure than the ONI proxy.
2. **Cal Fire incident DB:** Scrape fire.ca.gov/incidents for year x county x structures
   destroyed. This gives a better insured-loss proxy than the $50M floor methodology.
3. **VHI western US pipeline:** Extend the FIRMS/VHI pattern from prior West Africa
   cocoa/coffee work to western US states. VHI low-pixel fraction (VHI < 25) in
   April-June is a direct fuel-moisture proxy at 4km resolution.
4. **CDI rate filing scrape:** Non-renewal notices by ZIP code would let us validate
   whether the ENSO signal precedes carrier behavior changes — the ultimate test of
   this as an insurance rate leading indicator.
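
For the VHI step, the low-pixel fraction could be computed along these lines. This is a sketch under stated assumptions: pixel values are passed as a flat list for one state and period, missing pixels are represented as `None`, and the function name is hypothetical. Only the VHI < 25 threshold comes from the list above.

```python
def low_vhi_fraction(pixels, threshold=25.0):
    """Share of valid pixels with VHI below the threshold; None if no valid pixels."""
    valid = [v for v in pixels if v is not None]
    if not valid:
        return None
    return sum(1 for v in valid if v < threshold) / len(valid)

# Hypothetical April-June composite for one state; None marks cloud or no-data.
sample_pixels = [10.0, 30.0, 20.0, None, 50.0]
fraction = low_vhi_fraction(sample_pixels)
print(fraction)
```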
