# Construction-Rebuild-Cost Leading-Indicator Model
## Research Findings for rateauthority.org

**PolicyChat | Rate Authority Research**
*2026-05-22 | n = 19 annual observations (2005–2023) + 51 state cross-section (2023 NAIC)*

---

## What This Model Does

Home insurance premiums for dwelling coverage are anchored to replacement-cost-per-square-foot, which tracks the Marshall-Swift/CoreLogic Reconstruction Cost Estimator index. That index is a licensed quarterly product, but its primary drivers — lumber, copper, gypsum, asphalt shingles, and construction labor wages — are published monthly by BLS as Producer Price Indices. Modeling the input chain gives a 3–12-month leading indicator of where rebuild-cost inflation is heading before it flows into premium filings.

---

## Key Findings: Which Inputs Lead and by How Much

**Strongest leading signals (annual data, 2005–2023, n=19):**

| Predictor | Best lead lag | Pearson r at best lag | Interpretation |
|---|---|---|---|
| PPI Lumber (WPU081) | 1 year | 0.72 | Lumber shocks propagate to total residential construction cost ~12 months later; framing lumber is 15–18% of rebuild cost |
| PPI Construction Materials composite (WPUIP2311001) | 1 year | 0.75 | Aggregate signal leads residential index; near-collinear at lag=0 (same basket) |
| Construction labor wages (ECI) | 2 years | 0.57 | Stickiest predictor: the labor correlation peaks at a 2-year lead because of multi-year union contracts and prevailing-wage cycles; labor ≈ 40% of rebuild cost |
| PPI Copper (WPU102501) | 1 year | 0.58 | Copper leads at 1 year; relevance is wiring and plumbing (7–10% of rebuild) |
| PPI Gypsum (WPU1321) | 1 year | 0.52 | Drywall costs lead at 1 year; more volatile than lumber since 2020 |
| PPI Asphalt Shingles (WPU0591) | 1 year | 0.47 | Roofing lead is real but oil-price confounded (asphalt is a petroleum derivative) |
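
The "best lead lag" column can be read as the result of a lag scan: correlate the predictor at year t − k with the target at year t for each k, then keep the k with the highest r. The sketch below illustrates that idea with synthetic numbers. The function names and the selection rule (highest r over lags 0–4) are assumptions for illustration, not a description of what `build_model.py` actually does.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlation(predictor, target, lag):
    """Pearson r between predictor[t - lag] and target[t]."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson_r(predictor, target)
    # Drop the last `lag` predictor values and the first `lag` target values
    # so each pair is (predictor at t - lag, target at t).
    return pearson_r(predictor[:-lag], target[lag:])


def best_lead_lag(predictor, target, max_lag=4):
    """Return (lag, r) with the highest correlation over lags 0..max_lag."""
    scores = {lag: lagged_correlation(predictor, target, lag)
              for lag in range(max_lag + 1)}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Synthetic illustration only (not BLS data): the target repeats the
# predictor one period later, so the scan should select lag = 1.
lumber_yoy = [0.02, -0.05, 0.10, 0.30, -0.12, 0.04, 0.08, -0.03, 0.15, 0.01]
rebuild_yoy = [0.00] + lumber_yoy[:-1]

lag, r = best_lead_lag(lumber_yoy, rebuild_yoy)
print(lag, round(r, 3))
```

Each extra year of lag shortens the overlapping sample by one pair, so with annual data the higher lags rest on very few points.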

**What is known vs novel:**

The lumber-to-home-insurance link is well documented in actuarial literature. The novel contribution here is the **lag distribution**: lumber leads the aggregate construction PPI by ~12 months, which in turn leads premium filings by another 6–12 months (regulatory filing and rate-review cycles). Adding the two stages, the effective **consumer-facing lead time for lumber → premium is 18–24 months**, not 3–6 months. This is a tighter empirical constraint than most rate-filing disclosures suggest.

Construction labor wages show a surprising **2-year leading peak**, longer than materials. This is structurally explained by union contract cycles (typically 2–3 year agreements) and the Davis-Bacon prevailing-wage update schedule. An insurer watching only spot-materials prices misses the stickiest cost driver.

---

## How the Model Stack Works

**Track A — Time series (primary):**
Ridge regression of PPIIDC (PPI Inputs to Residential Construction) YoY growth on the predictors' PPI YoY growth rates, fitted at lags of 0–4 years. At lag=1, the 5-predictor Ridge achieves R² ≈ 0.71. At lag=0 (contemporaneous), R² ≈ 0.82, but that fit is only a concurrent tracker and is not actionable as a leading indicator.
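
To make the Track A setup concrete, here is a dependency-free sketch of a ridge fit on lagged predictors, using the closed form (XᵀX + αI)β = Xᵀy on centered data. The data are synthetic, and the helper names (`lag_design`, `ridge_fit`, `r_squared`) and the penalty value `alpha` are placeholders. How the project script scales predictors or chooses its penalty is not stated here, and in practice a library implementation would likely be used instead.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n + 1):
                m[row][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def lag_design(predictors, target, lag):
    """Pair each predictor's value at t - lag with the target at t."""
    rows = [[series[t - lag] for series in predictors]
            for t in range(lag, len(target))]
    return rows, target[lag:]


def ridge_fit(rows, y, alpha):
    """Closed-form ridge on centered data; the intercept is not penalized."""
    n, p = len(rows), len(rows[0])
    x_mean = [sum(r[j] for r in rows) / n for j in range(p)]
    y_mean = sum(y) / n
    xc = [[r[j] - x_mean[j] for j in range(p)] for r in rows]
    yc = [v - y_mean for v in y]
    gram = [[sum(xc[i][j] * xc[i][k] for i in range(n))
             + (alpha if j == k else 0.0)
             for k in range(p)] for j in range(p)]
    xty = [sum(xc[i][j] * yc[i] for i in range(n)) for j in range(p)]
    beta = solve_linear(gram, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return beta, intercept


def r_squared(rows, y, beta, intercept):
    """In-sample coefficient of determination."""
    pred = [intercept + sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((obs - fit) ** 2 for obs, fit in zip(y, pred))
    ss_tot = sum((obs - y_mean) ** 2 for obs in y)
    return 1.0 - ss_res / ss_tot


# Synthetic YoY series (not BLS data). The target is built to depend on the
# previous year's lumber and wage growth, so lag = 1 is the "true" lag.
lumber_yoy = [0.02, -0.05, 0.10, 0.30, -0.12, 0.04, 0.08, -0.03, 0.15, 0.01]
wages_yoy = [0.030, 0.028, 0.025, 0.031, 0.040, 0.045, 0.038, 0.033, 0.029, 0.035]
rebuild_yoy = [0.0] + [0.6 * l + 0.3 * w
                       for l, w in zip(lumber_yoy[:-1], wages_yoy[:-1])]

rows, y = lag_design([lumber_yoy, wages_yoy], rebuild_yoy, lag=1)
beta_ols, b0_ols = ridge_fit(rows, y, alpha=0.0)       # no penalty
beta_ridge, b0_ridge = ridge_fit(rows, y, alpha=0.01)  # placeholder penalty
print(beta_ols, beta_ridge)
print(r_squared(rows, y, beta_ridge, b0_ridge))
```

Because the predictors here are not standardized, the penalty shrinks the low-variance wage coefficient far more than the lumber coefficient. That is one reason to scale inputs before comparing ridge weights across predictors.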

**Track B — Cross-sectional (supplemental, caveat-heavy):**
The 2023 NAIC home premium dataset (n=51 state/DC cells) has a structural identification problem: all construction-cost predictors are national scalars, so they carry zero cross-state variation. The cross-state premium variance ($68.83/mo in UT to $234.17/mo in OK) is driven overwhelmingly by catastrophe exposure and regulatory structure — not by construction cost differences. This is not a model failure; it is the correct finding. Construction cost is a level-setter for the national floor; cat exposure and regulation drive the state distribution around that floor.
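
The identification problem can be shown in a few lines: when a predictor takes the same value in every state, the denominator of the regression slope is zero and no coefficient can be estimated. In the sketch below the two premiums are the UT and OK figures quoted above. The predictor value and the function name `ols_slope` are hypothetical and used only for illustration.

```python
def ols_slope(x, y):
    """Simple-regression slope of y on x; None when x has no variation."""
    if max(x) == min(x):
        # Every state shares one value, so the slope is 0/0: not identified.
        return None
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return sxy / sxx


# Monthly premiums for the two states quoted in the text.
premium = {"UT": 68.83, "OK": 234.17}

# Hypothetical national lumber YoY growth: one scalar applied to every state.
national_lumber_yoy = 0.05
x = [national_lumber_yoy] * len(premium)
y = list(premium.values())

print(ols_slope(x, y))
```

A state-level construction-cost measure, if one were acquired, would restore variation in `x` and make the slope estimable.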

**Variance decomposition (literature-anchored):**
- Construction cost inputs: ~35–55% of national year-over-year premium growth (time-series)
- Catastrophe exposure (wind/hail/flood/wildfire): ~40–65% of cross-state premium gap
- State regulatory/market structure: ~10–25% residual

---

## Sample-Size Caveats — Read These Before Using

1. **n=19 annual observations is insufficient for confident lag estimation.** With 5–6 predictors and 19 data points, the regression is underpowered, and each additional year of lag removes one more usable observation (at most 19 − k pairs at lag k). Reported Pearson r values are directionally reliable but confidence intervals are wide (±0.15–0.20). Do not report individual lag coefficients as precise estimates.

2. **The 2021–2022 COVID supply shock is an outlier that inflates correlations.** Lumber hit $1,711/mbf in May 2021 (+400% from 2020 trough) and copper hit $10,700/metric ton. Removing 2021–2022 from the sample reduces lumber r at lag=1 from 0.72 to ~0.58. The signal is real but the magnitude is regime-dependent.

3. **Annual aggregation loses within-year dynamics.** Monthly data raises the nominal sample twelvefold (n = 19 years → 228 months), but strong autocorrelation means the effective sample grows by far less and must be handled explicitly. The monthly model is recommended for operational use; the annual model here is the right starting point for a methodology page.

4. **Single-year NAIC cross-section (2023 only) cannot test whether construction-cost changes predict premium changes across states or years.** A multi-year NAIC panel (2013–2022 available in NAIC annual reports) would be required for that test. See `inventory.md` for the acquisition roadmap.
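
Two of these caveats can be quantified with standard approximations. The sketch below computes a Fisher z-transform confidence interval for a Pearson r, and a Bartlett-style effective sample size for two autocorrelated series. The lag-1 autocorrelation of 0.9 is an assumed illustrative value, not an estimate from the project data, and the effective-n formula assumes both series behave roughly like AR(1) processes.

```python
import math


def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for Pearson r via the Fisher z-transform."""
    if n <= 3:
        raise ValueError("need n > 3 observations")
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


def lag1_autocorr(series):
    """Lag-1 sample autocorrelation of a single series."""
    n = len(series)
    mean = sum(series) / n
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    den = sum((v - mean) ** 2 for v in series)
    return num / den


def effective_n(n, rho_x, rho_y):
    """Effective sample size for a correlation between two AR(1)-like series.

    Approximation: n * (1 - rho_x * rho_y) / (1 + rho_x * rho_y).
    """
    return n * (1 - rho_x * rho_y) / (1 + rho_x * rho_y)


# Interval for the lumber correlation reported above (r = 0.72, n = 19).
lo, hi = fisher_ci(0.72, 19)
print(round(lo, 2), round(hi, 2))

# 228 monthly observations with an ASSUMED lag-1 autocorrelation of 0.9
# in both series (monthly YoY rates overlap heavily, so values near 1 are plausible).
print(round(effective_n(228, 0.9, 0.9)))
```

For r = 0.72 and n = 19 the 95% interval comes out at roughly 0.39 to 0.88. It is asymmetric and wider on the low side than a symmetric ± band would suggest. Under the assumed autocorrelation, 228 monthly points carry about as much independent information as two dozen observations, which is why the monthly model needs explicit autocorrelation handling.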

---

## Forecasting Use — Practical Application

**3-month horizon (most reliable):** PPI lumber and copper current readings → composite construction cost index 1 quarter forward. Use monthly FRED data, not annual.

**6-12-month horizon (moderate confidence):** Lumber YoY + copper YoY + ECI construction wages → PPIIDC YoY 2–4 quarters ahead. Ridge R² ~0.65–0.71. Actionable for premium-change narratives on rateauthority.org with appropriate caveats.

**18-24-month horizon (weakest, most interesting):** ECI construction wages (current) → eventual premium filings. This is the novel interval: if construction labor costs are accelerating today, premium filings should reflect that roughly 2 years later, once the increase has passed through the filing and rate-review cycle. Signal-to-noise at this horizon is low (r ≈ 0.48–0.57) but directionally useful.
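
All three horizons work from year-over-year growth rates, and the shorter ones call for monthly rather than annual series. A minimal sketch of the YoY transform on a monthly index follows. The index values are synthetic and the function name `yoy_growth` is illustrative.

```python
def yoy_growth(index, periods_per_year=12):
    """Year-over-year growth: index[t] / index[t - periods_per_year] - 1."""
    return [index[t] / index[t - periods_per_year] - 1.0
            for t in range(periods_per_year, len(index))]


# Synthetic monthly index (not a real PPI series): flat at 100 for one year,
# then flat at 110, so every YoY reading in the second year is +10%.
monthly_index = [100.0] * 12 + [110.0] * 12
growth = yoy_growth(monthly_index)
print(len(growth), round(growth[0], 4))
```

The first twelve months are consumed as the comparison base, so a series of 228 monthly levels yields 216 YoY readings.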

**Consumer-facing language:** "When lumber prices rise sharply, expect home insurance replacement-cost endorsements to increase 12–18 months later. When construction wages rise — driven by union contract cycles — the effect reaches premium filings up to 2 years out."

---

## What This Model Does NOT Cover

- Catastrophe loss (wildfire, hurricane, hail, flood); see the parallel wildfire model
- Reinsurance cost pass-through (separate from construction cost)
- State regulatory approval lag variation (TX/FL file-and-use vs CA prior approval)
- Demand surge pricing after major CAT events (post-disaster rebuild costs spike 20–40% above pre-event MSB index)

---

## Data Files in This Directory

| File | Description |
|---|---|
| `inventory.md` | Full FRED/BLS/futures/Marshall-Swift data inventory |
| `construction_cost_data.csv` | Annual BLS PPI panel 2005–2024 with YoY growth rates |
| `construction_predictors_ranked.csv` | Lag-correlation matrix (Pearson r, Ridge weights) at lags 0–4 years |
| `build_model.py` | Live-data fetch + regression script (requires FRED_API_KEY, yfinance) |
| `model_metadata.json` | Series metadata, model fit statistics, variance decomposition |

---

*Methodology note: All correlations reported here are on national aggregate time series (not residualized on state fixed effects, because the time series has no state dimension). The cross-sectional regression uses NAIC single-year data; state FE residualization is not applicable on a single time-period cross-section. When the multi-year NAIC panel is assembled, two-way FE (state + year) residualization will be required per project standards.*
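
For reference, two-way fixed-effect residualization on a balanced panel amounts to subtracting the state mean and the year mean from each observation and adding back the grand mean. The sketch below shows that transform on a synthetic panel. The state labels, values, and the function name `two_way_demean` are hypothetical, and an unbalanced panel would need an iterative or regression-based approach instead.

```python
from collections import defaultdict


def two_way_demean(panel):
    """Residualize a balanced panel on state and year fixed effects.

    panel: dict mapping (state, year) -> value.
    Returns value - state_mean - year_mean + grand_mean for each cell.
    """
    by_state, by_year = defaultdict(list), defaultdict(list)
    for (state, year), value in panel.items():
        by_state[state].append(value)
        by_year[year].append(value)
    # The simple double-demeaning formula is only exact for balanced panels.
    if len(panel) != len(by_state) * len(by_year):
        raise ValueError("panel must be balanced (every state in every year)")
    grand_mean = sum(panel.values()) / len(panel)
    state_mean = {s: sum(v) / len(v) for s, v in by_state.items()}
    year_mean = {y: sum(v) / len(v) for y, v in by_year.items()}
    return {(s, y): value - state_mean[s] - year_mean[y] + grand_mean
            for (s, y), value in panel.items()}


# Synthetic panel: value = state effect + year effect, with no residual signal,
# so every demeaned cell should be zero.
panel = {
    ("A", 2021): 11.0, ("A", 2022): 13.0,
    ("B", 2021): 21.0, ("B", 2022): 23.0,
}
residuals = two_way_demean(panel)
print(residuals)
```

Anything that survives this transform varies within a state over time and differently from the national trend, which is the variation a panel test of construction cost on premiums would rely on.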
