# Data Inventory — US Insurance Rate Correlation Analysis

Generated: 2026-05-22 07:07

---

## 1. Insurance Rate Data (Y variables)

### 1a. NAIC State Averages 2023 (rate_ledger.db)
- **Available**: Yes. 153 rows (auto, home, and renters per jurisdiction; 153 = 51 × 3, which suggests the 50 states plus DC)
- **Coverage**: Single year (2023 only). No 2018-2022 NAIC history cached locally.
- **Products**: auto, home, renters
- **Limitation**: A single year cannot support time-series correlation,
  so this table is used only for a cross-sectional median polish.
- **Source URL**: https://content.naic.org/cipr_topics/topic_auto_insurance.htm
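
A median polish splits a two-way table (here, state × product) into an overall level, a row effect, a column effect, and a residual. The sketch below is a minimal pure-Python version of Tukey's procedure. It is not necessarily the implementation used in this analysis; the function name `median_polish` and the small premium table are illustrative assumptions, not values from rate_ledger.db.

```python
from statistics import median

def median_polish(table, max_iter=10, tol=1e-9):
    """Tukey median polish: cell = overall + row_eff[i] + col_eff[j] + resid[i][j]."""
    n_rows, n_cols = len(table), len(table[0])
    resid = [list(row) for row in table]
    overall = 0.0
    row_eff = [0.0] * n_rows
    col_eff = [0.0] * n_cols
    for _ in range(max_iter):
        # Sweep each row's median out of the residuals and into the row effect.
        row_med = [median(row) for row in resid]
        for i in range(n_rows):
            row_eff[i] += row_med[i]
            resid[i] = [v - row_med[i] for v in resid[i]]
        # Re-centre the column effects; their median moves into the overall term.
        shift = median(col_eff)
        col_eff = [c - shift for c in col_eff]
        overall += shift
        # Sweep each column's median out of the residuals and into the column effect.
        col_med = [median(resid[i][j] for i in range(n_rows)) for j in range(n_cols)]
        for j in range(n_cols):
            col_eff[j] += col_med[j]
            for i in range(n_rows):
                resid[i][j] -= col_med[j]
        # Re-centre the row effects in the same way.
        shift = median(row_eff)
        row_eff = [r - shift for r in row_eff]
        overall += shift
        if max(abs(m) for m in row_med + col_med) < tol:
            break
    return overall, row_eff, col_eff, resid

# Hypothetical premiums (rows: states; columns: auto, home, renters).
demo = [
    [1500.0, 1200.0, 180.0],
    [2100.0, 1900.0, 220.0],
    [1300.0, 1000.0, 150.0],
]
overall, row_eff, col_eff, resid = median_polish(demo)
```

In this reading, `row_eff` ranks states after removing the product mix, and large entries in `resid` mark state/product cells that the additive fit does not explain.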

### 1b. SEC EDGAR Carrier Aggregates (rate_ledger.db)
- **Available**: 14 carriers × 4 quarterly/annual data points each = 56 rows
- **Carriers**: ['AIG', 'ALL', 'CB', 'CINF', 'EG', 'GL', 'HIG', 'KMPR', 'LNC', 'MCY', 'MET', 'PGR', 'TRV', 'WRB']
- **Date coverage**: Most carriers have only 2025-2026 data (4 quarters).
  Allstate (ALL) has 2022, 2023, 2024, 2025 annual — longest available.
- **Limitation**: n=4 per carrier. Insufficient for regression analysis.
  Not used as primary Y. Documented as available for future expansion.

### 1c. State DOI Custom Scraper (rate_ledger.db)
- **Available**: 26 rows, California only, 2024-2026 rate filing dates
- **`monthly_premium` = 0 for all rows**: the table records rate filing approval dates,
  not actual premium dollars. NOT usable as Y.

### 1d. FRED CPI Insurance Series (PRIMARY Y — used in analysis)
- **CUSR0000SEHG** (CPI Tenants/Household Insurance; LAGGING target): n=315, 2000-01-01 – 2026-04-01
- **CPITRNSL** (CPI Transportation, includes auto insurance; LAGGING target): n=315, 2000-01-01 – 2026-04-01
- **Note**: These are LAGGING indicators: rate changes show up in CPI roughly 6-12 months after the underlying rate filings.
  A predictor that leads the CPI insurance series is therefore assumed to lead rate filings as well.
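
The lead/lag idea in the note above can be sketched as a lagged correlation scan: correlate the predictor at month t with the target at month t + k and see which k gives the strongest relationship. This is a minimal illustration on synthetic data, not the analysis code; `pearson`, `lead_lag_correlations`, and the 6-month delay in the demo are assumptions made for the example.

```python
import random
from math import sqrt

def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def lead_lag_correlations(x, y, max_lead=12):
    """Correlate x[t] with y[t + k] for k = 0..max_lead (x leading y by k months)."""
    out = {}
    for k in range(max_lead + 1):
        xs = x[: len(x) - k]   # predictor, trimmed at the end
        ys = y[k:]             # target, shifted k months forward
        out[k] = pearson(xs, ys)
    return out

# Synthetic pair: y repeats x six months later (illustrative, not FRED data).
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(120)]
y = [0.0] * 6 + x[:-6]

corr = lead_lag_correlations(x, y, max_lead=12)
best_lead = max(corr, key=corr.get)   # 6 for this synthetic pair
```

On real YoY series the peak is rarely this sharp, so the full profile in `corr` is usually more informative than `best_lead` alone.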

## 2. Predictor Series (X variables)

### 2a. FRED Macro Series — Successfully fetched
- **CUSR0000SETA01** (CPI New Vehicles): n=316
- **CUSR0000SETA02** (CPI Used Cars/Trucks): n=316
- **CUSR0000SETC** (CPI Motor Vehicle Parts/Equipment (proxy for repair costs)): n=315
- **WPU141** (PPI Motor Vehicle Parts (key lead, from substack_robustness)): n=316
- **WPU1412** (PPI Auto Parts & Equipment (subst. code)): n=316
- **CPIMEDSL** (CPI Medical Care): n=315
- **CUSR0000SEHA** (CPI Rent Primary Residence): n=315
- **CUSR0000SAH1** (CPI Shelter, which includes OER): n=315
- **MORTGAGE30US** (Mortgage 30Y Fixed Rate): n=316
- **WPU0811** (PPI Lumber (softwood)): n=316
- **WPU102501** (PPI Copper): n=316
- **CUSR0000SEHB** (CPI Household Furnishings (furniture/appliances)): n=315
- **UNRATE** (Unemployment Rate): n=315
- **BAMLH0A0HYM2** (High Yield Bond Spread (credit stress)): n=35
- **UMCSENT** (Consumer Sentiment (U Michigan)): n=315
- **CSCICP03USM665S** (Consumer Confidence (OECD)): n=289
- **DCOILWTICO** (WTI Crude Oil (general cost proxy)): n=316
- **CPIENGSL** (CPI Energy (catastrophe rebuild cost proxy)): n=315
- **CUSR0000SAF11** (CPI Food at Home (income stress)): n=315
- **LNU01300000** (Labor Force Participation Rate): n=315
- **M2SL** (M2 Money Supply): n=315
- **DGORDER** (Durable Goods Orders): n=315
- **IPMAN** (Industrial Production Manufacturing): n=316
- **CUSR0000SEHF01** (CPI Electricity): n=315

### 2b. FRED Series — Failed / Not Found
- **CUSR0000SEEE02** (CPI Telephone Services): HTTP 400 — series not found or wrong ID
- **WILL5000IND** (Wilshire 5000 (equity wealth)): HTTP 400 — series not found or wrong ID
- **CUSR0000SAA** (CPI Apparel (discretionary spend)): HTTP 400 — series not found or wrong ID
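
Failures like the ones above are easier to audit when the fetch loop records them instead of aborting. The sketch below is one way to do that against the FRED `series/observations` endpoint using only the standard library. The function name `fetch_series_counts` and the injectable `opener` argument are assumptions for illustration (the latter allows testing without network access), and the way `n` is counted here may differ from how the counts in this inventory were produced.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

FRED_URL = "https://api.stlouisfed.org/fred/series/observations"

def fetch_series_counts(series_ids, api_key, opener=urllib.request.urlopen):
    """Return ({series_id: n_observations}, {series_id: error_text})."""
    fetched, failed = {}, {}
    for sid in series_ids:
        query = urllib.parse.urlencode(
            {"series_id": sid, "api_key": api_key, "file_type": "json"}
        )
        try:
            with opener(f"{FRED_URL}?{query}") as resp:
                payload = json.load(resp)
        except urllib.error.HTTPError as err:
            # An unknown or mistyped series ID surfaces here as an HTTP error.
            failed[sid] = f"HTTP {err.code}"
            continue
        # FRED encodes missing observations as "."; skip them when counting.
        fetched[sid] = sum(1 for obs in payload["observations"] if obs["value"] != ".")
    return fetched, failed
```

Called as `fetch_series_counts(["UNRATE", "WILL5000IND"], api_key)`, this would return the observation count for each series that resolves and an `HTTP <code>` entry for each one that does not.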

### 2c. Manheim Used Vehicle Value Index
- **Available**: No. Manheim is a private index (Cox Automotive) and is not served by the FRED API.
  No local cache found. Used CPI Used Cars (CUSR0000SETA02) as proxy.

### 2d. NOAA / NHC Catastrophe Data
- **Available**: Partial. FIRMS fire density data is cached internally (Rate Authority data warehouse),
  but only at country level, not as US state-level catastrophe counts.
- **Missing**: State-level hurricane/tornado/hail event counts.
  Would require NOAA NCEI API or SHELDUS. Not pulled in this run.
- **Proxy used**: CPI Energy (CPIENGSL) as an indirect proxy for catastrophe rebuild cost.

### 2e. Lumber / Copper
- **Lumber**: PPI Softwood Lumber (WPU0811) fetched from FRED
- **Copper**: PPI Copper (WPU102501) fetched from FRED
- **Manheim**: Covered in 2c. Not available (private index); CPI Used Cars used as proxy.

### 2f. BLS CPI/PPI Subseries
- `BLS_API_KEY`: commented out in `.env` (unregistered BLS API access is capped at 25 queries/day)
- All needed series are accessible via the FRED API instead (FRED mirrors the BLS CPI/PPI series)

## 3. Data Gaps — Blockers for Deeper Analysis

| Gap | Impact | Workaround Used |
|-----|--------|-----------------|
| NAIC 2018-2022 state data not cached | Cannot build 6yr×50-state panel | Used FRED CPI series as Y |
| Manheim Used Vehicle Index (private) | Missing best auto repair cost predictor | CPI Used Cars proxy |
| NOAA state-level CAT event counts | Cannot test weather → home premium channel | CPI Energy proxy |
| SEC EDGAR n=4 per carrier | Cannot run carrier-level time-series | Noted; not used |
| Litigation environment data | Cannot test tort reform effects | Not available via FRED |
| Telematics/UBI adoption rates | Novel lead indicator; no FRED series | Not tested |

## 4. Input Table Columns

File: `input_table.csv`
- Rows: monthly dates, 2001-01-01 to 2026-04-01
- 26 columns (all YoY % change from FRED)
- Target columns: CUSR0000SETC01 (CPI Auto Insurance), CUSR0000SEHG (CPI Tenants Insurance)
- Predictor columns: ['CUSR0000SETA01', 'CUSR0000SETA02', 'CUSR0000SETC', 'WPU141', 'WPU1412'] ... (all YoY %)
- **Residualization**: Applied per analysis. Each series has its month-of-year mean subtracted before each correlation run.
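
The YoY transform and the month-of-year residualization described above can be sketched as follows. This is a minimal stdlib illustration on synthetic index levels, not the pipeline that produced `input_table.csv`; the function names are assumptions. It also shows why a series starting 2000-01-01 yields rows starting 2001-01-01: the first 12 months have no year-ago base.

```python
import random
from collections import defaultdict

def yoy_pct_change(levels):
    """YoY % change for a monthly series; the first 12 months have no base and are dropped."""
    return [100.0 * (levels[t] / levels[t - 12] - 1.0) for t in range(12, len(levels))]

def residualize_month_of_year(months, values):
    """Subtract each calendar month's mean from that month's observations."""
    buckets = defaultdict(list)
    for m, v in zip(months, values):
        buckets[m].append(v)
    means = {m: sum(vs) / len(vs) for m, vs in buckets.items()}
    return [v - means[m] for m, v in zip(months, values)]

# Synthetic index levels for 2000-01 .. 2026-04 (316 months); not real FRED data.
rng = random.Random(0)
levels = [100.0]
for _ in range(315):
    levels.append(levels[-1] * (1.0 + rng.uniform(-0.01, 0.02)))

yoy = yoy_pct_change(levels)                      # first value corresponds to 2001-01
months = [(t % 12) + 1 for t in range(12, 316)]   # calendar month of each YoY value
resid = residualize_month_of_year(months, yoy)
```

With 316 monthly levels the YoY series has 304 values, matching a 2001-01-01 to 2026-04-01 row range. After residualization, the values for any given calendar month average to zero, so any remaining correlation is not driven by a shared seasonal pattern.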
