Methodology

How we model the data.

The project unifies 59 public datasets at the US county level. Below is what's actually modeled per category, what fallbacks we use, and why those fallbacks are honest defaults. The short version: every score declares its mode, every fallback is tagged explicitly, and no score is silently filled in.

Contents

Taxes
Housing
Schools
Air quality
Crime
Climate & hazards
Descriptive context
Honest defaults

Taxes

Local income and sales tax — per-county where we can, state averages otherwise.

Federal income tax, payroll/self-employment tax, state income tax, and state sales tax are always modeled per-county under the active scenario. The interesting fallback discussion is about local income and sales taxes — those sit on top of the state layer and vary inside a single state.

What's modeled per-county today:

Local income tax: MD, NY (NYC boroughs + Yonkers-share Westchester proxy), PA (population-weighted county EIT including Philadelphia 3.75%), IN (all 92 counties per DOR Departmental Notice #1), OH (all 88 via largest-city RITA/CCA/The Finder proxy), MI (16 of 83 via largest-income-tax-city proxy), KY (110 occupational-rate counties + 10 no-local counties).
Local sales tax: CA, FL, NY, IL, AZ, AL, OK, TX, WA — per-county at county_official tier from the state DOR's machine-readable file.

Why the rest falls back to state averages. US local taxes are heterogeneous in two structural ways. First, taxing authority varies by state: local income tax is permitted in only ~16 states, and even among those our coverage is partial because the rest have no income-tax-imposing city to attribute. Second, publishing format is uneven: some states publish statewide rate tables, others publish PDFs, and others have no public registry at all and require crawling each city's official rate page.

To connect a new state per-county we have to: find the official registry, parse it into a stable format, map its geography (city codes, district IDs) to FIPS county codes, decide how to aggregate overlapping rates (county + city + transit + special-purpose district) into one county-typical rate, pin the data vintage, and expose explicit caveat tags (home-rule cities, OH JEDD/JEDZ, MI non-resident burden, KY occupational license rates). That's nontrivial per state, which is why we wire them in one at a time.
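The aggregation step is the least obvious of these, so here is a minimal sketch of one way to do it, assuming a population-weighted mean of each area's stacked rate. The Jurisdiction type, its fields, and the sample rates are hypothetical and only illustrate the idea; the weighting rule actually used differs per state.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Jurisdiction:
    """One taxing area inside a county (hypothetical structure)."""
    name: str
    population: int
    # Rates that stack in this area, in percent (county, city, transit, district).
    rates_pct: tuple[float, ...]


def county_typical_rate(jurisdictions: list[Jurisdiction]) -> float | None:
    """Population-weighted mean of each area's combined rate, in percent.

    Returns None rather than 0.0 when there is no population to weight by,
    so a missing value is never mistaken for a zero rate.
    """
    total_pop = sum(j.population for j in jurisdictions)
    if total_pop == 0:
        return None
    weighted = sum(j.population * sum(j.rates_pct) for j in jurisdictions)
    return weighted / total_pop


# Made-up numbers: a city stacking three rates and an unincorporated remainder.
areas = [
    Jurisdiction("city", 60_000, (1.0, 2.0, 0.5)),
    Jurisdiction("unincorporated", 40_000, (1.0,)),
]
typical = county_typical_rate(areas)
print(typical)
```

Weighting by population keeps a small high-rate district from dominating the county figure, which is the same reasoning behind the population-weighted PA county EIT above.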

For the rest, the fallback is the state-published average. It's not invented and it's not zero — it's the official state-level weighted average where local taxes exist. The county-level value carries an explicit state_avg_official or not_yet_modeled tier tag, visible in the Money tab as a colored badge. The composite tax-affordability score knows the mode and doesn't compare apples to oranges silently. As we wire new states, the tier tag flips from state_avg_official to county_official with no quiet behavior change.
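The tier logic can be sketched as a simple lookup cascade. This is an illustration under stated assumptions: the function name, the dictionary inputs, the placeholder FIPS codes, and the rates are hypothetical; only the three tier tags come from the text above.

```python
from __future__ import annotations

from typing import NamedTuple


class TaxValue(NamedTuple):
    rate_pct: float | None  # None means "no number", never a silent zero
    tier: str               # county_official | state_avg_official | not_yet_modeled


def resolve_local_rate(
    county_fips: str,
    county_rates: dict[str, float],
    state_avgs: dict[str, float],
) -> TaxValue:
    """Prefer the county's own official rate, then the state-published average."""
    if county_fips in county_rates:
        return TaxValue(county_rates[county_fips], "county_official")
    state_fips = county_fips[:2]  # first two FIPS digits identify the state
    if state_fips in state_avgs:
        return TaxValue(state_avgs[state_fips], "state_avg_official")
    return TaxValue(None, "not_yet_modeled")


# Placeholder FIPS codes and made-up rates, for illustration only.
county_rates = {"99001": 1.25}
state_avgs = {"99": 0.9}

wired = resolve_local_rate("99001", county_rates, state_avgs)
fallback = resolve_local_rate("99002", county_rates, state_avgs)
unmodeled = resolve_local_rate("98001", county_rates, state_avgs)
print(wired, fallback, unmodeled)
```

Because the tier travels with the value, wiring a new state only means adding entries to the county table; the same county then reports county_official instead of state_avg_official without any change to the calling code.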

Housing

Four separate layers, never blended into one price.

Four separate housing layers are kept as independent descriptive signals, not blended into a single price index:

ACS county medians (Census ACS 5-year): rent from B25064, home value from B25077, monthly owner cost from B25088. Canonical comparable medians across all 3,222 county-equivalents.
HUD FMR / SAFMR: HUD's official Fair Market Rent benchmarks, plus ZIP-code-level Small Area FMRs for metro areas.
FHFA county HPI: official annual price-index trend per county — answers "is this market heating, cooling, or flat?"
Zillow Research ZORI / ZHVI: privately aggregated rent and value indices for counties with sufficient listing volume.

Why we don't blend. Each has a different denominator, vintage, and coverage policy. Mixing them into a single "price" would hide the underlying assumptions, and those assumptions differ in ways users care about: ACS is self-reported by the current renter base, Zillow is asking rent on active listings, and HUD is a policy benchmark. Showing them side-by-side lets you triangulate.

Additional housing-context layers in the County Detail panel: bedroom mix and structure type from ACS B25032/B25024, plus PUMS housing context (cost-burden share at 30%/50%, overcrowded share, median rooms, median year built) with proportional allocation when a PUMA spans multiple counties. The PUMS quality flag is exposed.
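Proportional allocation can be sketched as a weighted average over the PUMA pieces that fall inside each county. The sketch below assumes housing units in each PUMA-county overlap as the weight; the function name, the crosswalk layout, the flag values, and all numbers are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def allocate_puma_rates(
    puma_rate: dict[str, float],
    crosswalk: list[tuple[str, str, float]],
) -> dict[str, tuple[float, str]]:
    """Turn PUMA-level shares into county-level shares.

    crosswalk rows are (puma_id, county_id, housing_units_in_overlap).
    Returns county_id -> (weighted share, quality flag).
    """
    counties_per_puma: dict[str, set[str]] = defaultdict(set)
    for puma, county, _units in crosswalk:
        counties_per_puma[puma].add(county)

    numerator: dict[str, float] = defaultdict(float)
    denominator: dict[str, float] = defaultdict(float)
    uses_split_puma: dict[str, bool] = defaultdict(bool)
    for puma, county, units in crosswalk:
        if puma not in puma_rate:
            continue  # no estimate for this PUMA; it contributes nothing
        numerator[county] += puma_rate[puma] * units
        denominator[county] += units
        if len(counties_per_puma[puma]) > 1:
            uses_split_puma[county] = True  # value was shared across counties

    return {
        county: (
            numerator[county] / denominator[county],
            "proportional_allocation" if uses_split_puma[county] else "direct",
        )
        for county in denominator
        if denominator[county] > 0
    }


# Made-up cost-burden shares: P1 spans counties A and B, P3 sits wholly in C.
puma_rate = {"P1": 0.30, "P2": 0.40, "P3": 0.20}
crosswalk = [
    ("P1", "A", 1000.0),
    ("P1", "B", 3000.0),
    ("P2", "B", 1000.0),
    ("P3", "C", 2000.0),
]
allocated = allocate_puma_rates(puma_rate, crosswalk)
print(allocated)
```

County B here blends two PUMAs, one of which is shared with county A, so its value is flagged as allocated rather than direct. That distinction is what the exposed PUMS quality flag is meant to carry.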

Schools

CCD + EDFacts + CRDC feed the canonical score; SEDA is descriptive.

The canonical county school score is built from three NCES / EDFacts / CRDC layers:

NCES CCD (SY 2024-25): school and district enrollment, grades served, geocoded to county via the official NCES EDGE geocodes.
EDFacts FS195 chronic absenteeism (SY 2022-23, EDE release 2024-11): school-level chronic-absence rate. 91,479 of 99,259 SY 2024-25 schools carry a value; the rest are not in the SY 2022-23 EDFacts file (open/close drift, low-N suppression, or missing CCD denominator).
CRDC advanced-course access (2021-22): % of students with access to AP, calculus, advanced math, etc.

These three drive the canonical county_school_score. Source modes are explicit at county level: CRDC-only, CRDC+EDFacts chronic absenteeism, or mixed when only some schools in the county have EDFacts matches.
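A minimal sketch of how a county-level source mode could be derived from school-level matches. The function name and the mode strings are hypothetical labels for the three modes described above, and the input is simplified to one boolean per school.

```python
from __future__ import annotations


def county_school_source_mode(has_edfacts_match: list[bool]) -> str | None:
    """Classify a county by how many of its schools have an EDFacts value.

    has_edfacts_match holds one boolean per school in the county.
    Returns None when the county has no schools to classify.
    """
    if not has_edfacts_match:
        return None
    matched = sum(has_edfacts_match)
    if matched == 0:
        return "crdc_only"
    if matched == len(has_edfacts_match):
        return "crdc_plus_edfacts"
    return "mixed"  # only some schools carry a chronic-absenteeism value


print(county_school_source_mode([True, False, True]))
```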

Cross-state academic context is intentionally descriptive only and never feeds the canonical score: SEDA 5.0 (Stanford, pre-pandemic SY 2008-09 to 2018-19, NAEP-anchored math + ELA), ERS / SEDA 2024.3 (post-pandemic recovery through SY 2024), and per-state 2024 assessments (CA / TX / FL / NY / PA / IL / OH / GA / NC wired with parsers; MI wired for a manually staged MI School Data export and emits rows only after that export is present). Per-state assessments carry cross_state_comparable=false and are for within-state context only.

Why descriptive only: academic outcomes are politically contested, sample size varies by district, and we don't want a single ranking number to imply "school X is N% better than school Y." The numbers are exposed in the Schools tab and Compare view with explicit vintage and source mode so readers can interpret without us doing it for them.

Air quality

County monitor where it exists, explicit state fallback otherwise.

EPA AirData publishes annual concentration files only for counties that operate at least one official monitor. Coverage:

Strict mode (strict_avg_aqi, 958 counties): the county's own AirData monitor average. Used as the primary avg_aqi value.
State-fallback mode (state_fallback, 2,255 counties): population-weighted mean avg_aqi of monitored counties in the same state, published as avg_aqi_state_fallback. Marked approximate_state_mean.
No-data mode (no_data, 9 counties): no monitor of its own and no state to fall back to (mostly territory edges).

The strict avg_aqi column is unchanged for fallback rows — it stays NULL, so strict-only consumers don't accidentally widen their coverage. The air_quality_score_mode field declares the mode explicitly, and liveability_score_with_air_fallback is a sibling of liveability_score_full, not a silent replacement — users opt in.
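The three modes and the "strict column stays NULL" rule can be sketched as follows. The CountyAir type, the function name, the placeholder FIPS codes, and the sample values are hypothetical; the column and mode names follow the text above.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class CountyAir:
    fips: str                      # 5-digit FIPS; first two digits are the state
    population: int
    monitor_avg_aqi: float | None  # None when the county has no monitor


def classify_air_quality(counties: list[CountyAir]) -> dict[str, dict]:
    """Assign each county a mode without ever filling the strict column."""
    # Population-weighted sums over monitored counties, grouped by state.
    weighted: dict[str, float] = defaultdict(float)
    monitored_pop: dict[str, int] = defaultdict(int)
    for c in counties:
        if c.monitor_avg_aqi is not None:
            weighted[c.fips[:2]] += c.population * c.monitor_avg_aqi
            monitored_pop[c.fips[:2]] += c.population

    rows: dict[str, dict] = {}
    for c in counties:
        state = c.fips[:2]
        if c.monitor_avg_aqi is not None:
            mode, strict, fallback = "strict_avg_aqi", c.monitor_avg_aqi, None
        elif monitored_pop.get(state, 0) > 0:
            # Fallback goes in its own column; avg_aqi stays None.
            mode, strict = "state_fallback", None
            fallback = weighted[state] / monitored_pop[state]
        else:
            mode, strict, fallback = "no_data", None, None
        rows[c.fips] = {
            "avg_aqi": strict,
            "avg_aqi_state_fallback": fallback,
            "air_quality_score_mode": mode,
        }
    return rows


# Placeholder counties with made-up values.
rows = classify_air_quality([
    CountyAir("99001", 100_000, 40.0),
    CountyAir("99003", 300_000, 60.0),
    CountyAir("99005", 50_000, None),   # unmonitored, state has monitors
    CountyAir("98001", 20_000, None),   # unmonitored, state has none
])
print(rows["99005"])
```

A consumer that reads only avg_aqi sees exactly the monitored counties, and one that opts in reads avg_aqi_state_fallback alongside the mode field.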

Descriptive ozone-season summary (EPA AirData ozone 8-hour 2015 standard) and SDWA drinking-water enforcement-action counts are exposed in the Context tab, also descriptive only.

Crime

FBI CDE only. State-aggregate fallback with caution tier.

FBI Crime Data Explorer (CDE) is the only official source we use. We do not use commercial crime benchmark overlays.

County mapping logic:

Direct attribution: most agencies (county sheriff offices, county-level departments) map cleanly to a single county FIPS code via the agency-to-ORI metadata.
State-aggregate fallback (usable_with_caution tier): when an agency's offense counts can't be cleanly attributed to a single county (state-police agencies, multi-county task forces), the state-level total is applied as a caution-tier overlay rather than silently dropping the county from the rankings.
Multi-county agency splits (population_weighted_proxy): 638 agencies whose counties metadata lists multiple counties get their annual offense totals allocated via ACS population weights. National pass: 745 counties moved from _pending to _resolved (576 within strong, 169 within usable_with_caution).
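The population-weighted split for a multi-county agency is straightforward arithmetic. A minimal sketch, assuming each county's share equals its share of the agency's combined ACS population; the function name and the sample numbers are hypothetical.

```python
from __future__ import annotations


def split_offenses_by_population(
    total_offenses: float,
    county_population: dict[str, int],
) -> dict[str, float]:
    """Allocate one agency's annual offense total across its counties.

    Returns an empty dict when no population is available, so the caller
    can leave the agency unresolved instead of inventing a split.
    """
    total_pop = sum(county_population.values())
    if total_pop <= 0:
        return {}
    return {
        county: total_offenses * pop / total_pop
        for county, pop in county_population.items()
    }


# Made-up agency covering two counties with a 3:1 population ratio.
shares = split_offenses_by_population(1200, {"A": 30_000, "B": 10_000})
print(shares)
```

The allocated counts are estimates, not counts by county of incident, which is why rows produced this way belong in the caution tier rather than strong.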

Quality flag tiers are exposed in the Spatial tab: strong (clean single-county attribution), usable_with_caution_population_weighted_split_resolved, or usable_with_caution_state_aggregate_fallback.

FBI CDE's /summarized/query endpoint does not support per-county-of-incident filtering scoped to a single ORI, so population_weighted_proxy is the best available proxy for multi-county agencies. Crime fetch reliability is also explicit: batched FBI CDE summary calls are retried as single-query requests on intermittent 503s, and unresolved queries are persisted so transient failures stay distinct from genuine missing counts.
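The retry behaviour can be sketched without touching the real API by injecting the fetch function. Everything here is an assumption for illustration: ServiceUnavailable stands in for an HTTP 503, fetch_batch is a hypothetical callable, and a real client would also back off between attempts and write the unresolved list to disk.

```python
from __future__ import annotations

from typing import Callable, Optional

Counts = dict[str, Optional[int]]  # None is a genuine "no count reported"


class ServiceUnavailable(Exception):
    """Stand-in for an intermittent HTTP 503 from the upstream service."""


def fetch_with_single_fallback(
    queries: list[str],
    fetch_batch: Callable[[list[str]], Counts],
    max_single_attempts: int = 3,
) -> tuple[Counts, list[str]]:
    """Try the whole batch once, then retry each query on its own.

    Returns (results, unresolved). Unresolved queries failed every attempt
    and are kept apart from queries that succeeded with a missing count.
    """
    try:
        return fetch_batch(queries), []
    except ServiceUnavailable:
        pass  # fall through to single-query requests

    results: Counts = {}
    unresolved: list[str] = []
    for query in queries:
        for _attempt in range(max_single_attempts):
            try:
                results.update(fetch_batch([query]))
                break
            except ServiceUnavailable:
                continue
        else:
            unresolved.append(query)  # never answered: a transient failure
    return results, unresolved


def fake_fetch(batch: list[str]) -> Counts:
    """Test double: batches always fail, one agency is always down."""
    if len(batch) > 1 or "ORI_DOWN" in batch:
        raise ServiceUnavailable()
    known = {"ORI_A": 12}
    return {q: known.get(q) for q in batch}


results, unresolved = fetch_with_single_fallback(
    ["ORI_A", "ORI_B", "ORI_DOWN"], fake_fetch
)
print(results, unresolved)
```

ORI_B ends up in results with None (the service answered but had no count), while ORI_DOWN ends up in unresolved, which is the distinction between genuine missing counts and transient failures described above.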

Climate & hazards

Observed normals, trends, and official projection context.

Four layers feed canonical and descriptive views:

NOAA NCEI county weather normals (1991-2020 base period): temperature, precipitation, GDD per county — feeds weather_comfort_score.
NOAA nClimDiv observed trends (1990-present): per-decade temperature and precipitation trend per county. Spot-checks: Maricopa AZ +0.61 °F/decade, King WA +0.29 °F/decade, Miami-Dade FL +0.57 °F/decade. Descriptive only — not a projection.
NOAA Climate Explorer / ACIS LOCA hot-day projections: annual days above 95°F for 2035 and 2050 under RCP4.5/RCP8.5. Exposed as descriptive Context, Compare, filters, sort axes and map metrics; not blended into canonical weather comfort.
FEMA NRI: composite natural-hazard risk index per county with sub-hazards (wildfire, hurricane, riverine flooding, coastal flooding, earthquake, drought, etc.) decomposable in the Spatial tab.
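A per-decade observed trend like the nClimDiv spot-checks can be computed as a fitted slope scaled to ten years. The sketch below assumes an ordinary least-squares fit on annual means, which the text above does not specify; the function name and the synthetic series are hypothetical.

```python
from __future__ import annotations


def trend_per_decade(years: list[int], values: list[float]) -> float:
    """Least-squares slope of values against years, expressed per decade."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    if sxx == 0:
        raise ValueError("years must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return 10 * sxy / sxx  # slope per year times ten years


# Synthetic series warming by exactly 0.05 degrees F per year.
years = list(range(1990, 2024))
temps = [70.0 + 0.05 * (y - 1990) for y in years]
trend = trend_per_decade(years, temps)
print(round(trend, 2))
```

A slope fitted to past observations describes what already happened over the window; it says nothing about future years, which is why this layer is kept apart from the projection layer.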

We do not fabricate synthetic future-climate projections. The forward-looking hot-day layer is an official Climate Explorer / ACIS projection context with explicit scenario caveats; observed NOAA NCEI trend files remain separate and are refreshed by procdate.

Descriptive context

Demographic, health, amenity, and experimental layers — never feed canonical scores.

Strictly descriptive (no scoring weight):

ACS demographics (Census ACS 5-year): age, race/ethnicity, ancestry, language, education attainment. The project does not treat any race, ethnicity, or national-origin composition as intrinsically positive or negative. diversity_score is a descriptive heterogeneity indicator only and is kept separate from the normative liveability composite.
CBP NAICS amenities (County Business Patterns 2023): establishment counts per 100k for grocery, pharmacy, childcare, coworking (NAICS 531120).
CDC PLACES: 36+ small-area health indicators (smoking, obesity, diabetes, mental-health distress, asthma, etc.).
FCC broadband (2024-12-01): 100/20 Mbps fixed broadband share per county.
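The per-100k normalization used for amenity counts is simple, but the missing-data handling matters. A minimal sketch, with a hypothetical function name and made-up inputs:

```python
from __future__ import annotations


def per_100k(establishments: int | None, population: int | None) -> float | None:
    """Establishments per 100,000 residents, or None when it cannot be computed."""
    if establishments is None or population is None or population <= 0:
        return None  # missing stays missing; it is not reported as zero
    return establishments * 100_000 / population


rate = per_100k(45, 150_000)
print(rate)
```

A county with zero establishments returns 0.0, while a county with no reported count returns None, so the two cases stay distinguishable downstream.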

Experimental layers — also descriptive only: PUMS housing (cost-burden, overcrowding, median year built), QCEW wages (BLS, avg weekly wage by NAICS supersector), LAUS labor (BLS, civilian labor force / unemployment rate), NDCP childcare (weekly cost estimates by care type). These never feed canonical liveability / school / spatial scores.

Why they never feed canonical scores: vintages are misaligned with the canonical layer, methodologies are non-parallel (PUMS proportional allocation vs ACS county direct), and several layers (SEDA / ERS / per-state assessments) are politically loaded enough that we want users to read them explicitly with vintage stated, not as a hidden ingredient.

Honest defaults

How to read missing or approximate data.

No hidden zeroes. If we do not have a county-level number, we say so instead of pretending the value is zero.
State averages are labeled. When a state-published average is the best available public number, the UI marks it as a state estimate, not as a local county measurement.
Scores show their data quality. The Money, Schools, Spatial, and Context views show whether a value is direct, approximate, or not modeled yet.
New coverage is visible. When a state or dataset gets better county-level coverage, the badge on the affected counties changes too.

If a number looks surprising, do not just trust the rank. Open the detail tab, check the label, date, and source, then decide whether that assumption fits your household scenario.

See the full source registry under /sources for plain-language caveats per dataset.
