Prediction Scorecard

How accurate Creek Intelligence's forecasts have been lately. Experimental — self-grading, updated daily.
Updated 2026-07-01 06:30 UTC · scoring window: last 45 days

Every forecast the system makes is recorded and later graded against what the river actually did; this page is that report card. It currently scores 272 physics crest predictions, 484 empirical likelihood forecasts, and 226 recession countdowns (647 still maturing).

⚠️ Needs attention

Empirical engine over-warns: the share of 'rise likely' calls that saw no rise is 78.3% for Buffalo at Boxley, 71.0% for Hailstone (upper Buffalo), 79.8% for Mulberry River, and 87.5% for Richland Creek. A lookup_tables.yaml rebuild (ARCH §16-11) is the fix.
Empirical band inversion: the heaviest-rain band is LESS reliable (84.1% no-rise) than the band below it (47.5%) — the rain-to-rise mapping is miscalibrated.
Physics crest predictors off-calibration: Hailstone (upper Buffalo) 35.0% within ±20% (bias -377.43 cfs, but median err just 14.9 cfs — mean skewed by a few outlier events).
Ponca override flood-floor is less calibrated than the raw analog (Brier 0.163 vs 0.092, n=193); the floor stays high into recession — consider easing it post-peak.
Recession big_piney/above_longpool 'too_low' timing bias: fires ~24 h early (river falls slower than predicted); MAE 24.0 h, n=23.
Recession big_piney/below_longpool 'low_floatable' timing bias: fires ~83 h early (river falls slower than predicted); MAE 83.3 h, n=9.
Recession big_piney/below_longpool 'too_low' rarely reached (100.0% never-reached, n=9); a recession_baseline may help.
Recession buffalo/ponca 'low_floatable' timing bias: fires ~12 h early (river falls slower than predicted); MAE 13.0 h, n=42.
Recession mulberry/above_hwy_23 'low_floatable' timing bias: fires ~90 h early (river falls slower than predicted); MAE 90.0 h, n=11.
Recession mulberry/above_hwy_23 'too_low' rarely reached (100.0% never-reached, n=11); a recession_baseline may help.

Physics Rise Predictors

Predicts the crest a gauge will reach from rainfall + antecedent moisture. Scored on how close the predicted peak was to the actual peak.

Predictor	Scored	Avg error (MAE)	Bias (mean / median)	Within ±20%
Cossatot River	159	0.53 ft	0.33 / 0.24 ft	76.0%
Richland Creek	70	0.91 ft	0.43 / 0.26 ft	53.0%
Hailstone (upper Buffalo)	43	455.1 cfs	-377.43 / 14.9 cfs	35.0%

Bias is the mean / median signed error (predicted − actual). A large gap between them means a few outlier events — often a single flash-flood onset — are dragging the mean; the median is the typical miss.
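The effect of one outlier on mean versus median bias can be sketched as follows. This is a minimal illustration, not the scorecard's own code: the function name, the sample values, and the reading of "within ±20%" as relative to the actual peak are all assumptions.

```python
from statistics import mean, median

def crest_error_stats(pairs, tolerance=0.20):
    """pairs: list of (predicted, actual) crests in the same unit (ft or cfs)."""
    # Signed error follows the scorecard's convention: predicted - actual.
    errors = [p - a for p, a in pairs]
    # Assumption: "within +/-20%" is measured relative to the actual peak.
    within = [abs(p - a) <= tolerance * abs(a) for p, a in pairs]
    return {
        "mae": mean(abs(e) for e in errors),
        "bias_mean": mean(errors),
        "bias_median": median(errors),
        "within_pct": 100.0 * sum(within) / len(pairs),
    }

# Hypothetical sample: four small misses plus one badly under-predicted flash flood.
sample = [
    (510.0, 500.0),
    (790.0, 800.0),
    (1020.0, 1000.0),
    (295.0, 300.0),
    (400.0, 6000.0),
]
stats = crest_error_stats(sample)
print(stats)
```

In this made-up sample the single flash-flood miss pulls the mean bias to about -1117 cfs while the median stays at -5 cfs, which is the same pattern the Hailstone row shows.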

Worst recent misses

Cossatot River

2026-06-19T18:20: predicted 6.48, actual 2.91 (err +3.57 ft)
2026-06-07T05:20: predicted 3.52, actual 6.04 (err -2.52 ft)
2026-06-22T12:20: predicted 5.41, actual 3.12 (err +2.29 ft)

Richland Creek

2026-06-14T03:20: predicted 17.39, actual 9.42 (err +7.97 ft)
2026-06-14T04:20: predicted 16.05, actual 9.42 (err +6.63 ft)
2026-06-13T23:20: predicted 2.7, actual 7.21 (err -4.51 ft)

Hailstone (upper Buffalo)

2026-06-22T10:21: predicted 244.0, actual 6420.0 (err -6176.0 cfs)
2026-06-22T11:21: predicted 842.1, actual 6420.0 (err -5577.9 cfs)
2026-06-22T12:21: predicted 4744.3, actual 7300.0 (err -2555.7 cfs)

Empirical Forecast Engine

Predicts the likelihood of a rise to a given level from recent rainfall. Hit-rate = the river rose to at least the called level; no-rise-rate = the forecast rise never happened (lower is better).

Basin	Forecasts	Hit-rate (rose to ≥ called level)	No-rise rate (rise never came)	Outcome detail
Big Piney Creek	44	34.1%	65.9%	15 exact · 0 higher · 0 short · 29 none
Buffalo at Boxley	106	21.7%	78.3%	11 exact · 12 higher · 0 short · 83 none
Cossatot River	101	47.5%	49.5%	24 exact · 24 higher · 3 short · 50 none
Hailstone (upper Buffalo)	62	29.0%	71.0%	10 exact · 8 higher · 0 short · 44 none
Mulberry River	99	20.2%	79.8%	9 exact · 11 higher · 0 short · 79 none
Richland Creek	72	12.5%	87.5%	2 exact · 7 higher · 0 short · 63 none
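The two rates can be recomputed from the outcome counts in the last column. The sketch below assumes a hit is an "exact" or "higher" outcome and a no-rise is a "none" outcome, which is consistent with the rows above. The function name is hypothetical.

```python
def empirical_rates(exact, higher, short, none):
    """Return (hit_rate, no_rise_rate) in percent from outcome counts."""
    total = exact + higher + short + none
    # Assumption: a hit means the gauge reached at least the called level.
    hit_rate = 100.0 * (exact + higher) / total
    # "none" outcomes are forecasts where no rise happened at all.
    no_rise_rate = 100.0 * none / total
    return round(hit_rate, 1), round(no_rise_rate, 1)

# Cossatot River row: 24 exact, 24 higher, 3 short, 50 none.
cossatot = empirical_rates(exact=24, higher=24, short=3, none=50)
print(cossatot)  # (47.5, 49.5)
```

"Short" outcomes (the river rose but stayed below the called level) count toward neither rate, which is why the two percentages do not always sum to 100.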

By confidence label

Confidence	n	Hit-rate	No-rise rate
high	134	51.5%	48.5%
medium	132	34.1%	64.4%
low	100	19.0%	80.0%

By rainfall percentile band (higher band = heavier rain vs. history)

Band	n	Hit-rate	No-rise rate
p25_to_p50	179	35.2%	64.8%
p50_to_p75	118	50.8%	47.5%
above_p75	69	14.5%	84.1%

A well-calibrated engine would show hit-rate rising with the band. Here it rises from 35.2% to 50.8% and then drops to 14.5% in the heaviest band. Where hit-rate fails to rise, the engine is over-warning, a known selection-bias issue tracked for recalibration.
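A simple way to flag this kind of inversion is to walk the bands from lightest to heaviest rain and report any step where the hit-rate falls. The sketch below uses the hit-rates from the table above. The function name is hypothetical.

```python
# (band, hit-rate %) ordered from lightest to heaviest rain, from the table above.
bands = [("p25_to_p50", 35.2), ("p50_to_p75", 50.8), ("above_p75", 14.5)]

def find_inversions(ordered_bands):
    """Return (lower_band, higher_band) pairs where hit-rate drops as rain gets heavier."""
    return [
        (lower[0], higher[0])
        for lower, higher in zip(ordered_bands, ordered_bands[1:])
        if higher[1] < lower[1]
    ]

inversions = find_inversions(bands)
print(inversions)  # [('p50_to_p75', 'above_p75')]
```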

Recession Countdowns

Predicts how long until a falling river drops to each threshold. HIT% = reached the level near the predicted time; MAE = average timing error (hours). Newly launched — most predictions are still maturing.

Gauge	Target	Graded	HIT%	Never reached	Timing MAE	Bias
big_piney/above_longpool	low_floatable	4	100.0%	0.0%	0.6h	0.5h
big_piney/above_longpool	too_low	23	48.0%	52.0%	24.0h	-24.0h
big_piney/below_longpool	low_floatable	9	67.0%	33.0%	83.3h	-83.3h
big_piney/below_longpool	too_low	9	0.0%	100.0%	—	—
buffalo/ponca	low_floatable	42	100.0%	0.0%	13.0h	-12.3h
cossatot/cossatot	low_floatable	28	100.0%	0.0%	3.9h	-0.3h
cossatot/cossatot	too_low	71	100.0%	0.0%	6.2h	-0.7h
mulberry/above_hwy_23	low_floatable	11	73.0%	27.0%	90.0h	-90.0h
mulberry/above_hwy_23	too_low	11	0.0%	100.0%	—	—

11 additional gauge/target combinations have fewer than 4 graded outcomes (mostly still-censored or single-sample) and are hidden until they accumulate enough data to be meaningful. Bias is hours predicted − actual; negative means the countdown fires early.
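The sign convention can be illustrated with a short sketch. It assumes each graded countdown is stored as a predicted time-to-threshold and an actual time-to-threshold in hours, with the actual value missing when the level was never reached. The record layout, function name, and sample values are hypothetical.

```python
from statistics import mean

def countdown_timing(records):
    """records: list of (predicted_hours, actual_hours), actual_hours is None if never reached."""
    reached = [(p, a) for p, a in records if a is not None]
    never_pct = 100.0 * (len(records) - len(reached)) / len(records)
    if not reached:
        # Nothing to time: mirrors the dashes shown for rows that never reached the level.
        return {"never_reached_pct": never_pct, "mae_h": None, "bias_h": None}
    # Bias convention from the scorecard: predicted - actual, negative = fires early.
    errors = [p - a for p, a in reached]
    return {
        "never_reached_pct": never_pct,
        "mae_h": mean(abs(e) for e in errors),
        "bias_h": mean(errors),
    }

# Hypothetical countdowns: the river keeps falling more slowly than predicted.
sample = [(10.0, 30.0), (12.0, 40.0), (20.0, None), (8.0, 32.0)]
timing = countdown_timing(sample)
print(timing)
```

When every miss is in the same direction, as in this sample, the MAE and the absolute bias coincide, which matches rows such as big_piney/above_longpool 'too_low' (MAE 24.0h, bias -24.0h).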

Ponca AI Rainfall Event Analysis

When rain arms it, predicts how big the Ponca gauge will get — a class (Fizzle/Moderate/High/Flood), a typical-peak band, and a flood-risk %. Graded against Ponca's actual crest over the 36 h after each call.

Calls graded	Events	Class exact	Class within 1	Peak in IQR	Peak error (MAE)
231	13	71%	97%	61%	1149.0 cfs

Flood-risk calibration (Brier score, lower is better; 193 calls since the raw analog was also logged): override floor 0.163 vs. raw k-NN 0.092. The raw analog is currently the better-calibrated of the two, because the deterministic override floor stays high through the post-crest recession. Small sample, directional only.
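For reference, the Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. The sketch below shows how a floor that stays high after the crest is penalized. The probabilities and outcomes are hypothetical and are not taken from the 193 logged calls.

```python
def brier_score(probabilities, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if len(probabilities) != len(outcomes):
        raise ValueError("probabilities and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)

# Hypothetical event: a flood on the first call, then three post-crest calls with no flood.
outcomes = [1, 0, 0, 0]
floor_probs = [0.85, 0.85, 0.85, 0.85]   # override floor held high into recession
raw_probs = [0.85, 0.40, 0.20, 0.10]     # raw analog decays after the crest

floor_brier = brier_score(floor_probs, outcomes)
raw_brier = brier_score(raw_probs, outcomes)
print(round(floor_brier, 4), round(raw_brier, 4))
```

In this made-up case the floor scores worse only because of the post-crest calls, which is the behavior the scorecard attributes to the override.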

Recent events — predicted vs. actual

Event (UTC)	Calls	Predicted class	Max flood-risk	Actual crest	Actual class
2026-06-29T13:02	1	Flood	85%	224 cfs	Moderate
2026-06-27T13:02	52	Moderate	55%	758 cfs	Moderate
2026-06-24T20:02	50	Moderate	6%	493 cfs	Moderate
2026-06-23T09:02	4	Flood	85%	1020 cfs	High
2026-06-23T08:02	4	Flood	85%	1070 cfs	High
2026-06-23T07:02	4	Flood	85%	1110 cfs	High
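The "class exact" and "class within 1" measures can be sketched by treating the four classes as an ordered scale. The ordering Fizzle < Moderate < High < Flood and the reading of "within 1" as at most one class apart are assumptions. The rows are the six recent events above, counted once each rather than weighted by calls, so the result is not expected to reproduce the 71% / 97% headline figures.

```python
# Assumed ordering, taken from the order in which the classes are listed.
CLASSES = ["Fizzle", "Moderate", "High", "Flood"]

def class_distance(predicted, actual):
    """Number of class steps between the predicted and actual class."""
    return abs(CLASSES.index(predicted) - CLASSES.index(actual))

# (predicted class, actual class) for the six recent events listed above.
events = [
    ("Flood", "Moderate"),
    ("Moderate", "Moderate"),
    ("Moderate", "Moderate"),
    ("Flood", "High"),
    ("Flood", "High"),
    ("Flood", "High"),
]

distances = [class_distance(p, a) for p, a in events]
exact = sum(d == 0 for d in distances)
within_one = sum(d <= 1 for d in distances)
print(f"exact {exact}/{len(events)}, within 1 {within_one}/{len(events)}")
```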

Buffalo Rise Engine (per-gauge nowcast)

For each Buffalo mainstem gauge, predicts a coming rise (slight / moderate / large) from local rain + upstream propagation, with a timing window. Graded on whether the gauge actually rose, and within the predicted window. Newly recording — predictions only fire during rain events, so this fills in over time.

Gauge	Graded	Rise happened	On-time	No-rise (false alarm)
boxley	0	—	—	0
ponca	22	36%	75%	14
pruitt	31	45%	57%	17
st_joe	14	64%	11%	5
harriet	20	25%	40%	15

Typical actual rise by predicted category: slight: ~483.5 cfs (n=12), moderate: ~410.0 cfs (n=24). (Categories come from rainfall, so this is how they map to real gauge rises — calibration that accrues over time.)

Recent rise predictions — predicted vs. actual

When (UTC)	Gauge	Predicted	Window	Outcome	Actual rise
2026-06-28T21:09	harriet	moderate	5.0-10.0h	rose	+410.0 cfs @ 8.3h
2026-06-28T20:09	harriet	moderate	5.0-10.0h	rose	+400.0 cfs @ 9.3h
2026-06-28T19:09	harriet	moderate	5.0-10.0h	rose	+410.0 cfs @ 10.3h
2026-06-28T18:09	harriet	moderate	5.0-10.0h	rose	+400.0 cfs @ 11.3h
2026-06-28T17:09	harriet	moderate	5.0-10.0h	rose	+400.0 cfs @ 12.3h
2026-06-28T05:09	st_joe	moderate	8.0-16.0h	rose	+410.0 cfs @ 15.6h
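Whether a rise counts as on-time can be checked by comparing the observed time of the rise with the predicted window. The sketch below parses the Window and Actual rise columns as they appear in the table above. Treating the window bounds as inclusive is an assumption, and the function names are hypothetical.

```python
def parse_window(window):
    """Turn a window string like '5.0-10.0h' into (start_hours, end_hours)."""
    start, end = window.rstrip("h").split("-")
    return float(start), float(end)

def on_time(window, actual_rise):
    """True if the rise time in e.g. '+410.0 cfs @ 8.3h' falls inside the window."""
    start, end = parse_window(window)
    hours = float(actual_rise.split("@")[1].strip().rstrip("h"))
    # Assumption: both window bounds are inclusive.
    return start <= hours <= end

# (window, actual rise) for the six recent predictions listed above.
rows = [
    ("5.0-10.0h", "+410.0 cfs @ 8.3h"),
    ("5.0-10.0h", "+400.0 cfs @ 9.3h"),
    ("5.0-10.0h", "+410.0 cfs @ 10.3h"),
    ("5.0-10.0h", "+400.0 cfs @ 11.3h"),
    ("5.0-10.0h", "+400.0 cfs @ 12.3h"),
    ("8.0-16.0h", "+410.0 cfs @ 15.6h"),
]

results = [on_time(window, actual) for window, actual in rows]
print(results)  # [True, True, False, False, False, True]
```

Under this reading, the three earliest harriet calls predicted the rise too soon: the gauge did rise, but after the window had closed.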

Downstream Propagation (Ponca → Pruitt → St. Joe)

When Ponca rises, predicts whether a bump reaches Pruitt and St. Joe, how big it will be, and how many hours after Ponca's peak it arrives. Graded on the bump/no-bump call, the crest size, and the timing. Newly recording; it only logs during a Ponca rise.

Reach	Graded	Bump call right	Crest size hit	Timing hit
pruitt	3	33%	0%	0%
st_joe	3	33%	0%	0%

Coverage

What the scorecard grades today, and what is still being wired into the loop:

Predictor family	Status	Notes
Physics rise predictors	graded	Cossatot, Richland, Hailstone — predicted crest height/flow vs. the actual peak.
Empirical forecast engine	graded	6 basins — 'likelihood of rise to tier X' vs. the tier the gauge actually reached.
Recession countdowns	graded	9 gauges — maturing; the longest horizons settle ~7-10 days after they're issued.
Ponca AI Rainfall Event Analysis	graded	Each armed call graded vs Ponca's actual crest over the next 36 h (ponca_analog_eval.py).
Downstream propagation forecast	graded	Ponca → Pruitt → St. Joe; predictions now logged + graded on bump/magnitude/timing (propagation_eval.py).
Buffalo per-gauge rise predictions	graded	Per-gauge rise nowcasts recorded + graded vs the gauge's actual rise (buffalo_predictions_archive.py).
Buffalo flood_risk labels + propagation_alerts	not yet logged	Lower-priority companions in buffalo_output; still ungraded.

DISCLAIMER: This site provides creek condition estimates for informational purposes only. Gauge data, radar estimates, and forecasts may be delayed, inaccurate, or unavailable. Always exercise independent judgment. Whitewater kayaking is inherently dangerous — water conditions can change rapidly. This site and its maintainers assume no responsibility for decisions made based on information displayed here.