
Prediction Scorecard

How accurate Creek Intelligence's forecasts have been lately. Experimental — self-grading, updated daily.
Updated 2026-07-01 06:30 UTC · scoring window: last 45 days

Every forecast the system makes is recorded and later graded against what the river actually did; this page is that report card. It currently scores 272 physics crest predictions, 484 empirical likelihood forecasts, and 226 recession countdowns (647 still maturing).

⚠️ Needs attention

Physics Rise Predictors

Predicts the crest a gauge will reach from rainfall + antecedent moisture. Scored on how close the predicted peak was to the actual peak.

| Predictor | Scored | Avg error (MAE) | Bias (mean / median) | Within ±20% |
|---|---|---|---|---|
| Cossatot River | 159 | 0.53 ft | 0.33 / 0.24 ft | 76.0% |
| Richland Creek | 70 | 0.91 ft | 0.43 / 0.26 ft | 53.0% |
| Hailstone (upper Buffalo) | 43 | 455.1 cfs | -377.43 / 14.9 cfs | 35.0% |

Bias is the mean / median signed error (predicted − actual). A large gap between them means a few outlier events — often a single flash-flood onset — are dragging the mean; the median is the typical miss.
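
The columns above can be sketched in a few lines of Python. This is a minimal illustration, not the site's grading code: the function name `score_crests`, the sample flows, and the choice to measure the ±20% tolerance relative to the actual peak are all assumptions.

```python
from statistics import mean, median

def score_crests(pairs, tolerance=0.20):
    """Score (predicted, actual) crest pairs given in the same unit.

    Assumption: "within ±20%" is measured relative to the actual peak.
    """
    errors = [p - a for p, a in pairs]  # signed error: predicted - actual
    within = [abs(p - a) <= tolerance * abs(a) for p, a in pairs]
    return {
        "mae": mean(abs(e) for e in errors),
        "bias_mean": mean(errors),
        "bias_median": median(errors),
        "within_pct": 100.0 * sum(within) / len(pairs),
    }

# Made-up flows in cfs: four small over-predictions and one
# badly under-predicted flash-flood onset.
sample = [(520, 500), (410, 400), (615, 600), (330, 300), (900, 3000)]
result = score_crests(sample)
print(result)
```

In this made-up sample the single outlier pulls the mean bias to -405 cfs while the median stays at +15 cfs, the same pattern as the Hailstone row.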

Worst recent misses

Cossatot River

Richland Creek

Hailstone (upper Buffalo)

Empirical Forecast Engine

Predicts the likelihood of a rise to a given level from recent rainfall. Hit-rate = the river rose to at least the called level; no-rise rate = the forecast rise never happened (lower is better).

| Basin | Forecasts | Hit-rate (rose to ≥ called level) | No-rise rate (rise never came) | Outcome detail |
|---|---|---|---|---|
| Big Piney Creek | 44 | 34.1% | 65.9% | 15 exact · 0 higher · 0 short · 29 none |
| Buffalo at Boxley | 106 | 21.7% | 78.3% | 11 exact · 12 higher · 0 short · 83 none |
| Cossatot River | 101 | 47.5% | 49.5% | 24 exact · 24 higher · 3 short · 50 none |
| Hailstone (upper Buffalo) | 62 | 29.0% | 71.0% | 10 exact · 8 higher · 0 short · 44 none |
| Mulberry River | 99 | 20.2% | 79.8% | 9 exact · 11 higher · 0 short · 79 none |
| Richland Creek | 72 | 12.5% | 87.5% | 2 exact · 7 higher · 0 short · 63 none |
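
The two rates appear to follow directly from the outcome counts: "exact" and "higher" count as hits, "none" counts as a no-rise, and "short" falls into neither. The sketch below reproduces the Cossatot and Big Piney rows under that reading; the function name `rates` is hypothetical.

```python
def rates(exact, higher, short, none):
    """Return (hit_rate, no_rise_rate) in percent, rounded to one decimal.

    Assumed reading of the table: hits = exact + higher, no-rise = none,
    and "short" rises count toward neither rate.
    """
    total = exact + higher + short + none
    hit_rate = 100.0 * (exact + higher) / total
    no_rise_rate = 100.0 * none / total
    return round(hit_rate, 1), round(no_rise_rate, 1)

cossatot = rates(exact=24, higher=24, short=3, none=50)
big_piney = rates(exact=15, higher=0, short=0, none=29)
print(cossatot, big_piney)
```

This also explains why the Cossatot rates do not sum to 100%: its three "short" outcomes sit between a hit and a no-rise.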

By confidence label

| Confidence | n | Hit-rate | No-rise rate |
|---|---|---|---|
| high | 134 | 51.5% | 48.5% |
| medium | 132 | 34.1% | 64.4% |
| low | 100 | 19.0% | 80.0% |

By rainfall percentile band (higher band = heavier rain vs. history)

| Band | n | Hit-rate | No-rise rate |
|---|---|---|---|
| p25_to_p50 | 179 | 35.2% | 64.8% |
| p50_to_p75 | 118 | 50.8% | 47.5% |
| above_p75 | 69 | 14.5% | 84.1% |

A well-calibrated engine would show hit-rate rising with the band. Where it does not, the engine is over-warning — a known selection-bias issue tracked for recalibration.
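
One way to make that check mechanical is to walk the bands in order and flag any step where the hit-rate fails to rise. The sketch below uses the hit-rates from the band table; the helper name `calibration_breaks` is hypothetical.

```python
# Hit-rates (percent) by rainfall band, ordered lightest to heaviest.
bands = [("p25_to_p50", 35.2), ("p50_to_p75", 50.8), ("above_p75", 14.5)]

def calibration_breaks(ordered):
    """Return adjacent (lower, higher) band pairs where hit-rate does not rise."""
    return [
        (lower[0], higher[0])
        for lower, higher in zip(ordered, ordered[1:])
        if higher[1] <= lower[1]
    ]

breaks = calibration_breaks(bands)
print(breaks)
```

With the current numbers the only break is between p50_to_p75 and above_p75, where the heaviest-rain band has the lowest hit-rate.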

Recession Countdowns

Predicts how long until a falling river drops to each threshold. HIT% = reached the level near the predicted time; MAE = average timing error (hours). Newly launched — most predictions are still maturing.

| Gauge | Target | Graded | HIT% | Never reached | Timing MAE | Bias |
|---|---|---|---|---|---|---|
| big_piney/above_longpool | low_floatable | 4 | 100.0% | 0.0% | 0.6h | 0.5h |
| big_piney/above_longpool | too_low | 23 | 48.0% | 52.0% | 24.0h | -24.0h |
| big_piney/below_longpool | low_floatable | 9 | 67.0% | 33.0% | 83.3h | -83.3h |
| big_piney/below_longpool | too_low | 9 | 0.0% | 100.0% | | |
| buffalo/ponca | low_floatable | 42 | 100.0% | 0.0% | 13.0h | -12.3h |
| cossatot/cossatot | low_floatable | 28 | 100.0% | 0.0% | 3.9h | -0.3h |
| cossatot/cossatot | too_low | 71 | 100.0% | 0.0% | 6.2h | -0.7h |
| mulberry/above_hwy_23 | low_floatable | 11 | 73.0% | 27.0% | 90.0h | -90.0h |
| mulberry/above_hwy_23 | too_low | 11 | 0.0% | 100.0% | | |

11 additional gauge/target combinations have fewer than 4 graded outcomes (mostly still-censored or single-sample) and are hidden until they accumulate enough data to be meaningful. Bias is hours predicted − actual; negative means the countdown fires early.
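
The timing columns can be sketched as follows. This is an illustration under assumptions: the function name `grade_countdowns` and the sample values are made up, and timing statistics are computed only over countdowns whose threshold was actually reached, which would explain the blank MAE and bias cells in rows where nothing reached the level.

```python
from statistics import mean

def grade_countdowns(outcomes):
    """outcomes: list of (predicted_hours, actual_hours), actual None if never reached."""
    reached = [(p, a) for p, a in outcomes if a is not None]
    signed = [p - a for p, a in reached]  # negative: countdown fired early
    return {
        "never_reached_pct": 100.0 * (len(outcomes) - len(reached)) / len(outcomes),
        # Timing stats are undefined when no countdown reached its threshold.
        "timing_mae_h": mean(abs(e) for e in signed) if signed else None,
        "bias_h": mean(signed) if signed else None,
    }

# Made-up sample: three countdowns reached their level, one never did.
sample = [(12.0, 14.0), (30.0, 36.0), (48.0, 47.0), (20.0, None)]
graded = grade_countdowns(sample)
print(graded)
```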

Ponca AI Rainfall Event Analysis

When rain arms it, predicts how big the Ponca gauge will get — a class (Fizzle/Moderate/High/Flood), a typical-peak band, and a flood-risk %. Graded against Ponca's actual crest over the 36 h after each call.

| Calls graded | Events | Class exact | Class within 1 | Peak in IQR | Peak error (MAE) |
|---|---|---|---|---|---|
| 231 | 13 | 71% | 97% | 61% | 1149.0 cfs |
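
"Class exact" and "Class within 1" can be read as distances on an ordered class scale. The sketch below assumes the classes are ordinal in the order the page lists them (Fizzle, Moderate, High, Flood); the three predicted/actual pairs are illustrative, and `class_distance` is a hypothetical helper.

```python
# Assumed ordinal scale, in the order the page lists the classes.
CLASSES = ["Fizzle", "Moderate", "High", "Flood"]

def class_distance(predicted, actual):
    """Number of class steps between the predicted and the actual class."""
    return abs(CLASSES.index(predicted) - CLASSES.index(actual))

# Illustrative (predicted, actual) pairs.
calls = [("Flood", "Moderate"), ("Moderate", "Moderate"), ("Flood", "High")]
distances = [class_distance(p, a) for p, a in calls]

exact_pct = 100.0 * sum(d == 0 for d in distances) / len(calls)
within_one_pct = 100.0 * sum(d <= 1 for d in distances) / len(calls)
print(distances, exact_pct, within_one_pct)
```

A Flood call that verifies as High still counts as "within 1", while a Flood call that verifies as Moderate is two steps off and counts toward neither column.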

Flood-risk calibration (Brier score, lower is better; 193 calls since the raw analog was also logged): override floor 0.163 vs. raw k-NN 0.092. The raw analog is currently the better calibrated of the two; the deterministic override floor stays high through the post-crest recession. Small sample, directional only.
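
For reference, the Brier score is the mean squared difference between the forecast probability and the 0/1 outcome. The sketch below uses made-up probabilities and outcomes purely to show the arithmetic; the variable names are hypothetical and none of these values come from the site's logs.

```python
def brier(probs, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# Made-up flood-risk calls; flooded[i] is 1 if a flood actually occurred.
flooded = [0, 1, 0, 0]
override_floor = [0.85, 0.85, 0.55, 0.06]  # stays high even when no flood follows
raw_knn = [0.30, 0.60, 0.20, 0.05]

override_score = brier(override_floor, flooded)
raw_score = brier(raw_knn, flooded)
print(round(override_score, 3), round(raw_score, 3))
```

A forecaster that keeps the risk high after the flood threat has passed is penalized on every non-flood call, which is the effect described above for the override floor.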

Recent events — predicted vs. actual

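| Event (UTC) | Calls | Predicted class | Max flood-risk | Actual crest | Actual class |
|---|---|---|---|---|---|
| 2026-06-29T13:02 | 1 | Flood | 85% | 224 cfs | Moderate |
| 2026-06-27T13:02 | 52 | Moderate | 55% | 758 cfs | Moderate |
| 2026-06-24T20:02 | 50 | Moderate | 6% | 493 cfs | Moderate |
| 2026-06-23T09:02 | 4 | Flood | 85% | 1020 cfs | High |
| 2026-06-23T08:02 | 4 | Flood | 85% | 1070 cfs | High |
| 2026-06-23T07:02 | 4 | Flood | 85% | 1110 cfs | High |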

Buffalo Rise Engine (per-gauge nowcast)

For each Buffalo mainstem gauge, predicts a coming rise (slight / moderate / large) from local rain + upstream propagation, with a timing window. Graded on whether the gauge actually rose, and within the predicted window. Newly recording — predictions only fire during rain events, so this fills in over time.

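| Gauge | Graded | Rise happened | On-time | No-rise (false alarm) |
|---|---|---|---|---|
| boxley | 0 | | | 0 |
| ponca | 22 | 36% | 75% | 14 |
| pruitt | 31 | 45% | 57% | 17 |
| st_joe | 14 | 64% | 11% | 5 |
| harriet | 20 | 25% | 40% | 15 |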

Typical actual rise by predicted category: slight: ~483.5 cfs (n=12), moderate: ~410.0 cfs (n=24). (Categories come from rainfall, so this is how they map to real gauge rises — calibration that accrues over time.)

Recent rise predictions — predicted vs. actual

| When (UTC) | Gauge | Predicted | Window | Outcome | Actual rise |
|---|---|---|---|---|---|
| 2026-06-28T21:09 | harriet | moderate | 5.0-10.0h | rose | +410.0 cfs @ 8.3h |
| 2026-06-28T20:09 | harriet | moderate | 5.0-10.0h | rose | +400.0 cfs @ 9.3h |
| 2026-06-28T19:09 | harriet | moderate | 5.0-10.0h | rose | +410.0 cfs @ 10.3h |
| 2026-06-28T18:09 | harriet | moderate | 5.0-10.0h | rose | +400.0 cfs @ 11.3h |
| 2026-06-28T17:09 | harriet | moderate | 5.0-10.0h | rose | +400.0 cfs @ 12.3h |
| 2026-06-28T05:09 | st_joe | moderate | 8.0-16.0h | rose | +410.0 cfs @ 15.6h |
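
A rough sketch of how a single prediction might be graded against its timing window. It assumes the "@ 8.3h" figure is the number of hours after the prediction at which the rise was observed, and that "on-time" means falling inside the window with no extra tolerance; the site may grade more leniently. The function name `grade_rise` is hypothetical.

```python
def grade_rise(window_h, rise_at_h):
    """Grade one prediction.

    window_h: (start, end) in hours after the prediction was issued.
    rise_at_h: hours after the prediction when the rise was observed,
               or None if the gauge never rose (a false alarm).
    """
    if rise_at_h is None:
        return "no-rise"
    start, end = window_h
    if rise_at_h < start:
        return "early"
    if rise_at_h > end:
        return "late"
    return "on-time"

# Values taken from the rows above, plus one made-up false alarm.
results = [
    grade_rise((5.0, 10.0), 8.3),    # harriet, 21:09 call
    grade_rise((5.0, 10.0), 10.3),   # harriet, 19:09 call
    grade_rise((8.0, 16.0), 15.6),   # st_joe, 05:09 call
    grade_rise((5.0, 10.0), None),   # hypothetical false alarm
]
print(results)
```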

Downstream Propagation (Ponca → Pruitt → St. Joe)

When Ponca rises, predicts whether and how big a bump reaches Pruitt and St. Joe, and how many hours after Ponca peaks. Graded on the bump/no-bump call, the crest size, and the timing. Newly recording — only logs during a Ponca rise.

| Reach | Graded | Bump call right | Crest size hit | Timing hit |
|---|---|---|---|---|
| pruitt | 3 | 33% | 0% | 0% |
| st_joe | 3 | 33% | 0% | 0% |

Coverage

What the scorecard grades today, and what is still being wired into the loop:

| Predictor family | Status | Notes |
|---|---|---|
| Physics rise predictors | graded | Cossatot, Richland, Hailstone: predicted crest height/flow vs. the actual peak. |
| Empirical forecast engine | graded | 6 basins: 'likelihood of rise to tier X' vs. the tier the gauge actually reached. |
| Recession countdowns | graded | 9 gauges; maturing, and the longest horizons settle ~7-10 days after they're issued. |
| Ponca AI Rainfall Event Analysis | graded | Each armed call graded vs. Ponca's actual crest over the next 36 h (ponca_analog_eval.py). |
| Downstream propagation forecast | graded | Ponca -> Pruitt -> St. Joe; predictions now logged and graded on bump/magnitude/timing (propagation_eval.py). |
| Buffalo per-gauge rise predictions | graded | Per-gauge rise nowcasts recorded and graded vs. the gauge's actual rise (buffalo_predictions_archive.py). |
| Buffalo flood_risk labels + propagation_alerts | not yet logged | Lower-priority companions in buffalo_output; still ungraded. |
DISCLAIMER: This site provides creek condition estimates for informational purposes only. Gauge data, radar estimates, and forecasts may be delayed, inaccurate, or unavailable. Always exercise independent judgment. Whitewater kayaking is inherently dangerous — water conditions can change rapidly. This site and its maintainers assume no responsibility for decisions made based on information displayed here.