Model Methodology

How Flood Sentinel generates forecasts, and the limitations you should be aware of.

1. Overview

Flood Sentinel uses an ensemble machine learning approach to forecast river levels at configured gauge stations. It combines multiple model types to produce a consensus forecast with confidence intervals.

The system is designed as a decision-support tool for flood operations centres, council emergency managers, and infrastructure operators. It does not replace official Bureau of Meteorology flood warnings.

2. Ensemble Architecture

For each forecast station and time horizon, Flood Sentinel trains and maintains multiple models:

  Model Type              Library       Strengths
  Gradient Boosted Trees  XGBoost       Handles non-linear relationships, robust to outliers
  Gradient Boosted Trees  LightGBM      Fast training, handles large feature sets efficiently
  Gradient Boosted Trees  CatBoost      Strong default hyperparameters, handles categorical features
  Random Forest           scikit-learn  Ensemble diversity, natural uncertainty estimation
  Ridge Regression        scikit-learn  Baseline model, fast, interpretable

The ensemble combines these models using weighted averaging, where each model's weight is determined by its recent validation performance: a lower validation RMSE yields a higher weight.
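The weighting scheme can be sketched in a few lines. This is an illustrative implementation of inverse-RMSE weighting, not the system's actual code; the model names and numbers are hypothetical.

```python
def ensemble_forecast(predictions, val_rmse):
    """Combine per-model forecasts with inverse-RMSE weights.

    predictions: model name -> forecast level (metres)
    val_rmse:    model name -> recent validation RMSE (metres)
    """
    # Lower validation error => larger weight.
    weights = {m: 1.0 / val_rmse[m] for m in predictions}
    total = sum(weights.values())
    return sum(predictions[m] * weights[m] for m in predictions) / total

# Hypothetical forecasts from three ensemble members.
preds = {"xgboost": 2.10, "lightgbm": 2.05, "ridge": 2.60}
rmses = {"xgboost": 0.10, "lightgbm": 0.12, "ridge": 0.40}
consensus = ensemble_forecast(preds, rmses)  # pulled towards the accurate models
```

Because the ridge baseline has the worst recent RMSE, it contributes the least to the consensus value.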

3. Input Features

Models are trained on the following feature categories:

  • Current conditions: Latest river level and rate-of-change at the forecast station and upstream gauges
  • Upstream propagation: Time-lagged readings from upstream stations (lag hours configured per catchment)
  • Rainfall: Observed and forecast rainfall at rain gauge stations within the catchment
  • Temporal features: Hour of day, day of year (seasonal patterns), day of week
  • Antecedent conditions: Rolling averages of upstream levels and rainfall over 6h, 12h, 24h, 48h, 72h windows
  • Tidal influence: Tidal predictions for tidally affected stations
  • Rate features: Rate of rise/fall at upstream stations
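As a sketch of how such features can be derived from gauge time series (the column names, lag hours, and windows here are hypothetical, not the system's actual schema):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one upstream gauge and catchment rainfall.
idx = pd.date_range("2024-01-01", periods=100, freq="h")
df = pd.DataFrame({"upstream_level_m": np.linspace(1.0, 2.0, 100),
                   "rain_mm": 0.5}, index=idx)

# Upstream propagation: time-lagged readings (lag hours set per catchment).
for lag in (3, 6):
    df[f"upstream_lag_{lag}h"] = df["upstream_level_m"].shift(lag)

# Antecedent conditions: rolling means over multi-hour windows.
for window in (6, 24):
    df[f"rain_mean_{window}h"] = df["rain_mm"].rolling(f"{window}h").mean()

# Rate features: rate of rise/fall in metres per hour.
df["upstream_rate_m_per_h"] = df["upstream_level_m"].diff()

# Temporal features for diurnal and seasonal patterns.
df["hour"] = df.index.hour
df["day_of_year"] = df.index.dayofyear
```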

4. Training Process

  • Data split: Chronological split (no random shuffling) to prevent data leakage. The most recent 20% of data is used for validation.
  • Horizon-specific models: Separate models are trained for each forecast horizon (e.g., 6h, 12h, 24h, 48h) because error characteristics change with lead time.
  • Cross-validation: Time-series aware cross-validation with expanding windows.
  • Retraining: Models can be retrained through the Training UI as new data accumulates. The system tracks model versions in its registry.
  • Quality gates: New models must meet minimum accuracy thresholds before replacing existing models in production.
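The chronological split and expanding-window cross-validation can be illustrated with scikit-learn's TimeSeriesSplit; random data stands in for real gauge features here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Random data stands in for a feature matrix already sorted by time.
n = 1000
X = np.random.rand(n, 5)
y = np.random.rand(n)

# Chronological hold-out: the most recent 20% is reserved for validation.
cut = int(n * 0.8)
X_train, X_val = X[:cut], X[cut:]
y_train, y_val = y[:cut], y[cut:]

# Expanding-window cross-validation: every fold's training indices
# precede its test indices, so no future data leaks into training.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X_train):
    assert train_idx.max() < test_idx.min()
```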

5. Safety Clamps & Guardrails

To prevent physically impossible or dangerous forecasts, the system applies several safety mechanisms:

  • Physical bounds: Forecasts are clamped to station-specific min/max values (e.g., a river cannot have a negative level)
  • Rate-of-change limits: Forecasts that imply physically impossible rates of rise or fall are flagged and dampened
  • Confidence degradation: Confidence intervals widen with forecast horizon, reflecting increasing uncertainty
  • Fallback to persistence: If model confidence is very low, the system falls back to a persistence forecast (last observed level)
  • Ensemble disagreement: When individual models disagree significantly, the system flags this and widens confidence bounds
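A simplified sketch of how these guardrails might compose; all thresholds and default values below are illustrative assumptions, not operational settings.

```python
def apply_guardrails(forecast_m, last_obs_m, horizon_h,
                     station_min_m=0.0, station_max_m=15.0,
                     max_rate_m_per_h=0.5,
                     confidence=1.0, min_confidence=0.3):
    """Apply the safety mechanisms above; all thresholds are illustrative."""
    # Fallback to persistence when model confidence is very low.
    if confidence < min_confidence:
        return last_obs_m
    # Rate-of-change limit: dampen implied rises/falls beyond the cap.
    max_change = max_rate_m_per_h * horizon_h
    change = forecast_m - last_obs_m
    if abs(change) > max_change:
        forecast_m = last_obs_m + (max_change if change > 0 else -max_change)
    # Physical bounds: clamp to the station's plausible range.
    return min(max(forecast_m, station_min_m), station_max_m)
```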

6. Confidence & Uncertainty

Forecast confidence is expressed as a percentage (0–100%) and as confidence bands on the hydrograph:

  • High confidence (80–100%): Models agree, recent validation accuracy is high, data is fresh
  • Moderate confidence (50–79%): Some model disagreement or data quality issues
  • Low confidence (<50%): Significant model disagreement, stale data, or conditions outside training range

Confidence is computed from: model agreement (ensemble spread), recent validation RMSE, data freshness, and whether current conditions are within the training distribution.
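One plausible way to blend these factors into a single score; the scaling constants and the out-of-distribution penalty below are assumptions for illustration, not the system's actual formula.

```python
def forecast_confidence(ensemble_spread_m, val_rmse_m, data_age_h,
                        in_distribution,
                        spread_scale=0.5, rmse_scale=0.3, max_age_h=6.0):
    """Blend the four factors into a 0-100% confidence score."""
    agreement = max(0.0, 1.0 - ensemble_spread_m / spread_scale)   # ensemble spread
    accuracy = max(0.0, 1.0 - val_rmse_m / rmse_scale)             # validation RMSE
    freshness = max(0.0, 1.0 - data_age_h / max_age_h)             # data freshness
    score = 100.0 * (agreement + accuracy + freshness) / 3.0
    # Conditions outside the training distribution halve the score.
    return score if in_distribution else score * 0.5
```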

7. Limitations

Flood Sentinel forecasts are advisory only. Users should be aware of these limitations:

  • Training data range: Models perform best within the range of conditions seen during training. Extreme events beyond historical records may be poorly predicted.
  • Infrastructure changes: Dam operations, new levees, channel modifications, or urban development can change catchment behaviour. Models need retraining after significant infrastructure changes.
  • Rainfall forecast dependency: Longer-range forecasts depend on rainfall predictions, which have their own uncertainty. Forecast accuracy degrades beyond 24–48 hours.
  • Flash floods: Very rapid (sub-hourly) flood events in small catchments may develop faster than the system's data refresh cycle.
  • Tidal interactions: Complex tide-flood interactions at estuary stations are approximated and may not capture all dynamics.
  • Not a replacement for BoM warnings: The Bureau of Meteorology is the authoritative source for official flood warnings in Australia. Flood Sentinel should complement, not replace, official warnings.
  • Model degradation: All ML models can degrade over time as catchment conditions evolve. Regular retraining and performance monitoring are essential.

8. Validation & Performance Metrics

Model performance is continuously monitored using:

  • RMSE (Root Mean Square Error) — primary accuracy metric, in metres
  • MAE (Mean Absolute Error) — average forecast error magnitude
  • NSE (Nash-Sutcliffe Efficiency) — standard hydrological performance metric (1.0 = perfect, <0 = worse than mean)
  • Bias — systematic over/under-prediction tendency
  • Threshold hit rates — accuracy of predicting when levels cross alert/warning thresholds
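These metrics are standard and can be computed directly from paired observation and forecast series; a minimal reference implementation:

```python
import numpy as np

def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))

def nse(obs, pred):
    # 1.0 = perfect; 0.0 matches the observed mean; < 0 is worse than the mean.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

def bias(obs, pred):
    # Positive = systematic over-prediction; negative = under-prediction.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(pred - obs))

def threshold_hit_rate(obs, pred, threshold_m):
    # Fraction of timesteps where forecast and observation agree
    # on whether the alert/warning threshold was crossed.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean((obs >= threshold_m) == (pred >= threshold_m)))
```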

Performance reports are accessible via the Model Health dashboard.

9. Model Versioning

Flood Sentinel maintains a model registry that tracks:

  • Training date and data range used
  • Validation metrics at time of training
  • Feature importance rankings
  • Model file checksums for integrity verification
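Checksum-based integrity verification can be sketched with Python's hashlib; the registry-entry shape here is hypothetical.

```python
import hashlib

def file_checksum(path, chunk_size=65536):
    """SHA-256 of a model file, streamed so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(registry_entry, model_path):
    """True if the file on disk matches the checksum recorded at training."""
    return registry_entry["checksum"] == file_checksum(model_path)
```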

Previous model versions are archived and can be rolled back if a newer model underperforms.