Model Methodology

How Flood Sentinel generates forecasts, and the limitations you should be aware of.

1. Overview

Flood Sentinel uses an ensemble machine learning approach to forecast river levels at configured gauge stations. It combines multiple model types to produce a consensus forecast with confidence intervals.

The system is designed as a decision-support tool for flood operations centres, council emergency managers, and infrastructure operators. It does not replace official Bureau of Meteorology flood warnings.

2. Ensemble Architecture

For each forecast station and time horizon, Flood Sentinel trains and maintains multiple models:

  Model Type              Library       Strengths
  Gradient Boosted Trees  XGBoost       Handles non-linear relationships, robust to outliers
  Gradient Boosted Trees  LightGBM      Fast training, handles large feature sets efficiently
  Gradient Boosted Trees  CatBoost      Strong default hyperparameters, handles categorical features
  Random Forest           scikit-learn  Ensemble diversity, natural uncertainty estimation
  Ridge Regression        scikit-learn  Baseline model, fast, interpretable

The ensemble combines these models using weighted averaging, where each model's weight is determined by its recent validation performance: a lower validation RMSE yields a higher weight.
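The weighting scheme can be sketched in a few lines. This is an illustrative implementation of inverse-RMSE weighting, not the system's actual code; the model names and numbers are hypothetical.

```python
def ensemble_forecast(predictions, val_rmse):
    """Combine per-model forecasts with inverse-RMSE weights.

    predictions: model name -> forecast level (metres)
    val_rmse:    model name -> recent validation RMSE (metres)
    """
    # Lower validation error => larger weight.
    weights = {m: 1.0 / val_rmse[m] for m in predictions}
    total = sum(weights.values())
    return sum(predictions[m] * weights[m] for m in predictions) / total

# Hypothetical forecasts from three ensemble members.
preds = {"xgboost": 2.10, "lightgbm": 2.05, "ridge": 2.60}
rmses = {"xgboost": 0.10, "lightgbm": 0.12, "ridge": 0.40}
consensus = ensemble_forecast(preds, rmses)  # pulled towards the accurate models
```

Because the ridge baseline has the worst recent RMSE, it contributes the least to the consensus value.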

3. Input Features

Models are trained on the following feature categories:

  • Current conditions: Latest river level and rate-of-change at the forecast station and upstream gauges
  • Upstream propagation: Time-lagged readings from upstream stations (lag hours configured per catchment)
  • Rainfall: Observed and forecast rainfall at rain gauge stations within the catchment
  • Temporal features: Hour of day, day of year (seasonal patterns), day of week
  • Antecedent conditions: Rolling averages of upstream levels and rainfall over 6h, 12h, 24h, 48h, 72h windows
  • Tidal influence: Tidal predictions for tidally affected stations
  • Rate features: Rate of rise/fall at upstream stations
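As a sketch of how such features can be derived from gauge time series (the column names, lag hours, and windows here are hypothetical, not the system's actual schema):

```python
import numpy as np
import pandas as pd

# Hypothetical hourly series for one upstream gauge and catchment rainfall.
idx = pd.date_range("2024-01-01", periods=100, freq="h")
df = pd.DataFrame({"upstream_level_m": np.linspace(1.0, 2.0, 100),
                   "rain_mm": 0.5}, index=idx)

# Upstream propagation: time-lagged readings (lag hours set per catchment).
for lag in (3, 6):
    df[f"upstream_lag_{lag}h"] = df["upstream_level_m"].shift(lag)

# Antecedent conditions: rolling means over multi-hour windows.
for window in (6, 24):
    df[f"rain_mean_{window}h"] = df["rain_mm"].rolling(f"{window}h").mean()

# Rate features: rate of rise/fall in metres per hour.
df["upstream_rate_m_per_h"] = df["upstream_level_m"].diff()

# Temporal features for diurnal and seasonal patterns.
df["hour"] = df.index.hour
df["day_of_year"] = df.index.dayofyear
```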

4. Training Process

  • Data split: Chronological split (no random shuffling) to prevent data leakage. The most recent 20% of data is used for validation.
  • Horizon-specific models: Separate models are trained for each forecast horizon (e.g., 6h, 12h, 24h, 48h) because error characteristics change with lead time.
  • Cross-validation: Time-series aware cross-validation with expanding windows.
  • Retraining: Models can be retrained through the Training UI as new data accumulates. The system tracks model versions in its registry.
  • Quality gates: New models must meet minimum accuracy thresholds before replacing existing models in production.
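The chronological split and expanding-window cross-validation can be illustrated with scikit-learn's TimeSeriesSplit; random data stands in for real gauge features here.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Random data stands in for a feature matrix already sorted by time.
n = 1000
X = np.random.rand(n, 5)
y = np.random.rand(n)

# Chronological hold-out: the most recent 20% is reserved for validation.
cut = int(n * 0.8)
X_train, X_val = X[:cut], X[cut:]
y_train, y_val = y[:cut], y[cut:]

# Expanding-window cross-validation: every fold's training indices
# precede its test indices, so no future data leaks into training.
tscv = TimeSeriesSplit(n_splits=5)
for train_idx, test_idx in tscv.split(X_train):
    assert train_idx.max() < test_idx.min()
```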

5. Safety Clamps & Guardrails

To prevent physically impossible or dangerous forecasts, the system applies several safety mechanisms:

  • Physical bounds: Forecasts are clamped to station-specific min/max values (e.g., a river cannot have a negative level)
  • Rate-of-change limits: Forecasts that imply physically impossible rates of rise or fall are flagged and dampened
  • Confidence degradation: Confidence intervals widen with forecast horizon, reflecting increasing uncertainty
  • Fallback to persistence: If model confidence is very low, the system falls back to a persistence forecast (last observed level)
  • Ensemble disagreement: When individual models disagree significantly, the system flags this and widens confidence bounds
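A simplified sketch of how these guardrails might compose; all thresholds and default values below are illustrative assumptions, not operational settings.

```python
def apply_guardrails(forecast_m, last_obs_m, horizon_h,
                     station_min_m=0.0, station_max_m=15.0,
                     max_rate_m_per_h=0.5,
                     confidence=1.0, min_confidence=0.3):
    """Apply the safety mechanisms above; all thresholds are illustrative."""
    # Fallback to persistence when model confidence is very low.
    if confidence < min_confidence:
        return last_obs_m
    # Rate-of-change limit: dampen implied rises/falls beyond the cap.
    max_change = max_rate_m_per_h * horizon_h
    change = forecast_m - last_obs_m
    if abs(change) > max_change:
        forecast_m = last_obs_m + (max_change if change > 0 else -max_change)
    # Physical bounds: clamp to the station's plausible range.
    return min(max(forecast_m, station_min_m), station_max_m)
```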

6. Confidence & Uncertainty

Forecast confidence is expressed as a percentage (0–100%) and as confidence bands on the hydrograph:

  • High confidence (80–100%): Models agree, recent validation accuracy is high, data is fresh
  • Moderate confidence (50–79%): Some model disagreement or data quality issues
  • Low confidence (<50%): Significant model disagreement, stale data, or conditions outside training range

Confidence is computed from: model agreement (ensemble spread), recent validation RMSE, data freshness, and whether current conditions are within the training distribution.
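One plausible way to blend these factors into a single score; the scaling constants and the out-of-distribution penalty below are assumptions for illustration, not the system's actual formula.

```python
def forecast_confidence(ensemble_spread_m, val_rmse_m, data_age_h,
                        in_distribution,
                        spread_scale=0.5, rmse_scale=0.3, max_age_h=6.0):
    """Blend the four factors into a 0-100% confidence score."""
    agreement = max(0.0, 1.0 - ensemble_spread_m / spread_scale)   # ensemble spread
    accuracy = max(0.0, 1.0 - val_rmse_m / rmse_scale)             # validation RMSE
    freshness = max(0.0, 1.0 - data_age_h / max_age_h)             # data freshness
    score = 100.0 * (agreement + accuracy + freshness) / 3.0
    # Conditions outside the training distribution halve the score.
    return score if in_distribution else score * 0.5
```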

7. Limitations

Flood Sentinel forecasts are advisory only. Users should be aware of these limitations:

  • Training data range: Models perform best within the range of conditions seen during training. Extreme events beyond historical records may be poorly predicted.
  • Infrastructure changes: Dam operations, new levees, channel modifications, or urban development can change catchment behaviour. Models need retraining after significant infrastructure changes.
  • Rainfall forecast dependency: Longer-range forecasts depend on rainfall predictions, which have their own uncertainty. Forecast accuracy degrades beyond 24–48 hours.
  • Flash floods: Very rapid (sub-hourly) flood events in small catchments may develop faster than the system's data refresh cycle.
  • Tidal interactions: Complex tide-flood interactions at estuary stations are approximated and may not capture all dynamics.
  • Not a replacement for BoM warnings: The Bureau of Meteorology is the authoritative source for official flood warnings in Australia. Flood Sentinel should complement, not replace, official warnings.
  • Model degradation: All ML models can degrade over time as catchment conditions evolve. Regular retraining and performance monitoring are essential.

8. Validation & Performance Metrics

Model performance is continuously monitored using:

  • RMSE (Root Mean Square Error) — primary accuracy metric, in metres
  • MAE (Mean Absolute Error) — average forecast error magnitude
  • NSE (Nash-Sutcliffe Efficiency) — standard hydrological performance metric (1.0 = perfect, <0 = worse than mean)
  • Bias — systematic over/under-prediction tendency
  • Threshold hit rates — accuracy of predicting when levels cross alert/warning thresholds
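These metrics are standard and can be computed directly from paired observation and forecast series; a minimal reference implementation:

```python
import numpy as np

def rmse(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((pred - obs) ** 2)))

def mae(obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(np.abs(pred - obs)))

def nse(obs, pred):
    # 1.0 = perfect; 0.0 matches the observed mean; < 0 is worse than the mean.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(1.0 - np.sum((obs - pred) ** 2)
                 / np.sum((obs - obs.mean()) ** 2))

def bias(obs, pred):
    # Positive = systematic over-prediction; negative = under-prediction.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean(pred - obs))

def threshold_hit_rate(obs, pred, threshold_m):
    # Fraction of timesteps where forecast and observation agree
    # on whether the alert/warning threshold was crossed.
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    return float(np.mean((obs >= threshold_m) == (pred >= threshold_m)))
```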

Performance reports are accessible via the Model Health dashboard.

9. Model Versioning

Flood Sentinel maintains a model registry that tracks:

  • Training date and data range used
  • Validation metrics at time of training
  • Feature importance rankings
  • Model file checksums for integrity verification
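Checksum-based integrity verification can be sketched with Python's hashlib; the registry-entry shape here is hypothetical.

```python
import hashlib

def file_checksum(path, chunk_size=65536):
    """SHA-256 of a model file, streamed so large files fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_model(registry_entry, model_path):
    """True if the file on disk matches the checksum recorded at training."""
    return registry_entry["checksum"] == file_checksum(model_path)
```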

Previous model versions are archived and can be rolled back if a newer model underperforms.