You tune against a model. Then reality disagrees. Closing that gap — and knowing where the model can't be trusted — is the job.
Comma.ai openpilot logs. This is what the car's own controllers saw: the same signals openpilot drives on. Production hardware, real roads.
data/raw/segments/
  PLATFORM/
    device/
      route/
        segment/
          rlog.zst
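A minimal sketch of walking that tree. It assumes the listing is one nested path (platform, then device, route, segment, with rlog.zst at the leaf). `find_rlogs` is a hypothetical helper, not part of openpilot, and it only locates files; decoding them is out of scope here.

```python
from pathlib import Path


def find_rlogs(root):
    """Yield (platform, device, route, segment, path) for every rlog.zst.

    Assumes the layout root/PLATFORM/device/route/segment/rlog.zst.
    """
    root = Path(root)
    for path in sorted(root.glob("*/*/*/*/rlog.zst")):
        # The four directory levels between root and the file name.
        platform, device, route, segment = path.relative_to(root).parts[:4]
        yield platform, device, route, segment, path
```

Counting segments per platform is then a short loop over `find_rlogs("data/raw/segments")`.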
A 3-ton truck and a compact hatch. The baseline model is mass-independent — it predicts identical cornering for both.
A rigid rod of wheelbase L, no tyre, no slip — the car goes exactly where the wheels point. Yaw rate falls straight out of geometry.
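The geometry in code, as a sketch. This is the standard kinematic bicycle relation, yaw rate = v · tan(δ) / L. It assumes δ is the road-wheel angle in radians (not the steering-wheel angle) and v is in m/s. The function name is illustrative.

```python
import math


def kinematic_yaw_rate(v, delta, wheelbase):
    """Yaw rate in rad/s for a kinematic bicycle: v * tan(delta) / L.

    v: speed in m/s, delta: road-wheel angle in rad, wheelbase: L in m.
    """
    return v * math.tan(delta) / wheelbase
```

Mass never appears in the signature. Two vehicles with the same wheelbase, speed and wheel angle get the same yaw rate, whatever they weigh.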
Measured v and δ are fed in at every integration step, overriding anything the model would predict for them, so the longitudinal channel is an input, not a prediction. What's left is purely lateral. The model's lies are all lateral too, so this isolates exactly the residual we want to measure.
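A sketch of that rollout, assuming forward-Euler integration at a fixed timestep and a kinematic bicycle for the yaw update. The names, the integrator and the timestep are assumptions, not the project's actual code.

```python
import math


def rollout(v_meas, delta_meas, wheelbase, dt):
    """Integrate pose using measured speed and wheel angle at every step.

    v_meas, delta_meas: equal-length sequences of m/s and rad.
    Returns a list of (x, y, yaw) poses, starting at the origin.
    """
    x = y = yaw = 0.0
    poses = [(x, y, yaw)]
    for v, delta in zip(v_meas, delta_meas):
        # Speed comes from the log, so the model predicts nothing longitudinal.
        x += v * math.cos(yaw) * dt
        y += v * math.sin(yaw) * dt
        # The only modelled quantity is the yaw rate.
        yaw += v * math.tan(delta) / wheelbase * dt
        poses.append((x, y, yaw))
    return poses
```

Because v is never predicted, any gap between the rolled-out path and the logged path comes from the yaw update (plus integration error).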
The truth columns exist in sim.csv — but not in what your model is given at inference.
Not redundant: a tiny persistent yaw bias is nearly invisible per-sample but compounds into hundreds of metres of drift.
V0, scored on 534 held-out segments. Everything from here is measured against this baseline.
The 254 m is the compounding-bias problem made concrete — integrate a slightly biased yaw rate over a minute of driving and the trajectory drifts off the map.
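To see the compounding in isolation, here is a sketch that integrates a constant yaw-rate bias along what should be a straight road. The bias, speed and duration are made-up illustrative inputs, not the values behind the 254 m figure.

```python
import math


def drift_from_yaw_bias(bias, speed, duration, dt=0.01):
    """Final position error in metres caused by a constant yaw-rate bias.

    The reference path is a straight line at constant speed. The biased
    path integrates the same speed with yaw rate equal to `bias` (rad/s).
    """
    x = y = yaw = 0.0
    for _ in range(round(duration / dt)):
        x += speed * math.cos(yaw) * dt
        y += speed * math.sin(yaw) * dt
        yaw += bias * dt
    ref_x = speed * duration  # where the unbiased path ends up
    return math.hypot(x - ref_x, y)


# Illustrative inputs: 0.005 rad/s (about 0.3 deg/s) at 25 m/s for 60 s.
print(drift_from_yaw_bias(bias=0.005, speed=25.0, duration=60.0))
```

With these inputs the error lands in the low hundreds of metres, and doubling the duration roughly quadruples it. That is why an error too small to notice per sample dominates over a full minute.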