You tune against a model. Then reality disagrees. Closing that gap — and knowing where the model can't be trusted — is the job.
Comma.ai openpilot logs. This is what the car's own controllers saw: the same signals openpilot drives on. Production hardware, real roads.
A 3-ton truck and a compact hatch. The baseline model is mass-independent — it predicts identical cornering for both.
A rigid rod of wheelbase L: no tyre model, no slip, so the car goes exactly where the wheels point. Yaw rate falls straight out of geometry.
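A minimal sketch of that geometry, assuming the usual rear-axle kinematic bicycle relations (yaw rate = v·tan(δ)/L, lateral acceleration = v·yaw rate). The function names and the wheelbase value are illustrative placeholders, not taken from the source.

```python
import math


def kinematic_yaw_rate(v_mps: float, delta_road_rad: float, wheelbase_m: float) -> float:
    """Yaw rate (rad/s) of a no-slip kinematic bicycle, referenced to the rear axle."""
    return v_mps * math.tan(delta_road_rad) / wheelbase_m


def kinematic_lat_accel(v_mps: float, yaw_rate_radps: float) -> float:
    """Lateral acceleration (m/s^2) implied by steady cornering at that yaw rate."""
    return v_mps * yaw_rate_radps


# Hypothetical wheelbase, for illustration only.
WHEELBASE_M = 2.88

yaw = kinematic_yaw_rate(v_mps=21.74, delta_road_rad=0.0184, wheelbase_m=WHEELBASE_M)
a_y = kinematic_lat_accel(21.74, yaw)
print(f"yaw rate = {yaw:.4f} rad/s, lateral accel = {a_y:.3f} m/s^2")
```

Mass never appears in either function, which is why this baseline predicts the same cornering for the truck and the hatch.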
Measured v and δ are fed in at every integration step rather than predicted, so the longitudinal channel is an input, not a prediction. What's left is purely lateral, and since the model's errors are all lateral, this isolates exactly the residual we want to measure.
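One way to picture the setup is a rollout in which speed and steering come straight from the log and only the pose is integrated. This is a sketch under stated assumptions: forward-Euler integration, a fixed time step, a rear-axle kinematic bicycle, and a start pose at the origin. `integrate_lateral`, `dt_s` and `wheelbase_m` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Pose = Tuple[float, float, float]  # (x_m, y_m, psi_rad)


def integrate_lateral(
    v_meas: Sequence[float],
    delta_meas: Sequence[float],
    dt_s: float,
    wheelbase_m: float,
) -> List[Pose]:
    """Roll the pose forward using measured speed and steering as inputs.

    Speed is never predicted here, so any trajectory error comes from the
    lateral (yaw) model alone.
    """
    x = y = psi = 0.0
    poses: List[Pose] = [(x, y, psi)]
    for v, delta in zip(v_meas, delta_meas):
        yaw_rate = v * math.tan(delta) / wheelbase_m  # kinematic bicycle assumption
        # Forward Euler: advance position with the current heading, then update heading.
        x += v * math.cos(psi) * dt_s
        y += v * math.sin(psi) * dt_s
        psi += yaw_rate * dt_s
        poses.append((x, y, psi))
    return poses


# Illustrative inputs: 1 s of constant speed and steering at 100 Hz.
poses = integrate_lateral([21.8] * 100, [0.019] * 100, dt_s=0.01, wheelbase_m=2.88)
print(poses[-1])
```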
| delta_road_rad | v_mps | a_long_mps2 | accel_pedal_pct | yaw_rate_meas | a_lat_meas | yaw_rate_pred | a_y_pred | x_m | y_m | psi_rad | yaw_resid | a_y_resid |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| +0.0184 | 21.74 | +0.31 | 17.4 | +0.1372 | +2.984 | +0.1391 | +3.023 | 109.4 | 2.18 | +0.072 | +0.0019 | +0.039 |
| +0.0186 | 21.78 | +0.28 | 17.8 | +0.1389 | +3.022 | +0.1408 | +3.067 | 109.8 | 2.20 | +0.075 | +0.0019 | +0.045 |
| +0.0191 | 21.81 | +0.22 | 17.2 | +0.1402 | +3.058 | +0.1444 | +3.149 | 110.3 | 2.22 | +0.078 | +0.0042 | +0.091 |
| +0.0198 | 21.83 | +0.18 | 16.9 | +0.1418 | +3.094 | +0.1497 | +3.267 | 110.7 | 2.25 | +0.081 | +0.0079 | +0.173 |
| +0.0204 | 21.84 | +0.11 | 16.4 | +0.1431 | +3.124 | +0.1543 | +3.369 | 111.2 | 2.27 | +0.085 | +0.0112 | +0.245 |
| +0.0212 | 21.83 | +0.04 | 15.9 | +0.1444 | +3.148 | +0.1602 | +3.498 | 111.6 | 2.30 | +0.088 | +0.0158 | +0.350 |
The truth columns exist in sim.csv — but not in what your model is given at inference.
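The residual columns in the table appear to be predicted minus measured, and `a_y_pred` looks consistent with `v_mps * yaw_rate_pred` up to rounding. A quick check on the six rows shown, with values copied from the table (the helper names are illustrative):

```python
# Columns: v_mps, yaw_rate_meas, a_lat_meas, yaw_rate_pred, a_y_pred, yaw_resid, a_y_resid
ROWS = [
    (21.74, 0.1372, 2.984, 0.1391, 3.023, 0.0019, 0.039),
    (21.78, 0.1389, 3.022, 0.1408, 3.067, 0.0019, 0.045),
    (21.81, 0.1402, 3.058, 0.1444, 3.149, 0.0042, 0.091),
    (21.83, 0.1418, 3.094, 0.1497, 3.267, 0.0079, 0.173),
    (21.84, 0.1431, 3.124, 0.1543, 3.369, 0.0112, 0.245),
    (21.83, 0.1444, 3.148, 0.1602, 3.498, 0.0158, 0.350),
]


def residuals(row):
    """Recompute (yaw_resid, a_y_resid) as predicted minus measured."""
    _, yaw_meas, a_meas, yaw_pred, a_pred, _, _ = row
    return yaw_pred - yaw_meas, a_pred - a_meas


def a_y_gap(row):
    """Difference between the tabulated a_y_pred and v * yaw_rate_pred."""
    v, _, _, yaw_pred, a_pred, _, _ = row
    return abs(a_pred - v * yaw_pred)


for row in ROWS:
    yaw_resid, a_y_resid = residuals(row)
    print(f"yaw_resid={yaw_resid:+.4f}  a_y_resid={a_y_resid:+.3f}  a_y gap={a_y_gap(row):.4f}")
```

Both residuals grow steadily across the six rows while speed barely changes, so the growing gap tracks the increasing steering angle rather than speed.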
Instantaneous fidelity — how close the predicted yaw rate is to measured, every sample.
Where the integrated trajectory actually ends up, resampled at uniform distance.
The two metrics are not redundant: a tiny persistent yaw bias is nearly invisible per-sample but compounds into hundreds of metres of drift.
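A rough sketch of how the two scores could be computed. The exact definitions used for scoring are not given here, so both are assumptions: per-sample RMSE on yaw rate, and point-to-point distance between predicted and measured paths after both are resampled at a fixed arc-length step. All function names are hypothetical.

```python
import math
from bisect import bisect_right
from typing import List, Sequence, Tuple


def yaw_rate_rmse(pred: Sequence[float], meas: Sequence[float]) -> float:
    """Instantaneous fidelity: root-mean-square yaw-rate error over all samples."""
    return math.sqrt(sum((p - m) ** 2 for p, m in zip(pred, meas)) / len(pred))


def resample_by_distance(
    xs: Sequence[float], ys: Sequence[float], step_m: float
) -> List[Tuple[float, float]]:
    """Linearly interpolate a path at uniform arc-length stations 0, step, 2*step, ..."""
    s = [0.0]  # cumulative distance travelled along the path
    for i in range(1, len(xs)):
        s.append(s[-1] + math.hypot(xs[i] - xs[i - 1], ys[i] - ys[i - 1]))
    out = []
    for k in range(int(s[-1] // step_m) + 1):
        target = k * step_m
        # Index of the segment end point that brackets this station.
        i = max(1, min(bisect_right(s, target), len(s) - 1))
        span = s[i] - s[i - 1]
        t = 0.0 if span == 0.0 else (target - s[i - 1]) / span
        out.append((xs[i - 1] + t * (xs[i] - xs[i - 1]),
                    ys[i - 1] + t * (ys[i] - ys[i - 1])))
    return out


def path_errors(pred_xy, meas_xy) -> List[float]:
    """Distance between matching stations of two resampled paths."""
    return [math.hypot(px - mx, py - my) for (px, py), (mx, my) in zip(pred_xy, meas_xy)]


# Illustrative paths: measured goes straight, predicted drifts sideways.
meas = resample_by_distance([0, 50, 100], [0, 0, 0], step_m=10.0)
pred = resample_by_distance([0, 50, 100], [0, 2, 8], step_m=10.0)
errs = path_errors(pred, meas)
print(f"final-station error = {errs[-1]:.2f} m")
```

Resampling by distance rather than time keeps slow and fast segments comparable, since each metre of road contributes equally to the score.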
V0, scored on 534 held-out segments. Everything from here is measured against them.
The 254 m is the compounding-bias problem made concrete — integrate a slightly biased yaw rate over a minute of driving and the trajectory drifts off the map.
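To see how quickly a bias compounds, consider an idealised case: constant speed, a straight true path, and a constant yaw-rate bias b. The heading error grows as b·t, so the lateral offset is v·(1 − cos(bT))/b, which is about ½·v·b·T² for small angles. The numbers below are illustrative and are not the values behind the figure quoted above.

```python
import math


def lateral_drift_m(v_mps: float, yaw_bias_radps: float, duration_s: float) -> float:
    """Lateral offset from a straight path after integrating a constant yaw-rate bias."""
    if yaw_bias_radps == 0.0:
        return 0.0
    return v_mps * (1.0 - math.cos(yaw_bias_radps * duration_s)) / yaw_bias_radps


def lateral_drift_small_angle_m(v_mps: float, yaw_bias_radps: float, duration_s: float) -> float:
    """Small-angle approximation: drift grows with the square of elapsed time."""
    return 0.5 * v_mps * yaw_bias_radps * duration_s ** 2


# Illustrative values: 20 m/s, 0.005 rad/s bias, one minute of driving.
exact = lateral_drift_m(20.0, 0.005, 60.0)
approx = lateral_drift_small_angle_m(20.0, 0.005, 60.0)
print(f"exact = {exact:.1f} m, small-angle = {approx:.1f} m")
```

With these inputs the drift comes out near 180 m, from a bias of only 0.005 rad/s. That is comparable in size to the per-sample residuals in the table, which is why a per-sample metric alone can look fine while the trajectory does not.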