Progressive view

From V0 to three levels deep.

Watch the cohort fall toward the origin as each level adds context. V0 first — then Level 1, then Level 2, then Level 3. Level 4 is the polish; the shape was already set by then.

[Chart: x = yaw RMSE, y = CTE RMSE. V0 reference: yaw RMSE ≡ 0.0163, CTE RMSE ≡ 254 m.]

Stage 0

The "non-existing agent."

V0 is the do-nothing baseline. It marks where every agent starts — top-right, far from where we want to be.

Chart legend: V0 baseline

V0 yaw RMSE: 0.0163 rad/s

V0 CTE RMSE: 254.3 metres
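The page does not say how its RMSE figures are computed. A minimal sketch of the usual root-mean-square-error definition follows, assuming predictions and ground truth are sampled at the same points; the function name `rmse` and the sample values are illustrative, not taken from the source.

```python
import math
from typing import Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square error between two equally long series."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("series must be non-empty and of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical values, for illustration only (not data from the page).
print(rmse([0.0, 0.0], [3.0, 4.0]))
```

The same function would apply to either axis of the chart: yaw errors in rad/s or cross-track errors in metres.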

Stage 1

Level 1 agents enter the field.

Ten first-pass models. The pack falls a long way in one step: the best agent cuts yaw RMSE by 51.0% and CTE RMSE by 57.2% relative to V0. Most of the headroom comes from simply having a physically grounded prediction.

Chart legend: V0 baseline, Level 1 agents

Cohort size so far: 10 agents

Best yaw RMSE: 0.0080 rad/s (↓ 51.0% vs V0)

Best CTE RMSE: 108.8 m (↓ 57.2% vs V0)
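The "↓ … vs V0" figures read as the relative reduction in RMSE against the V0 baseline. A minimal sketch of that arithmetic, assuming this is how the percentages are derived; the function name `improvement_vs_v0` is illustrative, not from the source.

```python
def improvement_vs_v0(v0_rmse: float, agent_rmse: float) -> float:
    """Percentage reduction in RMSE relative to the V0 baseline."""
    if v0_rmse <= 0:
        raise ValueError("V0 RMSE must be positive")
    return 100.0 * (v0_rmse - agent_rmse) / v0_rmse


V0_CTE_RMSE = 254.3  # metres, the V0 baseline value quoted on this page

# Best CTE RMSE at Level 1 is 108.8 m.
print(round(improvement_vs_v0(V0_CTE_RMSE, 108.8), 1))  # 57.2
```

Applying the same formula to the rounded yaw values (0.0163 and 0.0080) gives about 50.9% rather than the 51.0% shown, which suggests the displayed percentages were computed from unrounded RMSE values.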

Stage 2

Level 2 — same brief, sharper skill.

Refit the same shape with better fitting choices. The pack tightens; the floor barely moves: best yaw RMSE stays at 0.0080, and best CTE RMSE edges down from 108.8 m to 103.3 m.

Chart legend: V0 baseline, Level 1 agents, Level 2 agents

Cohort size so far: 20 agents

Best yaw RMSE: 0.0080 rad/s (↓ 51.0% vs V0)

Best CTE RMSE: 103.3 m (↓ 59.4% vs V0)

Stage 3

Level 3 — domain knowledge in the prompt.

Vehicle dynamics are handed to the agent in the prompt. The whole cohort shifts down on CTE (median improvement vs V0 goes from 57.0% to 72.2%), the kind of move you don’t get from compute.

Chart legend: V0 baseline, Level 1 agents, Level 2 agents, Level 3 agents

Cohort size so far: 30 agents

Best yaw RMSE: 0.0070 rad/s (↓ 56.8% vs V0)

Best CTE RMSE: 70.4 m (↓ 72.3% vs V0)

Trajectory

What each level moved.

Improvement vs V0, level by level, expressed as the percentage reduction in RMSE. Median is the cohort centre; best is the strongest single agent. Level 4 is not shown: it has the same numbers, with finishing.

Level	Yaw improvement (median)	Yaw improvement (best)	CTE improvement (median)	CTE improvement (best)
Level 1	48.6%	51.0%	56.1%	57.2%
Level 2	49.8%	51.0%	57.0%	59.4%
Level 3	56.6%	56.8%	72.2%	72.3%
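To see what each level moved, the table can be read as differences rather than absolute figures. The sketch below uses only the percentages from the table; the dictionary layout and the helper names `step_gains` and `best_minus_median` are illustrative choices, not part of the source.

```python
# Improvement vs V0 in percent, copied from the table above.
TABLE = {
    "Level 1": {"yaw_median": 48.6, "yaw_best": 51.0, "cte_median": 56.1, "cte_best": 57.2},
    "Level 2": {"yaw_median": 49.8, "yaw_best": 51.0, "cte_median": 57.0, "cte_best": 59.4},
    "Level 3": {"yaw_median": 56.6, "yaw_best": 56.8, "cte_median": 72.2, "cte_best": 72.3},
}


def step_gains(table: dict, key: str) -> dict:
    """Percentage-point gain in `key` from each level to the next."""
    levels = list(table)
    return {
        f"{prev} -> {cur}": round(table[cur][key] - table[prev][key], 1)
        for prev, cur in zip(levels, levels[1:])
    }


def best_minus_median(table: dict, metric: str) -> dict:
    """Percentage-point gap between the best agent and the cohort median."""
    return {
        level: round(row[f"{metric}_best"] - row[f"{metric}_median"], 1)
        for level, row in table.items()
    }


print(step_gains(TABLE, "cte_median"))   # gain per level in median CTE improvement
print(best_minus_median(TABLE, "yaw"))   # how far the best agent leads the median on yaw
```

On these numbers, the median CTE improvement gains 0.9 points from Level 1 to Level 2 and 15.2 points from Level 2 to Level 3, while the best-versus-median gap on yaw narrows from 2.4 to 1.2 to 0.2 points.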

V0 yaw RMSE ≡ 0.0163 rad/s · V0 CTE RMSE ≡ 254 m · 3 levels, 30 agents