Level 03

Domain Knowledge.

A working definition · Further reading · The reference shelf, built up · How to write one

Definition · 02

Domain Knowledge

noun /dəˈmeɪn ˈnɒl.ɪdʒ/

A short markdown document the agent loads on demand. It carries the judgement the codebase can't — the traps prior work hit, the levers worth pulling, the why behind the rule, with a worked example. Frontmatter tells the agent when to load it; the body only enters context when the moment matches. ⚙Every new failure your team sees becomes a new line.

01 judgement 02 frontmatter 03 body on demand 04 ratchet

Mitchell Hashimoto — My AI Adoption Journey, Feb 2026 (the ratchet method);
Sean Grove — The New Code, AI Engineer World's Fair 2025 (spec as lossless source of intent);
BettaTech — ¿Qué es esto del Harness Engineering?, 2026 (guides vs sensors).

Diagram · 03 · The reference shelf, built up 01 / 06

Reference · Further reading 01 / 01

Essay · February 2026

My AI Adoption Journey

Mitchell Hashimoto — HashiCorp co-founder, mitchellh.com.
Synthesised into the harness engineering vocabulary by BettaTech (Spanish-language YouTube, late April 2026).

The ratchet method — every agent mistake gets engineered out structurally: in AGENTS.md, in a reference doc, in a tool wrapper. Never just re-prompted away. Failures are inputs to the next iteration.
"AGENTS.md is the changelog of every mistake your agents have made, written in the imperative." The same loop applies to every markdown artifact in the harness — including the domain-knowledge references.

Read the essay

Anatomy · How to write a reference 01 / 04

anti-patterns.md

---
name: anti-patterns
description: Common ways prior work on this
  task has gone wrong. Lead with these —
  most of them are not obvious from the
  data alone.
when-to-load: Before you settle on a fitting
  procedure or evaluation slice.
load-cost: ~600 words.
---



The legal cousin — per-segment δ₀ from
input channels (this is THE winning move
on the right platforms)

This is the single highest-leverage move on
this dataset. In the most recent m3 cohort,
the three top-tier agents all shipped it;
the three bottom-tier agents all didn't
— and the gap was +8 pts yaw /
+15 pts CTE between tiers, with
model form otherwise identical.

Panel A · anti-patterns.md

The description carries judgement, not mechanics.

The frontmatter names the role of the doc, not its contents — "common ways prior work has gone wrong." The body opens with a worked example whose first paragraph is cohort evidence: not a principle, an outcome with numbers attached.

Grove's framing applies here: the reference is the lossless source of judgement. The code that implements δ₀ correction is a lossy projection of the insight that bottom-tier agents reliably miss it. The reference carries the insight; the code carries the implementation.

exploration-discipline.md

---
name: exploration-discipline
description: Protocol for naming ≥5
  alternatives (at least 3 different model
  structures) before committing to one,
  plus the EXPERIMENTS.md log convention.
  Prevents silent re-convergence on the
  same approach prior cohorts piled up on.
when-to-load: At the start of a fresh task,
  before your first fit. Re-read whenever
  you're tempted to "just iterate on the
  current model".
---



Every EXPERIMENTS.md entry MUST carry a
Rung: 0|1|2|3|orthogonal tag.

The pre-flighting-final-model skill
enforces at least one Rung: 1+ or
Rung: orthogonal entry before the bundle
can ship.

Panel B · exploration-discipline.md

The reference as protocol.

A reference doesn't have to teach — it can prescribe. This one is a procedure: name five alternatives, log them, tag the rung, and the harness will refuse to ship if you skipped the climb.

That last sentence is the ratchet in action — a prior cohort failed (every agent piled up on rung-0 refinements), so the harness was modified to prevent that failure from recurring. The reference doc is the human-readable face of the same change. References and skills co-evolve with the failures they exist to prevent.

dynamics-formulations.md

---
name: dynamics-formulations
description: V0 documented in full plus
  sketches of higher-rung formulations
  (linear dynamic ST with slip angles,
  nonlinear tyre, multi-body).
when-to-load: When choosing a model
  structure, or when residual-structure
  flags `structure_detected`.

Living doc — append your formulation
here when you ship one past V0.
---



Minimum viable rung-1 attempt

A ~30-line code scaffold (Euler
integration, fix all params from
carParams except C_αf, fit per platform).
The cost-to-attempt is lower than past
cohorts assumed.

Panel C · dynamics-formulations.md

The living reference.

Some references are append-only catalogues — every agent that ships a successful new formulation adds an entry. This is the artifact-level analogue of skill files as the unit of recursive self-improvement.

The reference grows as the team's vocabulary for the problem grows. Any markdown artifact in the harness can learn this way — AGENTS.md, individual skills, and references all participate in the same ratchet at different grains.

two-kpi-tradeoff.md

---
name: two-kpi-tradeoff
description: How yaw-rate RMSE and CTE
  RMSE relate. Two-step diagnostic for
  "yaw improved but CTE stuck".
when-to-load: After you have a working
  model and want to interpret your
  numbers.
---



Failure-mode index

☐ Yaw RMSE improved >30% but CTE barely
   moved → check per-platform signed bias;
   a symmetric error distribution survives
   RMSE improvements but ships as drift.

☐ Pooled score improved, per-platform
   got worse on one → you fit pooled but
   evaluated pooled; check the per-
   platform table.

☐ Dev RMSE matches train RMSE exactly
   → you split at the sample level inside
   a segment (route leakage). Re-split.

Panel D · two-kpi-tradeoff.md

The failure-mode index.

Every reference closes with a failure-mode index — a checklist of "you'll see this if…" patterns. This is the Husain pattern from production trace analysis, applied at authoring time: the moment a failure has surfaced often enough to characterise, it earns a checkbox here.

The index is what makes the reference useful at the moment of decision, not just at the moment of reading. The agent runs through it after every fit; the user does too.

Domain Knowledge.