Webinar · Level 03 · Domain Knowledge

Domain Knowledge.

A working definition · Further reading · The reference shelf, built up · How to write one
Definition · 02

Domain Knowledge

noun /dəˈmeɪn ˈnɒl.ɪdʒ/

A short markdown document the agent loads on demand. It carries the judgement the codebase can't — the traps prior work hit, the levers worth pulling, the why behind the rule, with a worked example. Frontmatter tells the agent when to load it; the body only enters context when the moment matches. Every new failure your team sees becomes a new line.

01 judgement 02 frontmatter 03 body on demand 04 ratchet

Mitchell Hashimoto, My AI Adoption Journey, Feb 2026 (the ratchet method);
Sean Grove, The New Code, AI Engineer World's Fair 2025 (spec as lossless source of intent);
BettaTech, ¿Qué es esto del Harness Engineering?, 2026 (guides vs sensors).

Diagram · 03 · The reference shelf, built up 01 / 06
[Diagram: agent loop with Human, LLM Call, System Prompt, Environment (data, code), Action (tools), Feedback, Stop. skills/: fit-model, score-model, compare-models, residual-structure, route-bias. references/: anti-patterns, approach-menu, two-kpi-tradeoff, exploration-disc, dynamics-form, ceiling-moves. Annotations: metadata, every turn; body, on demand; a new line per failure.]

 

Reference · Further reading 01 / 01
Mitchell Hashimoto — My AI Adoption Journey
Essay · February 2026


Mitchell Hashimoto — HashiCorp co-founder, mitchellh.com.
Synthesised into the harness engineering vocabulary by BettaTech (Spanish-language YouTube, late April 2026).

  • The ratchet method — every agent mistake gets engineered out structurally: in AGENTS.md, in a reference doc, in a tool wrapper. Never just re-prompted away. Failures are inputs to the next iteration.
  • "AGENTS.md is the changelog of every mistake your agents have made, written in the imperative." The same loop applies to every markdown artifact in the harness — including the domain-knowledge references.
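As a rough sketch of the ratchet loop described above, the helper below appends one imperative line per newly observed failure to a markdown artifact. The function name `ratchet`, its arguments, and the line format are assumptions made for illustration; the essay does not prescribe a format.

```python
from pathlib import Path


def ratchet(doc: Path, failure: str, rule: str) -> bool:
    """Append one imperative line for a newly observed failure.

    Returns False (and writes nothing) if the same rule is already
    recorded, so a repeated failure does not duplicate lines.
    Hypothetical line format: "- <rule> (seen: <failure>)".
    """
    rule = rule.strip()
    existing = doc.read_text(encoding="utf-8") if doc.exists() else ""
    if any(ln.startswith(f"- {rule} (") for ln in existing.splitlines()):
        return False
    # Make sure the new line starts on its own line
    sep = "" if (not existing or existing.endswith("\n")) else "\n"
    doc.write_text(existing + sep + f"- {rule} (seen: {failure.strip()})\n",
                   encoding="utf-8")
    return True
```

A call might look like `ratchet(Path("AGENTS.md"), "dev RMSE matched train RMSE", "Split by segment, never by sample.")`.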
Anatomy · How to write a reference 01 / 04
anti-patterns.md
---
name: anti-patterns
description: Common ways prior work on this
  task has gone wrong. Lead with these —
  most of them are not obvious from the
  data alone.
when-to-load: Before you settle on a fitting
  procedure or evaluation slice.
load-cost: ~600 words.
---



The legal cousin — per-segment δ₀ from input channels (this is THE winning move on the right platforms)

This is the single highest-leverage move on this dataset. In the most recent m3 cohort, the three top-tier agents all shipped it; the three bottom-tier agents all didn't — and the gap was +8 pts yaw / +15 pts CTE between tiers, with model form otherwise identical.
Panel A · anti-patterns.md

The description carries judgement, not mechanics.

The frontmatter names the role of the doc, not its contents: "common ways prior work has gone wrong." The body opens with a worked example whose first paragraph is cohort evidence: not a principle, an outcome with numbers attached.

Grove's framing applies here: the reference is the lossless source of judgement. The code that implements δ₀ correction is a lossy projection of the insight that bottom-tier agents reliably miss it. The reference carries the insight; the code carries the implementation.

exploration-discipline.md
---
name: exploration-discipline
description: Protocol for naming ≥5
  alternatives (at least 3 different model
  structures) before committing to one,
  plus the EXPERIMENTS.md log convention.
  Prevents silent re-convergence on the
  same approach prior cohorts piled up on.
when-to-load: At the start of a fresh task,
  before your first fit. Re-read whenever
  you're tempted to "just iterate on the
  current model".
---



Every EXPERIMENTS.md entry MUST carry a
Rung: 0|1|2|3|orthogonal tag.

The pre-flighting-final-model skill
enforces at least one Rung: 1+ or
Rung: orthogonal entry before the bundle
can ship.
Panel B · exploration-discipline.md

The reference as protocol.

A reference doesn't have to teach — it can prescribe. This one is a procedure: name five alternatives, log them, tag the rung, and the harness will refuse to ship if you skipped the climb.

That last sentence is the ratchet in action — a prior cohort failed (every agent piled up on rung-0 refinements), so the harness was modified to prevent that failure from recurring. The reference doc is the human-readable face of the same change. References and skills co-evolve with the failures they exist to prevent.
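The gate in Panel B could look roughly like the check below. It is a sketch under stated assumptions, not the pre-flighting-final-model skill itself: it assumes each EXPERIMENTS.md entry starts with a `## ` heading and carries exactly one `Rung:` line, and the names `split_entries` and `check_rungs` are hypothetical.

```python
import re

VALID_RUNGS = {"0", "1", "2", "3", "orthogonal"}
RUNG_LINE = re.compile(r"^Rung:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def split_entries(log: str) -> list:
    """Assumed layout: each entry starts with a '## ' heading."""
    parts = re.split(r"^## ", log, flags=re.MULTILINE)
    return [p for p in parts[1:] if p.strip()]


def check_rungs(log: str) -> list:
    """Return a list of problems; an empty list means the gate passes."""
    problems = []
    climbed = False
    for entry in split_entries(log):
        title = entry.splitlines()[0].strip()
        tags = RUNG_LINE.findall(entry)
        # Assumption: exactly one tag per entry
        if len(tags) != 1:
            problems.append(
                f"{title}: expected exactly one Rung tag, found {len(tags)}")
            continue
        tag = tags[0]
        if tag not in VALID_RUNGS:
            problems.append(f"{title}: invalid rung {tag!r}")
            continue
        if tag != "0":
            climbed = True  # Rung 1+ or orthogonal
    if not climbed:
        problems.append("no entry at Rung 1+ or orthogonal; bundle cannot ship")
    return problems
```

Returning a list of problems rather than raising on the first one lets the agent see everything it has to fix in a single pass.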

dynamics-formulations.md
---
name: dynamics-formulations
description: V0 documented in full plus
  sketches of higher-rung formulations
  (linear dynamic ST with slip angles,
  nonlinear tyre, multi-body).
when-to-load: When choosing a model
  structure, or when residual-structure
  flags `structure_detected`.

---

Living doc: append your formulation
here when you ship one past V0.



Minimum viable rung-1 attempt

A ~30-line code scaffold (Euler integration, fix all params from carParams except C_αf, fit per platform). The cost-to-attempt is lower than past cohorts assumed.
Panel C · dynamics-formulations.md

The living reference.

Some references are append-only catalogues — every agent that ships a successful new formulation adds an entry. This is the artifact-level analogue of skill files as the unit of recursive self-improvement.

The reference grows as the team's vocabulary for the problem grows. Any markdown artifact in the harness can learn this way — AGENTS.md, individual skills, and references all participate in the same ratchet at different grains.
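To give a sense of scale for the rung-1 scaffold mentioned in the dynamics-formulations excerpt, here is one possible shape for it: a linear single-track model with slip angles, rolled out with forward Euler, fitting only the front cornering stiffness. This is not the scaffold from the reference itself. The parameter keys (`m`, `Iz`, `a`, `b`, `C_af`, `C_ar`), the constant-speed simplification, and the grid search are all assumptions made for illustration.

```python
import math


def simulate_yaw_rate(steer, vx, dt, p):
    """Forward-Euler rollout of a linear single-track model.

    steer: steering angles [rad], one per time step; vx: constant speed [m/s].
    Returns the yaw rate [rad/s] at each step.
    """
    vy, r = 0.0, 0.0
    out = []
    for delta in steer:
        # Slip angles (small-angle form)
        alpha_f = delta - (vy + p["a"] * r) / vx
        alpha_r = -(vy - p["b"] * r) / vx
        # Linear tyre forces
        fyf = p["C_af"] * alpha_f
        fyr = p["C_ar"] * alpha_r
        # Lateral and yaw dynamics
        vy_dot = (fyf + fyr) / p["m"] - vx * r
        r_dot = (p["a"] * fyf - p["b"] * fyr) / p["Iz"]
        out.append(r)
        vy += dt * vy_dot
        r += dt * r_dot
    return out


def rmse(pred, obs):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(pred, obs)) / len(obs))


def fit_front_stiffness(steer, yaw_obs, vx, dt, car_params, candidates):
    """Grid-search C_af only; all other parameters stay at car_params values."""
    best = None
    for c_af in candidates:
        p = dict(car_params, C_af=c_af)
        err = rmse(simulate_yaw_rate(steer, vx, dt, p), yaw_obs)
        if best is None or err < best[1]:
            best = (c_af, err)
    return best  # (best C_af, yaw-rate RMSE)
```

Fitting per platform then amounts to calling `fit_front_stiffness` once per platform with that platform's data and parameters.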

two-kpi-tradeoff.md
---
name: two-kpi-tradeoff
description: How yaw-rate RMSE and CTE
  RMSE relate. Two-step diagnostic for
  "yaw improved but CTE stuck".
when-to-load: After you have a working
  model and want to interpret your
  numbers.
---



Failure-mode index

  • Yaw RMSE improved >30% but CTE barely moved → check per-platform signed bias; a symmetric error distribution survives RMSE improvements but ships as drift.
  • Pooled score improved, per-platform got worse on one → you fit pooled but evaluated pooled; check the per-platform table.
  • Dev RMSE matches train RMSE exactly → you split at the sample level inside a segment (route leakage). Re-split.
Panel D · two-kpi-tradeoff.md

The failure-mode index.

Every reference closes with a failure-mode index — a checklist of "you'll see this if…" patterns. This is the Husain pattern from production trace analysis, applied at authoring time: the moment a failure has surfaced often enough to characterise, it earns a checkbox here.

The index is what makes the reference useful at the moment of decision, not just at the moment of reading. The agent runs through it after every fit; the user does too.
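Two of the checks in the index above are mechanical enough to sketch. The helpers below are illustrative only: `per_platform_errors` and `split_by_segment` are hypothetical names, and the row and sample schemas are assumptions, not the harness's data format.

```python
import math
from collections import defaultdict


def per_platform_errors(rows):
    """rows: iterable of (platform, predicted, observed).

    Returns {platform: (signed_bias, rmse)}. A bias close in size to the
    RMSE means the error is mostly a constant offset rather than scatter.
    """
    residuals = defaultdict(list)
    for platform, pred, obs in rows:
        residuals[platform].append(pred - obs)
    return {
        platform: (
            sum(res) / len(res),
            math.sqrt(sum(e * e for e in res) / len(res)),
        )
        for platform, res in residuals.items()
    }


def split_by_segment(samples, dev_segments):
    """Split so that no segment contributes to both train and dev.

    samples: dicts with a 'segment' key (hypothetical schema).
    """
    samples = list(samples)
    dev_segments = set(dev_segments)
    train = [s for s in samples if s["segment"] not in dev_segments]
    dev = [s for s in samples if s["segment"] in dev_segments]
    return train, dev
```

Reporting signed bias next to RMSE for every platform is what makes the first two index entries checkable from a single table.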
