A folder the agent can load on demand. Inside: metadata that says when to use it, instructions that say how, scripts that do the work, and any assets the work needs. The agent reads the metadata at startup; the rest only enters context when the skill is judged relevant.
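The two-stage loading described above can be sketched in a few lines of Python. This is a minimal sketch, not any particular agent runtime's loader: the `Skill` class, `scan_skills`, `split_frontmatter`, and the one-folder-per-skill layout with a `SKILL.md` inside are assumptions made for illustration, and the frontmatter parser only handles flat single-line `key: value` pairs.

```python
from dataclasses import dataclass
from pathlib import Path


def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---' delimited frontmatter from the body.

    Handles flat single-line 'key: value' pairs only; a real loader
    would hand the frontmatter to a proper YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta: dict[str, str] = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:]).strip()
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing delimiter: treat the whole file as body


@dataclass
class Skill:
    name: str
    description: str
    path: Path  # location of SKILL.md; the body is not kept in memory

    def load_body(self) -> str:
        """Second stage: read the instructions only when the skill is chosen."""
        return split_frontmatter(self.path.read_text(encoding="utf-8"))[1]


def scan_skills(root: Path) -> list[Skill]:
    """First stage (startup): keep name and description, discard the body."""
    skills = []
    for skill_md in sorted(root.glob("*/SKILL.md")):
        meta, _ = split_frontmatter(skill_md.read_text(encoding="utf-8"))
        skills.append(
            Skill(
                name=meta.get("name", skill_md.parent.name),
                description=meta.get("description", ""),
                path=skill_md,
            )
        )
    return skills
```

Only the `name` and `description` strings would be placed in the agent's context at startup; `load_body()` is called once the skill is judged relevant.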
Anthropic Engineering — Equipping Agents for the Real World with Agent Skills, Dec 2025.
See also Zhang & Murag — Don't Build Agents, Build Skills Instead, AI Engineer Code Summit, late 2025.
Barry Zhang & Mahesh Murag — Anthropic
Sean Grove — then OpenAI alignment
---
name: residual-structure
description: >-
  After a fit, characterise what's LEFT in the residual: temporal autocorrelation at multiple lags, Pearson correlation with each input feature AND its first time-derivative, sign-asymmetry in δ. Returns a per-platform verdict, either "noise_floor" (stop; you're done) or "structure_detected" with a specific reason ("residual autocorrelated at lag 6 → try a τ·d(δ)/dt term"). Use as the bridge between fit-model and "is V2 worth building?". This is the diagnostic the v2 cohort silently lacked: almost everyone shipped V1 understeer; the one agent who didn't (m2-agent-05, +51.5% yaw) saw exactly this autocorrelation signature and added a steering-rate lead.
when-to-invoke: >-
  After running `fit-model` and `score-model`, when you are trying to decide whether your current model has more headroom or you are at the noise floor. Especially when yaw RMSE has stalled and you do not know whether to ship or keep iterating.
when-NOT-to-invoke: >-
  Before any fit (run score-model first; you need a fitted predict_fn). To see route-level bias (use route-bias). To plot residual vs one feature (use inspect-residuals).
load-cost: ~210 tokens metadata, ~500 tokens body.
---
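What a diagnostic like this might compute can be sketched in plain Python. This is a hedged illustration of the checks the description names, not the skill's actual script: the function names, the default lags, the 0.3 threshold, and the forward-difference derivative are all assumptions, and the sign-asymmetry check is left out for brevity.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def autocorr(residual, lag):
    """Correlation of the residual with itself shifted by `lag` samples."""
    return pearson(residual[:-lag], residual[lag:])


def residual_structure(residual, features, lags=(1, 3, 6), threshold=0.3):
    """Return ('noise_floor', None) or ('structure_detected', reason).

    residual: list of floats (prediction minus ground truth, in time order)
    features: dict mapping feature name to a list of the same length
    threshold: hypothetical cut-off; tune it to the data at hand
    """
    # 1. Temporal autocorrelation at several lags.
    for lag in lags:
        rho = autocorr(residual, lag)
        if abs(rho) > threshold:
            return "structure_detected", (
                f"residual autocorrelated at lag {lag} (rho={rho:.2f})"
            )
    # 2. Correlation with each feature and with its first time-derivative,
    #    approximated here by a forward difference.
    for name, values in features.items():
        deriv = [b - a for a, b in zip(values, values[1:])]
        candidates = (
            (name, values, residual),
            (f"d({name})/dt", deriv, residual[1:]),
        )
        for label, series, res in candidates:
            rho = pearson(series, res)
            if abs(rho) > threshold:
                return "structure_detected", (
                    f"residual correlated with {label} (rho={rho:.2f})"
                )
    return "noise_floor", None
```

Returning a verdict plus a reason, rather than a table of raw statistics, mirrors the point of the description field: the agent gets a decision it can act on.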
Names the verdict the skill produces — and the failure mode it prevents — not the function it calls. Where your organisation's expertise lives.
Three explicit redirects. Without them the agent guesses which skill to load; with them, it routes.
Metadata is paid every turn; body only when this skill activates. Without the receipt, you can't budget the eleventh skill.
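The budgeting argument can be made concrete with a little arithmetic. The sketch below assumes a simple cost model (metadata for every skill counted on every turn, a body counted only on turns where that skill is active) and, purely for illustration, gives all eleven skills the same ~210 / ~500 token costs as the example above; the function name and the skill names are hypothetical.

```python
def session_cost(skills, turns, activations):
    """Total tokens spent on skills over a session.

    skills: dict of name -> (metadata_tokens, body_tokens)
    turns: number of turns in the session
    activations: dict of name -> number of turns that skill's body is in context
    """
    metadata = sum(meta for meta, _ in skills.values()) * turns
    bodies = sum(skills[name][1] * n for name, n in activations.items())
    return metadata + bodies


# Assumed numbers: eleven skills, each with the load-cost from the example.
eleven = {f"skill-{i}": (210, 500) for i in range(11)}
ten = {name: cost for name, cost in eleven.items() if name != "skill-10"}

total = session_cost(eleven, turns=20, activations={"skill-0": 3})
marginal = total - session_cost(ten, turns=20, activations={"skill-0": 3})
print(total)     # 47700
print(marginal)  # 4200: the eleventh skill's metadata over 20 turns
```

Under these assumptions the eleventh skill costs 4,200 tokens over a 20-turn session even if its body is never loaded, which is the number the load-cost field lets you see in advance.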
Runs any predict() over the segments and returns one structured bundle — pooled yaw + CTE RMSE, per-segment / per-platform / per-route / per-regime tables, residual stats, worst-N outliers, and a signed-bias warning block at the top. Schema-aware: resolves the ground-truth column per platform, so Tesla (and any non-default schema) scores instead of being silently skipped.
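A stripped-down version of such a scorer might look like the sketch below. It is an assumption-laden illustration rather than the skill's real code: the `TRUTH_COLUMN` mapping, the column names, and the segment layout are invented for the example, and only pooled RMSE, per-platform RMSE, and a signed-bias figure are computed.

```python
import math
from collections import defaultdict

# Hypothetical mapping from platform to the column holding ground truth.
TRUTH_COLUMN = {"default": "yaw_rate", "tesla": "yaw_rate_measured"}


def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def score(predict, segments):
    """Score predict(row) against ground truth, resolved per platform.

    segments: list of dicts with a 'platform' string and a 'rows' list,
              where each row is a dict of column name -> value.
    """
    errors = defaultdict(list)
    for seg in segments:
        column = TRUTH_COLUMN.get(seg["platform"], TRUTH_COLUMN["default"])
        for row in seg["rows"]:
            if column not in row:
                # Fail loudly: a silently skipped platform hides real error.
                raise KeyError(f"{seg['platform']}: missing column {column!r}")
            errors[seg["platform"]].append(predict(row) - row[column])
    pooled = [e for errs in errors.values() for e in errs]
    return {
        "signed_bias": sum(pooled) / len(pooled),  # surfaced first, as a warning
        "pooled_rmse": rmse(pooled),
        "per_platform_rmse": {p: rmse(e) for p, e in errors.items()},
    }
```

The design choice worth noting is the lookup-with-default plus the explicit `KeyError`: a platform with an unusual schema is either scored through its own column or reported as broken, never dropped without notice.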
Maps the workspace — skills/, _shared/, data/, code/, final-model/. So the agent knows where things live before it touches anything.
One paragraph per skill — what it returns, what to read first, the cohort lesson baked in. The agent learns the toolkit without opening ten SKILL.md files.
Score → fit → diagnose → iterate the model, not the fit. Encodes how skills chain — the workflow the cohort discovered, frozen as instructions.
The specific failure mode that capped the v2 cohort's results. "After your first fit, run residual-structure and build a second candidate." Hard-won lesson, written down once.
"Skills are clay, not library." If a skill is in the way — delete it. The only obligation is to lower the canonical KPIs.
AGENTS.md is the project's orientation file — the agent reads it once at startup, before any skill metadata is even scanned. It's the manifest that turns a folder into a workspace.
Unlike skill bodies, it stays in context on every turn, so it has to be tight. Anything you put here is paid for forever; anything you leave for a SKILL.md body only enters context when needed.
The agent doesn't have to ls skills/ and guess. AGENTS.md lists every skill with its judgement — same writing discipline as a SKILL.md description, just one level up.
Tells the agent how the skills chain — score → fit → diagnose → iterate. Without this, the agent picks an arbitrary order and burns turns rediscovering the workflow.
"Don't ship V1" is not advice — it's a recorded failure mode. AGENTS.md is where the cohort's institutional memory lives, so the next agent doesn't repeat the same ceiling.
Skills are clay, not library. The agent has permission to modify, extend, or delete any skill. The only obligation is the KPI. AGENTS.md is where that permission is granted.