Agent Engineering Lab · 12 modules
Twelve labs, one engineering story
Each lab is a focused playground for one concept in production AI agent engineering. Seven share the canonical credit-order-eligibility scenario at /tools/agent-lab (rendered as different lenses, deep-linked from the cards below); four are dedicated routes for material that needed its own surface area. One is documentation only because the topic is operational, not visual.
- 5 labs
Foundations
Messages, schemas, tools, the loop itself, and retrieval.
Lab 01 · dedicated · LLM API fundamentals
See the request/response loop the chat box hides.
Two side-by-side prompt configs with token usage, latency, and cost.
Lab 02 · canonical lab · Structured outputs
Make model output machine-readable and validate it.
Structured Output lens: Zod schemas, validation errors, force-invalid toggle.
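The validation step this lens describes can be sketched without any dependency. The lab itself uses Zod schemas; the hand-rolled check below is a stand-in for illustration only, and the `EligibilityDecision` shape, its field names, and `validateDecision` are assumptions rather than the lab's actual code.

```typescript
// Hypothetical decision shape for the credit-order-eligibility scenario (an assumption).
type EligibilityDecision = { eligible: boolean; reason: string; creditLimit: number };

type ValidationResult =
  | { ok: true; value: EligibilityDecision }
  | { ok: false; errors: string[] };

// Parse raw model text and check every field, collecting all errors
// instead of stopping at the first one.
function validateDecision(raw: string): ValidationResult {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return { ok: false, errors: ["output is not valid JSON"] };
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    return { ok: false, errors: ["output must be a JSON object"] };
  }
  const { eligible, reason, creditLimit } = parsed as Record<string, unknown>;
  const errors: string[] = [];
  if (typeof eligible !== "boolean") errors.push("eligible: expected boolean");
  if (typeof reason !== "string" || reason.length === 0) {
    errors.push("reason: expected non-empty string");
  }
  if (typeof creditLimit !== "number" || !Number.isFinite(creditLimit) || creditLimit < 0) {
    errors.push("creditLimit: expected non-negative number");
  }
  if (errors.length > 0) return { ok: false, errors };
  // The casts are safe here: each field passed its typeof check above.
  return {
    ok: true,
    value: {
      eligible: eligible as boolean,
      reason: reason as string,
      creditLimit: creditLimit as number,
    },
  };
}

// A deliberately malformed payload, in the spirit of the force-invalid toggle.
const forcedInvalid = '{"eligible":"yes","creditLimit":-1}';
```

Passing `forcedInvalid` to `validateDecision` exercises the error path: the result carries one message per failing field rather than a thrown exception, so the caller can display or log all of them.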
Lab 03 · canonical lab · Tool calling
See typed tool boundaries and runtime calls.
Tool Calling lens: registry, args, results, all schema-validated.
Lab 04 · canonical lab · Agent loop
Watch a loop iterate, decide, and stop.
Agent Loop lens: per-iteration view of the runner over the credit scenario.
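An iterate, decide, stop loop of the kind this lens shows might look roughly like the sketch below. All names (`runLoop`, `Decide`, `TraceEvent`, the `getCreditLimit` tool) are hypothetical, and a scripted function stands in for the model so the example runs offline.

```typescript
type Step =
  | { kind: "tool"; name: string; args: Record<string, unknown> }
  | { kind: "final"; answer: string };

// One record per iteration, so a per-iteration view can be rendered from it.
type TraceEvent = { iteration: number; step: Step; observation?: string };

type ToolFn = (args: Record<string, unknown>) => string;
type Decide = (history: TraceEvent[]) => Step;

function runLoop(
  decide: Decide,
  tools: Map<string, ToolFn>,
  maxIterations = 5,
): { answer: string | null; trace: TraceEvent[] } {
  const trace: TraceEvent[] = [];
  for (let i = 1; i <= maxIterations; i++) {
    const step = decide(trace);
    if (step.kind === "final") {
      trace.push({ iteration: i, step });
      return { answer: step.answer, trace }; // the model chose to stop
    }
    const tool = tools.get(step.name);
    const observation = tool ? tool(step.args) : `unknown tool: ${step.name}`;
    trace.push({ iteration: i, step, observation });
  }
  return { answer: null, trace }; // iteration budget exhausted without an answer
}

// Scripted stand-in for the model: call one tool, then answer from its result.
const scriptedDecide: Decide = (history) =>
  history.length === 0
    ? { kind: "tool", name: "getCreditLimit", args: { customerId: "c-1" } }
    : {
        kind: "final",
        answer: `limit observed: ${history[history.length - 1]?.observation ?? "none"}`,
      };

const demoTools = new Map<string, ToolFn>([["getCreditLimit", () => "5000"]]);
```

The hard iteration cap is the design point worth noting: the loop terminates even when the decision function never returns a final step, and the trace still records every iteration that ran.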
Lab 05 · canonical lab · RAG
Compare uncited vs. retrieval-grounded answers.
RAG lens: 8-section policy doc, deterministic keyword retrieval, citations.
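Deterministic keyword retrieval with citations can be approximated as below. This is a minimal sketch under stated assumptions: the three sample sections are invented for illustration (the lab's policy document has eight), scoring is a plain count of matching tokens, and `retrieve` and `formatCitations` are hypothetical names.

```typescript
type Section = { id: string; title: string; text: string };
type Hit = { id: string; title: string; score: number };

// Lowercase alphanumeric tokens; no stemming, so "limit" and "limits" differ.
const tokenize = (s: string): string[] => s.toLowerCase().match(/[a-z0-9]+/g) ?? [];

function retrieve(query: string, sections: Section[], topK = 2): Hit[] {
  const terms = new Set(tokenize(query));
  return sections
    .map((s) => {
      const words = tokenize(`${s.title} ${s.text}`);
      // Score = number of tokens in the section that appear in the query.
      const score = words.filter((w) => terms.has(w)).length;
      return { id: s.id, title: s.title, score };
    })
    .filter((h) => h.score > 0)
    // Highest score first; ties broken by id so results never depend on input order.
    .sort((a, b) => b.score - a.score || (a.id < b.id ? -1 : a.id > b.id ? 1 : 0))
    .slice(0, topK);
}

// Render hits as citation markers that an answer can carry alongside its text.
const formatCitations = (hits: Hit[]): string =>
  hits.map((h) => `[${h.id}] ${h.title}`).join("; ");

// Invented sample policy sections, for illustration only.
const sampleSections: Section[] = [
  { id: "s1", title: "Credit limits", text: "Orders may not exceed the customer's approved credit limit." },
  { id: "s2", title: "Overdue invoices", text: "Customers with overdue invoices are not eligible for new credit orders." },
  { id: "s3", title: "Shipping", text: "Standard shipping takes five business days." },
];
```

Because there is no embedding model or randomness involved, the same query against the same sections always yields the same ranked hits, which keeps the cited answer reproducible.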
- 1 lab
Retrieval
Why a single retrieval signal is rarely enough.
- 2 labs
Tooling
Tool discovery and the workflow / agent boundary.
Lab 07 · dedicated · MCP-style tool protocol
Discover tools through a standard manifest.
Tool registry, JSON-schema inputs, permission badges, simulated handshake.
Lab 08 · dedicated · Workflow vs free-form agent
Compare a fixed pipeline to an autonomous loop.
Six-step deterministic workflow next to the free-form agent runner.
- 1 lab
Quality
Replay, score, and regression-test agent behavior.
- 2 labs
Safety
Human approval and policy outside the model.
Lab 10 · canonical lab · Human-in-the-loop
Classify actions by risk and approve / reject.
Built into every run: write actions pause at the approval gate; rejection is first-class.
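The approval gate described here could be sketched as follows. This is an illustrative simplification: the two-level `Risk` type, `runAction`, and the synchronous `askHuman` callback are assumptions, and a real run would pause asynchronously while waiting for a person.

```typescript
type Risk = "read" | "write";
type Action = { tool: string; risk: Risk; args: Record<string, unknown> };
type Approval = "approved" | "rejected";

// Rejection is a normal outcome with its own shape, not an exception.
type Outcome =
  | { status: "executed"; result: string }
  | { status: "rejected"; reason: string };

function runAction(
  action: Action,
  execute: (a: Action) => string,
  askHuman: (a: Action) => Approval,
): Outcome {
  // Only write actions stop at the gate; reads pass straight through.
  if (action.risk === "write") {
    const decision = askHuman(action);
    if (decision === "rejected") {
      return { status: "rejected", reason: `human rejected ${action.tool}` };
    }
  }
  return { status: "executed", result: execute(action) };
}
```

Returning rejection as a typed outcome lets the surrounding loop record it and continue, instead of treating a human "no" as a crash.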
Lab 11 · canonical lab · Permissions and audit trails
Keep policy outside the model and log every decision.
Policy module + trace timeline: every iteration is a typed, inspectable event.
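One way to keep policy outside the model and log every decision is sketched below. The roles, tool names, and the `checkPolicy` and `AuditLog` helpers are hypothetical and chosen only for illustration; the lab's actual policy module may be organized differently.

```typescript
type Role = "analyst" | "manager";
type PolicyDecision = { allowed: boolean; rule: string };
type AuditEvent = { seq: number; role: Role; tool: string; allowed: boolean; rule: string };

// Policy lives in plain data that the model never sees or edits (invented entries).
const POLICY = new Map<string, Set<Role>>([
  ["lookupCustomer", new Set<Role>(["analyst", "manager"])],
  ["approveOrder", new Set<Role>(["manager"])],
]);

// Pure function: same inputs always give the same decision and the rule that produced it.
function checkPolicy(role: Role, tool: string): PolicyDecision {
  const roles = POLICY.get(tool);
  if (roles === undefined) return { allowed: false, rule: "deny: tool not in policy" };
  return roles.has(role)
    ? { allowed: true, rule: `allow: ${role} may call ${tool}` }
    : { allowed: false, rule: `deny: ${role} may not call ${tool}` };
}

// Append-only log: every check is recorded, whether it was allowed or denied.
class AuditLog {
  private events: AuditEvent[] = [];

  record(role: Role, tool: string): PolicyDecision {
    const decision = checkPolicy(role, tool);
    this.events.push({ seq: this.events.length + 1, role, tool, ...decision });
    return decision;
  }

  all(): readonly AuditEvent[] {
    return this.events;
  }
}
```

Unknown tools are denied by default, and denials are logged alongside approvals, so the trail shows what was attempted as well as what ran.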
- 1 lab
Operations
What production deployment and observability look like.