Agents — From Prompt to Reasoning Loop

Lifetime access

Module 2 of the AI Engineer path: build agents that reason before they act. Learn the 5 questions every modern agent asks itself, then ship a tool-using loop with Claude.

What you'll build

Reasoning loops with Claude and tool use
Multi-step workflows and failure handling
Self-correction and agentic memory systems
MCP integration for universal tool access

Lessons

From prompt to reasoning loop — the 5 questions every agent asks
Understand what modern AI agents actually do between receiving a prompt and returning a result. Build a Claude-powered agent that reasons through the 5 phases — intent, decomposition, tools, constraints, self-critique — and uses tools to answer a real question.
Tool design — strict schemas, descriptions that teach, and safe error recovery
A tool description is a prompt. Learn to write tool definitions Claude can actually use reliably: detailed descriptions, strict JSON schemas with grammar-constrained sampling, and defensive wrappers that let the model recover from validation failures on the next turn.
Extended thinking — give Claude a scratchpad, and preserve it through tool calls
Turn on adaptive thinking to let Claude reason before answering, and learn the single rule that trips up 90% of first implementations: thinking blocks must round-trip unchanged through a tool-use loop, or the API rejects the next turn.
Memory and context — let Claude write to a scratchpad and read it back next session
Implement a client-side memory store Claude can read and write through the `memory_20250818` tool. Includes the six commands (view, create, str_replace, insert, delete, rename), the path-traversal attack every tutorial forgets to prevent, and when to pair memory with context editing for long-running workflows.
MCP integration — connect your agent to tools other people already built
Model Context Protocol is a wire protocol that standardises how agents talk to external tool servers. Install one MCP server, write a minimal TypeScript client, and let Claude delegate file operations to it. Learn when MCP pays for itself versus when hand-rolled tools are simpler.
Multi-agent systems — orchestrators, specialists, and the pull to keep it simple
Move from single-agent loops to orchestrator-plus-specialists. Build a planner → coder → reviewer pipeline, measure it against a single-agent baseline, and learn Anthropic's explicit rule: only add multi-agent complexity when it demonstrably improves outcomes.
Evaluating agents — task correctness, trajectory sanity, and LLM-as-judge without the judge bias
Build a Promptfoo eval harness that scores your multi-agent pipeline against canned tasks. Learn why trajectory evals matter as much as final-answer evals, how to design LLM-as-judge rubrics that don't reward verbose fluff, and when programmatic assertions beat any model-based grading.
Observability, cost guardrails, and safety — the production-readiness checklist
Wire Langfuse tracing into the multi-agent pipeline, add per-tool call quotas and a cumulative-token circuit breaker, and defend against the OWASP-LLM agent threats (prompt injection, excessive agency, insecure tool design). Finish with the production checklist you should tick off before any agent goes live.
Agent security — the threat model end-to-end
Security isn't a step at the end of the pipeline — it's a property of every arrow in the loop. Map the three attack surfaces web developers don't expect, build an adversary lab against a vulnerable agent, then add four layers of defence and watch the verdicts flip.
Fine-tuning vs prompting — the eval-driven decision tree
Fine-tuning is powerful, expensive, and usually premature. Learn the real decision order — prompt, few-shot, RAG, then tune — and build the dataset validation and eval discipline that must exist before any tuning job is allowed to start.
Debugging agents — from trace to eval
Agents fail in four recognisable shapes — and prompt-whispering fixes none of them. Learn to read a trace, triage the failure, isolate and replay the failing turn with one variable changed, then promote the fix into a permanent trajectory eval so the same bug cannot return silently.
Error recovery at scale — retries, fallbacks, circuit breakers, and graceful degradation
Production agents do not just fail — they classify failures, retry the right ones, degrade the right ones, and stop when recovery is no longer safe. Build a recovery policy that treats rate limits, tool failures, malformed outputs, and exhausted budgets differently.
Agent runtime architecture — queues, sessions, lanes, and control planes
Move one layer above the prompt loop. Build the mental model for stateful agent runtimes: ingress, lane-aware queues, warm sessions, task packets, control-plane storage, and the difference between a stateless agent call and a long-running agent system.
Autonomous worker lifecycle — trust gates, resumability, and recovery recipes
Teach autonomy as a state machine, not a vibe. Model worker boot, trust gates, resumable execution, degraded plugins, stale-task detection, and one-shot recovery recipes so an autonomous agent knows when to continue, when to escalate, and when to stop.
Case study — build an agent like Claude Code
Compose the module into one recognisable system: a Claude Code-like coding agent with scoped memory, worktree-isolated workers, governed delegation, runtime state, and recovery-aware execution.
Failure handling and self-correction
Agents fail. They call tools with wrong arguments, hit rate limits, or loop indefinitely. Learn how to build self-correcting reasoning loops.
Orchestration frameworks
Move beyond simple loops. Learn how to use frameworks like LangGraph or Temporal to build complex, reliable agentic workflows.
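The memory lesson above mentions a path-traversal attack against a client-side memory store. As a rough illustration of the kind of guard that lesson is about, here is a minimal sketch. The function name `normalizeMemoryPath` is hypothetical, and the `/memories` prefix as the only allowed root is an assumption about how the store is laid out. It is not the course's own implementation.

```typescript
// Hypothetical virtual root that every memory path must stay under (assumption).
const MEMORY_ROOT_SEGMENT = "memories";

/**
 * Normalise a model-supplied path and reject anything that would leave the
 * memory root. Works on the path string only; a real store would also need
 * to handle symlinks on disk, which this sketch does not cover.
 */
function normalizeMemoryPath(requested: string): string {
  const segments: string[] = [];
  // Split on both slash styles so "..\\" tricks are handled like "../".
  for (const part of requested.split(/[\\/]+/)) {
    if (part === "" || part === ".") continue; // skip empty and no-op segments
    if (part === "..") {
      if (segments.length === 0) {
        throw new Error(`Path escapes memory root: ${requested}`);
      }
      segments.pop(); // step up one level
      continue;
    }
    segments.push(part);
  }
  // After resolving ".." the path must still begin inside the memory root.
  if (segments[0] !== MEMORY_ROOT_SEGMENT) {
    throw new Error(`Path is outside memory root: ${requested}`);
  }
  return "/" + segments.join("/");
}

// Usage: a well-formed path is returned in normalised form.
console.log(normalizeMemoryPath("/memories/notes/./todo.md"));

// Usage: a traversal attempt throws before any file is touched.
try {
  normalizeMemoryPath("/memories/../../etc/passwd");
} catch (err) {
  console.log((err as Error).message);
}
```

The check runs on the fully resolved path rather than searching the raw string for `..`, so a path such as `/memories/../etc/passwd`, which climbs out and back down into a different directory, is rejected as well.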
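The error-recovery lessons above describe treating rate limits, tool failures, malformed outputs, and exhausted budgets differently. One way to picture that idea is a small pure function that maps a failure class to a recovery action. The type names, the `decideRecovery` function, the attempt limit, and the delay values below are all illustrative assumptions, not the policy the course ships.

```typescript
// Failure classes named in the lesson blurb; the identifiers are hypothetical.
type FailureKind = "rate_limit" | "tool_error" | "malformed_output" | "budget_exhausted";

type RecoveryAction =
  | { action: "retry"; delayMs: number }
  | { action: "fallback"; reason: string }
  | { action: "reprompt"; reason: string }
  | { action: "stop"; reason: string };

// Assumed limits, chosen only for illustration.
const MAX_ATTEMPTS = 3;
const BASE_DELAY_MS = 500;

/**
 * Decide what to do after a failure. `attempt` is 1-based: the number of the
 * attempt that just failed.
 */
function decideRecovery(kind: FailureKind, attempt: number): RecoveryAction {
  // An exhausted budget is never retried: stopping is the only safe move.
  if (kind === "budget_exhausted") {
    return { action: "stop", reason: "budget exhausted" };
  }
  // Any other failure stops once the attempt limit is reached.
  if (attempt >= MAX_ATTEMPTS) {
    return { action: "stop", reason: `gave up after ${attempt} attempts` };
  }
  if (kind === "rate_limit") {
    // Exponential backoff: 500 ms, then 1000 ms, and so on.
    return { action: "retry", delayMs: BASE_DELAY_MS * Math.pow(2, attempt - 1) };
  }
  if (kind === "tool_error") {
    // Degrade: switch to a simpler tool or a cached answer.
    return { action: "fallback", reason: "tool failed; use degraded path" };
  }
  // malformed_output: ask the model again, feeding back the validation error.
  return { action: "reprompt", reason: "output failed validation" };
}

// Usage: the second consecutive rate limit backs off for twice the base delay.
console.log(decideRecovery("rate_limit", 2));
// Usage: budget exhaustion stops immediately, even on the first attempt.
console.log(decideRecovery("budget_exhausted", 1));
```

Keeping the decision in a pure function separates policy from the code that performs the retry or fallback, which makes the policy easy to unit-test.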

$9.99 one-time