Updated June 2026Building AI Agents with Python and MCP
Class Duration
35 hours of live training delivered over 5 days.
Student Prerequisites
- Professional Python development experience
- Familiarity with LLM APIs: prompts, tool/function calling, and structured outputs (see LLM Application Development with Python)
- Comfort with async programming (
asyncio) and Git - No prior agent framework experience required
Target Audience
Python developers ready to move beyond single request/response LLM calls to autonomous, multi-step systems. Ideal for engineers building internal agents that plan work, call tools, and act on company data and APIs - and for platform teams standardizing how agents connect to internal systems via the Model Context Protocol. Includes a dedicated module on running open-weight models locally for organizations with data residency, privacy, or AI usage restrictions.
Description
Agents are the defining LLM application pattern of 2026: systems that take a goal, plan steps, call tools, observe results, and iterate until done. This five-day intensive teaches Python developers to build them for production, starting with a from-scratch agent loop so every later abstraction is understood rather than trusted. From there the course covers the leading frameworks, LangGraph 1.x and Pydantic AI v1, then the Model Context Protocol: building, securing, and testing MCP servers in Python with the official SDK and FastMCP, and consuming them from your own agents and from coding assistants like Claude Code and Cursor. The final days cover multi-agent orchestration, human-in-the-loop design, and the evaluation discipline agents demand, then production realities: security, observability, cost control, deployment patterns, and running open-weight models locally with Ollama and vLLM for restricted environments. The course assumes working LLM API skills (covered in LLM Application Development with Python); developers who want a broad generative AI grounding first should start with Generative AI and LLMs for Python Programmers.
Learning Outcomes
- Explain the agent loop (plan, act, observe, iterate) and implement it from scratch in Python with tool dispatch, working memory, and stop conditions.
- Design tool interfaces, schemas, and error contracts that models invoke reliably, and use structured outputs for plans, routing decisions, and validated results.
- Build stateful agent workflows in LangGraph 1.x and type-safe agents with Pydantic AI v1, and choose the right framework (or plain Python) for a given problem.
- Build, secure, and test MCP servers in Python with the official SDK and FastMCP that expose internal tools, data, and APIs.
- Connect MCP servers to your own agents and to coding assistants like Claude Code and Cursor.
- Design multi-agent systems and human-in-the-loop workflows: planner/worker patterns, hand-offs, approval gates, and failure recovery.
- Evaluate agents with trajectory evals, tool-call accuracy metrics, golden tasks, and regression suites in CI, and apply prompt-injection defenses, tool sandboxing, and least-privilege credentials.
- Deploy agents as observable, cost-controlled services, and run open-weight models locally with Ollama and vLLM for restricted environments.
Training Materials
Comprehensive courseware is distributed online at the start of class. All students receive a downloadable MP4 recording of the training.
Software Requirements
Python 3.13+, an Anthropic API key (instructions for OpenAI and Gemini also provided), Ollama installed locally, VS Code or an editor of choice, Docker or Podman, and Git.
Training Topics
Environment Setup and SDK Configuration
- Python 3.13+ project setup for agent development
- Anthropic, OpenAI, and Gemini Python SDKs: installation and authentication
- Sync vs. async clients for agent workloads
- API keys, environment configuration, and secret hygiene
- Choosing models across the Claude 5/4.x, GPT-5.x, and Gemini 3.x families
- Why agents multiply token costs - and what that means for design
The Agent Loop from Scratch
- The agent loop: plan, act, observe, iterate
- Anatomy of a single turn: messages, tool definitions, and stop reasons
- The message list as agent state
- Stop conditions, budgets, and runaway prevention
- Iteration limits, timeouts, and cost ceilings
- What frameworks abstract - and what they cost you
- Tool dispatch and structured tool results
- Mapping tool calls to Python functions: registries and decorators
- Handling parallel tool calls and result ordering
- Serializing results back to the model: text, JSON, and images
- Exceptions vs. error results: what the model should see
- Concurrent tool execution with
asyncio
- Tool schemas models use reliably
- Names and descriptions as prompt engineering
- Granularity: few powerful tools vs. many narrow ones
- Error contracts and recoverable failures
- Idempotency and side-effect safety
- Pagination, truncation, and response-size limits
Structured Outputs Inside Agents
- Pydantic models as agent result contracts
- Structured plans, routing decisions, and classifications in the loop
- JSON Schema-constrained generation vs. free-form reasoning
- Validation failures and re-prompting strategies
- When to constrain output - and when to let the model write prose
Context and Memory Management
- Working memory and conversation state
- Context window budgets and compaction strategies
- Summarizing and pruning tool results mid-run
- Long-term memory: persistent stores across sessions
- Episodic memory: retrieving relevant past interactions
- Scratchpads, notes, and file-backed memory patterns
Agent Frameworks: LangGraph Core
- LangGraph 1.x: graph-structured workflows with nodes, edges, and typed state
- Graphs vs. free-running loops: when structure pays off
- State channels, reducers, and partial updates
- Checkpointing and resumable runs
- Persistence backends and thread management
- Replaying runs for debugging
LangGraph: Streaming, Interrupts, and Subgraphs
- Streaming agent progress to clients
- Stream modes: tokens, state updates, and events
- Human-in-the-loop interrupts and approvals
- Resuming interrupted runs with reviewer input
- Subgraphs: composing larger systems from smaller graphs
- Durable execution for long-running agents
Agent Frameworks: Pydantic AI
- Pydantic AI v1: type-safe agents with dependency injection
- Agents, tools, and typed run contexts
- Structured outputs as agent results
- Model-agnostic agents across providers
- Testing agents with overridden dependencies
- Fast unit tests with stub and function models
Choosing an Agent Framework
- Framework selection: LangGraph vs. Pydantic AI vs. plain Python
- Decision criteria: control, state needs, team skills, and ecosystem
- Provider agent SDKs and where they fit
- Avoiding lock-in: keeping tools and prompts portable
- Reading framework source: knowing what runs under you
Building MCP Servers in Python
- MCP architecture: hosts, clients, and servers
- Building servers with the official Python SDK and FastMCP
- Defining tools with typed parameters and clear descriptions
- Resources for read-only data and prompts for reusable templates
- Transports: stdio for local servers, streamable HTTP for remote
- Structured tool output and error reporting
Securing and Testing MCP Servers
- Authentication and credential scoping
- Authorization for streamable HTTP servers
- Testing and debugging MCP servers with the Inspector
- Unit testing tool logic as plain Python functions
- Versioning and maintaining tool contracts
- Logging and observability for server behavior
Consuming MCP from Assistants and Agents
- Connecting MCP servers to your own agents
- MCP client sessions: discovery, invocation, and lifecycle
- MCP in coding assistants: Claude Code and Cursor
- Combining multiple servers in one agent
- Tool-name collisions and selective tool exposure
- When to wrap an API in MCP - and when to call it directly
Multi-Agent Orchestration
- Planner/worker and supervisor architectures
- Agent hand-offs and shared context
- Specialist agents with scoped tools and prompts
- Parallel agents and result aggregation
- Failure recovery and partial-result handling
- When one agent with good tools beats a multi-agent system
Human-in-the-Loop Design
- Where humans belong in agent workflows
- Approval gates for destructive actions
- Pausing, resuming, and editing agent state mid-run
- Escalation paths and confidence thresholds
- Review queues and structured approval UX
Evaluating Agents
- Trajectory evals: judging the path, not just the answer
- Tool-call accuracy and efficiency metrics
- Golden tasks and regression suites in CI
- Building eval datasets from production transcripts
- LLM-as-judge for agent transcripts
- Handling nondeterminism: repeated runs and flaky tasks
Security and Guardrails
- Prompt injection via tool results and retrieved content
- The dangerous combination: private data, untrusted content, and egress
- Tool sandboxing and least-privilege credentials
- Input/output filtering and policy checks
- Approval gates for destructive actions
- Audit trails for agent actions
Local and Restricted Environments
- Open-weight models: the current landscape
- Serving locally with Ollama and vLLM
- Hardware sizing, quantization, and throughput trade-offs
- Tool calling with local models: capabilities and limits
- OpenAI-compatible endpoints: swapping backends without rewrites
- Architecting agents for data-residency and compliance constraints
Observability and Cost Control
- Tracing agent runs: spans for turns, tool calls, and model calls
- Token budgets and cost attribution
- Prompt caching for repetitive agent context
- Dashboards and alerting on cost, latency, and failure rates
- Capturing transcripts for debugging and future evals
Deployment Patterns
- Deploying agents as services
- Containerizing agent workloads
- Queues and workers for long-running tasks
- Status reporting: polling, webhooks, and progress streams
- Graceful failure handling and runaway prevention in production
- Scaling: concurrency, provider rate limits, and quotas