Claude Agent SDK with ZenMux: Cut Agent Loop Costs, Keep Loops Running

TL;DR. The Claude Agent SDK is Anthropic's headless framework for building autonomous AI agents in Python and TypeScript. To run it through ZenMux, point the SDK at ZenMux's Anthropic-compatible endpoint with two environment variables:
Shell
export ANTHROPIC_BASE_URL="https://zenmux.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-ai-v1-your-key-here"
Your existing Claude Agent SDK code keeps working without changes. You instantly get 200+ models behind one API key, can route different phases of your agent loop to lower-cost models like DeepSeek V3.2 or Qwen3-Coder, and benefit from automatic fallback when a provider is degraded. Get a free ZenMux API key → (no credit card required)

What is the Claude Agent SDK?

The Claude Agent SDK is Anthropic's official framework for building autonomous AI agents. It packages the same agent loop, tool execution, context management, and subagent system that powers Claude Code, exposed as a programmable Python and TypeScript library. You write a high-level task in natural language and the SDK handles tool calls, file edits, bash commands, web search, multi-step planning, and recovery on its own — until the task is done.

It is distinct from the lower-level Anthropic Client SDK (pip install anthropic). With the Client SDK you call messages.create() and implement the tool loop yourself. With the Agent SDK you call query() or instantiate ClaudeSDKClient, and the loop is built in. As of June 2026, the Agent SDK runs on Claude Sonnet 4.6, Opus 4.7, and Opus 4.8.

Starting June 15, 2026, Anthropic meters Agent SDK and claude -p headless usage separately from interactive Claude Code, drawing from a dedicated monthly Agent SDK credit on subscription plans. That makes agent-loop spend visible in a way it wasn't before — and makes the cost of every loop iteration matter.

Why Route the Claude Agent SDK Through ZenMux?

A single agent task can fan out into dozens of model calls. A long-horizon refactor or research agent can easily fire 30–100 API requests in a single run: one or two big planning calls, then a long tail of small reads, edits, lookups, and verification steps. The cost of an agent system isn't the first call — it's the loop.

That changes the question you ask about model cost. With one-shot chat you optimize the model for the answer. With agent loops you optimize the model for the phase. ZenMux lets you do that without rewriting your agent.

1. Reduce agent-loop cost by routing phases to the right model

Most loop calls are cheap in cognitive load and expensive in volume: read a file, summarize a function, check whether a test passes, decide which tool to call next. These do not need a frontier-class model. Routing the execution and verification phases of a loop to a cost-efficient model — while keeping planning on Claude Opus or Sonnet — can take agent-loop spend down by an order of magnitude.

A concrete example. Suppose your loop averages 100K input tokens and 50K output tokens per iteration, and you run 50 iterations to complete one task:

Model	Input ($/M)	Output ($/M)	Cost per iter	Cost for 50 iters
`anthropic/claude-opus-4.5`	$5.00	$25.00	$1.75	$87.50
`anthropic/claude-sonnet-4.5`	$3.00	$15.00	$1.05	$52.50
`deepseek/deepseek-v3.2`	$0.28	$0.43	≈$0.05	≈$2.48
`z-ai/glm-4.5`	$0.35-0.56	$1.54-2.25	≈$0.11-0.17	≈$5.60-8.43
`qwen/qwen3-coder-plus`	$1.00-6.00	$5.00-60.00	varies (tiered by context)	—

Pricing is the published per-million-token rate on zenmux.ai/models on 2026-06-11. Sonnet and a few other models use tiered pricing that scales with prompt length; the table shows the headline rate.

Even keeping the planning phase on Claude Opus 4.5 ($1.75) and only routing the remaining 49 execution iterations through DeepSeek V3.2 ($0.05 each), the loop comes out at roughly $4.20 instead of $87.50 — a ~20× reduction with no change to the SDK code, only the model selection at each step.
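The arithmetic behind those figures can be checked with a short script. This is a minimal sketch that assumes the flat headline rates from the table and the same 100K-input / 50K-output profile for every iteration; the function names are illustrative and not part of any SDK.

```python
# Headline rates in USD per million tokens, copied from the table above.
RATES = {
    "anthropic/claude-opus-4.5": (5.00, 25.00),
    "anthropic/claude-sonnet-4.5": (3.00, 15.00),
    "deepseek/deepseek-v3.2": (0.28, 0.43),
}


def iteration_cost(model, input_tokens=100_000, output_tokens=50_000):
    """Cost in USD of one loop iteration on `model` at flat per-token rates."""
    input_rate, output_rate = RATES[model]
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000


def mixed_loop_cost(planner, executor, iterations=50, planning_iterations=1):
    """Cost of a loop whose first iterations run on `planner`, the rest on `executor`."""
    execution_iterations = iterations - planning_iterations
    return (
        planning_iterations * iteration_cost(planner)
        + execution_iterations * iteration_cost(executor)
    )


all_opus = mixed_loop_cost("anthropic/claude-opus-4.5", "anthropic/claude-opus-4.5")
mixed = mixed_loop_cost("anthropic/claude-opus-4.5", "deepseek/deepseek-v3.2")
print(f"all Opus: ${all_opus:.2f}, mixed: ${mixed:.2f}, ratio: {all_opus / mixed:.1f}x")
```

With the unrounded DeepSeek rate the mixed loop comes to about $4.18, in line with the rounded figure above. Tiered pricing is not modeled here, so treat the result as an estimate.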

2. Keep loops running with automatic fallback

Agent loops are stateful. A degraded provider mid-loop is worse than a one-shot API failure: you lose partial context, your loop's recovery logic has to retry from a known-good checkpoint, and the user is left watching a spinner. Anthropic's API had several elevated-error and degraded-throughput windows in 2025–2026, and the only way to ride those out from inside an Agent SDK loop is to fail over to a different provider — but the SDK has no built-in way to do that on its own.

ZenMux handles this at the API layer. When the primary route for a model is unhealthy, requests are retried against a secondary route automatically, transparently to the SDK. Your loop sees a successful response and keeps going. You do not have to write retry logic, you do not have to wrap calls in try/except for provider-specific error codes, and you do not have to rebuild the agent's mid-loop state on a transient outage.
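To make the idea concrete, here is a minimal sketch of route-level failover. It illustrates the general technique only and is not ZenMux's implementation: the route names, the `RouteUnavailable` exception, and the `call_with_fallback` helper are all hypothetical.

```python
class RouteUnavailable(Exception):
    """Hypothetical error raised when an upstream route is degraded."""


def call_with_fallback(routes, request):
    """Try each (name, send) route in order; return the first successful response."""
    errors = []
    for name, send in routes:
        try:
            return name, send(request)
        except RouteUnavailable as exc:
            errors.append(f"{name}: {exc}")  # remember why this route was skipped
    raise RouteUnavailable("all routes failed: " + "; ".join(errors))


# Stand-in routes for demonstration: the primary is degraded, the secondary is healthy.
def primary(request):
    raise RouteUnavailable("elevated error rate")


def secondary(request):
    return {"echo": request}


served_by, response = call_with_fallback(
    [("primary", primary), ("secondary", secondary)],
    {"prompt": "ping"},
)
print(served_by, response)
```

The caller receives one successful response regardless of which route served it, which is the property an agent loop relies on to keep its state intact.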

3. Use different models for different loop phases — without rewriting your agent

Anthropic's Claude Code (and the Agent SDK built on the same harness) ships with three model tiers (Haiku, Sonnet, and Opus) used for fast, balanced, and powerful work respectively. Through ZenMux, you can remap any of those tiers to any model in the catalogue via three environment variables:

Shell

# Keep Claude on the powerful tier, but use cheaper models for the fast and balanced tiers
export ANTHROPIC_DEFAULT_OPUS_MODEL="claude-opus-4-7"        # planning
export ANTHROPIC_DEFAULT_SONNET_MODEL="deepseek/deepseek-v3.2"  # execution
export ANTHROPIC_DEFAULT_HAIKU_MODEL="qwen/qwen3-coder-plus"     # fast tools

This works because the Claude Agent SDK uses tier names internally to decide which model to call for which kind of step. Remap the tier, and the loop transparently calls a cheaper model — no SDK code changes, no new abstractions. You can also keep all three on Claude aliases (claude-haiku-4-5, claude-sonnet-4-6, claude-opus-4-7) to preserve native features like the 1M-token context window.
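The same remap can be built in Python and handed to a child process. This is a minimal sketch that assumes the three `ANTHROPIC_DEFAULT_*_MODEL` variable names shown above; the `tier_env` helper and the `agent.py` script name in the final comment are hypothetical.

```python
import os


def tier_env(opus, sonnet, haiku, base_env=None):
    """Return a copy of the environment with the three tier variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["ANTHROPIC_DEFAULT_OPUS_MODEL"] = opus
    env["ANTHROPIC_DEFAULT_SONNET_MODEL"] = sonnet
    env["ANTHROPIC_DEFAULT_HAIKU_MODEL"] = haiku
    return env


# base_env={} keeps the demo output short; omit it to inherit the current environment.
env = tier_env(
    opus="claude-opus-4-7",           # planning
    sonnet="deepseek/deepseek-v3.2",  # execution
    haiku="qwen/qwen3-coder-plus",    # fast tools
    base_env={},
)
for name, model in sorted(env.items()):
    print(f"{name}={model}")
# In real use: subprocess.run(["python", "agent.py"], env=tier_env(...))
```

Building the mapping as a dictionary keeps the per-tier choice in one place and leaves the parent process's own environment untouched.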

Quick Start: Claude Agent SDK + ZenMux

The setup is five steps. The whole thing takes about five minutes.

Step 1: Install the Claude Agent SDK

The SDK is a separate package from the lower-level anthropic client.

pip install claude-agent-sdk

The package bundles the Claude Code CLI under the hood — you do not need to install @anthropic-ai/claude-code separately. Node.js 18+ must be available on the machine so the SDK can spawn the CLI subprocess.

Step 2: Get a ZenMux API key

Sign up at zenmux.ai, then create a Pay-As-You-Go key on the PAYG page. No credit card is required to register, and free-tier models are available immediately. PAYG keys start with sk-ai-v1-…. Subscription keys (for the Builder Plan) start with sk-ss-v1-… — either format works with the Agent SDK.

Step 3: Configure the environment

Shell

# ZenMux endpoint and authentication
export ANTHROPIC_BASE_URL="https://zenmux.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="sk-ai-v1-your-key-here"

# Clear any direct-Anthropic key so it doesn't conflict
export ANTHROPIC_API_KEY=""

Add these lines to ~/.zshrc (or ~/.bashrc) and run source ~/.zshrc so every new shell picks them up. Two gotchas worth knowing:

The SDK reads ANTHROPIC_BASE_URL once, when the agent process starts. Change the variable after that and nothing happens — requests keep going to the old endpoint. To switch providers mid-session, exit the agent process and start a new one.
macOS GUI apps don't inherit shell env vars. If you launch a Python script from a .app bundle, none of your ~/.zshrc exports reach it. Set environment variables programmatically inside the script (with os.environ) when running outside a terminal.
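For the second gotcha, a minimal sketch of setting the variables from inside the script. It assumes the variables only need to be in place before the agent process starts, as described above; the `configure_zenmux` helper is hypothetical and the key is a placeholder.

```python
import os


def configure_zenmux(api_key, base_url="https://zenmux.ai/api/anthropic"):
    """Point this process, and any child process it spawns, at the ZenMux endpoint."""
    os.environ["ANTHROPIC_BASE_URL"] = base_url
    os.environ["ANTHROPIC_AUTH_TOKEN"] = api_key
    # Clear any direct-Anthropic key so it doesn't conflict.
    os.environ["ANTHROPIC_API_KEY"] = ""


# Call this before starting the agent, since the endpoint is read once at startup.
configure_zenmux("sk-ai-v1-your-key-here")
print(os.environ["ANTHROPIC_BASE_URL"])
```

In a real script, load the key from a secrets store or a local config file rather than hard-coding it.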

Step 4: Run your first agent loop

The simplest path is the query() function — a one-shot agent that handles the full tool loop and yields messages back to you as an async iterator:

Python

import anyio
from claude_agent_sdk import query, ClaudeAgentOptions

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are a careful Python developer.",
        permission_mode="acceptEdits",
        cwd="/Users/me/project",
    )

    async for message in query(
        prompt="Read main.py, find any TODOs, and write a Markdown checklist of them in todo.md.",
        options=options,
    ):
        print(message)

anyio.run(main)

That's the full loop: the SDK plans the work, reads main.py, scans for TODOs, writes todo.md, and yields each tool call and response back to your loop as it goes. The model selection is governed by your ANTHROPIC_DEFAULT_*_MODEL env vars (or the SDK's defaults if you don't set them), so the same Python code runs against ZenMux-routed models without any code change.

Step 5: Multi-turn conversations with `ClaudeSDKClient`

When the loop needs to be interactive — gathering feedback, accepting follow-up tasks, or running custom tools — use ClaudeSDKClient instead. It keeps conversation state across turns:

Python

import asyncio
from claude_agent_sdk import (
    ClaudeSDKClient, ClaudeAgentOptions,
    AssistantMessage, TextBlock,
)

async def main():
    options = ClaudeAgentOptions(
        system_prompt="You are an expert refactoring assistant.",
        permission_mode="acceptEdits",
    )

    async with ClaudeSDKClient(options=options) as client:
        await client.query("Find duplicated helper functions in src/ and list them.")
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, TextBlock):
                        print(block.text)

        await client.query("Now consolidate the top three into a single utility module.")
        async for msg in client.receive_response():
            if isinstance(msg, AssistantMessage):
                for block in msg.content:
                    if isinstance(block, TextBlock):
                        print(block.text)

asyncio.run(main())

The follow-up query() call has full context of the first turn — the SDK is maintaining the conversation, the tool-use state, and the file context for you. Through ZenMux, every model call inside both turns is routed against the model you've chosen via the tier env vars.

Pattern: Different Models for Different Phases of a Loop

The "one big model for everything" pattern is the most expensive way to run an agent. Loops do five distinct kinds of work, and they don't all need the same intelligence:

Phase	What it does	Good fit
Planning	Decompose the task, pick tools, set the sequence	Claude Opus 4.7, GPT-5.2, Gemini 3 Pro
Code generation	Write the diff, propose a refactor	Claude Sonnet 4.6, Qwen3-Coder-Plus
Execution / tools	Run bash, read files, parse outputs	DeepSeek V3.2, GLM-4.5, Claude Haiku 4.5
Verification	Check tests pass, validate diffs	DeepSeek V3.2, Claude Haiku 4.5
Summarization	Recap state back to the user	Kimi-K2-Thinking, Gemini 2.5 Flash

The Claude Agent SDK tier remap from earlier is the simplest way to apply this — Opus for the powerful tier, a small/fast model for Haiku, and a code-specialist model for Sonnet. You can also bypass the SDK entirely for one specific call (using the lower-level Anthropic client against the same ZenMux endpoint) when you need fine-grained control:

Python

from anthropic import Anthropic

# Same endpoint, same auth, used directly for one targeted call outside the agent loop
client = Anthropic(
    base_url="https://zenmux.ai/api/anthropic",
    api_key="sk-ai-v1-your-key-here",
)

reply = client.messages.create(
    model="deepseek/deepseek-v3.2",
    max_tokens=512,
    messages=[{"role": "user", "content": "Summarize this stack trace in 3 bullets: ..."}],
)
print(reply.content[0].text)

That call uses the Anthropic Messages protocol against ZenMux, with a non-Claude model on the other side — DeepSeek V3.2 in this case. Model selection is provider-prefixed (provider/model-name), so you can target any model in the ZenMux catalogue without changing the request shape.
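The phase table and the provider-prefixed naming can be combined into a small lookup. This is a minimal sketch: the `PHASE_MODELS` mapping picks one model per phase from the table above, and `model_for_phase` and `split_model_id` are hypothetical helpers, not part of ZenMux or either SDK.

```python
# One model per phase, chosen from the "Good fit" column of the table above.
PHASE_MODELS = {
    "planning": "anthropic/claude-opus-4.7",
    "code_generation": "anthropic/claude-sonnet-4.6",
    "execution": "deepseek/deepseek-v3.2",
    "verification": "deepseek/deepseek-v3.2",
    "summarization": "moonshotai/kimi-k2-thinking",
}


def model_for_phase(phase):
    """Return the provider-prefixed model ID configured for a loop phase."""
    try:
        return PHASE_MODELS[phase]
    except KeyError:
        raise ValueError(f"unknown phase: {phase!r}") from None


def split_model_id(model_id):
    """Split 'provider/model-name' into its two parts, rejecting malformed IDs."""
    provider, sep, name = model_id.partition("/")
    if not (sep and provider and name):
        raise ValueError(f"expected 'provider/model-name', got {model_id!r}")
    return provider, name


for phase in PHASE_MODELS:
    provider, name = split_model_id(model_for_phase(phase))
    print(f"{phase:16s} -> {provider} / {name}")
```

The returned ID can be passed as the `model` argument of a `messages.create()` call like the one above.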

Cost Comparison: Direct Anthropic vs. ZenMux Routing Options

ZenMux passes through the upstream provider's published rates — using Claude through ZenMux is the same per-token price as using Claude direct, and the value comes from access, fallback, and routing rather than discounting Claude. The savings show up when you route phases of a loop to a different model:

Dimension	Anthropic Direct	ZenMux + Claude Opus 4.5	ZenMux + DeepSeek V3.2	ZenMux + Qwen3-Coder
Input price / 1M tokens	$5.00 (Opus 4.5)	$5.00	$0.28	$1.00–6.00 (tiered)
Output price / 1M tokens	$25.00 (Opus 4.5)	$25.00	$0.43	$5.00–60.00 (tiered)
Credit card required to start	Yes	No	No	No
Models accessible behind same key	Claude only	200+	200+	200+
Automatic fallback	No	Yes	Yes	Yes
Tier remap via env vars	No	Yes	Yes	Yes

Numbers are the public per-million-token rates listed on zenmux.ai/models on 2026-06-11. ZenMux's PAYG sign-up does not require a credit card and includes free-tier models to start.

Available Models for Agent Loops on ZenMux

The catalogue covers more than 200 models. The ones most commonly used inside Claude Agent SDK loops:

Frontier models (planning / high-stakes reasoning):

anthropic/claude-opus-4.7, anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.6, anthropic/claude-sonnet-4.5
openai/gpt-5.2, openai/gpt-5.2-pro, openai/gpt-5
google/gemini-3-pro-preview, google/gemini-2.5-pro

Cost-efficient models (execution / verification / fast tools):

anthropic/claude-haiku-4.5 — $1.00 in / $5.00 out
deepseek/deepseek-v3.2 — $0.28 in / $0.43 out (one of the cheapest capable models in the catalogue)
deepseek/deepseek-chat-v3.1 — $0.28 in / $1.11 out
z-ai/glm-4.5 — $0.35–0.56 in / $1.54–2.25 out (strong agent benchmarks)
qwen/qwen3-coder-plus — code-specialist, tool-calling tuned
moonshotai/kimi-k2-thinking — $0.60 in / $2.50 out, long context

Browse the full list at zenmux.ai/models. Every model in the catalogue accepts the same API key, the same protocol, and the same SDK code.

FAQ

What is the Claude Agent SDK?

It is Anthropic's official Python and TypeScript framework for building autonomous AI agents. It packages the agent loop, tool execution, file editing, bash, web search, and subagent capabilities that power Claude Code, exposed as a programmable SDK so you can build your own agents on the same harness. See the official documentation.

How do I use the Claude Agent SDK with a custom API provider like ZenMux?

Set two environment variables before launching your agent process:

Shell

export ANTHROPIC_BASE_URL="https://zenmux.ai/api/anthropic"
export ANTHROPIC_AUTH_TOKEN="your-zenmux-api-key"

Your existing Claude Agent SDK code works without changes — the SDK reads the variables at startup and routes all subsequent calls through your chosen endpoint.

Does changing `ANTHROPIC_BASE_URL` break my existing Claude Agent SDK code?

No. The SDK uses the Anthropic Messages protocol, and ZenMux exposes an Anthropic-compatible endpoint that accepts the same request shape, headers, and streaming format. Your query(), ClaudeSDKClient, and ClaudeAgentOptions calls all behave identically; only the upstream destination changes.

How much does a typical agent loop cost with ZenMux vs. direct Anthropic?

If the loop runs entirely on Claude Opus through ZenMux, the cost matches the published Anthropic rate — ZenMux passes through provider pricing. The savings come from selectively routing phases of the loop to cheaper models. A 50-iteration loop averaging 100K in / 50K out per step costs about $87.50 on Claude Opus 4.5. Routing the 49 execution steps through DeepSeek V3.2 brings the same loop down to roughly $4.20 — a ~20× reduction with no SDK code change.

Which model should I use for agent-loop planning vs. execution?

Use a frontier model (Claude Opus 4.7, GPT-5.2, Gemini 3 Pro) for the planning phase, where reasoning quality directly shapes how the rest of the loop runs. Use a cost-efficient model (DeepSeek V3.2, Claude Haiku 4.5, GLM-4.5) for execution, file reads, tool calls, and verification — these phases are about throughput and reliability, not novel reasoning. The Claude Agent SDK's ANTHROPIC_DEFAULT_OPUS_MODEL / ANTHROPIC_DEFAULT_SONNET_MODEL / ANTHROPIC_DEFAULT_HAIKU_MODEL environment variables let you remap each tier independently.

Does ZenMux support Claude Agent SDK streaming, tool use, and custom MCP servers?

Yes. The Anthropic-compatible endpoint forwards streaming responses, tool-use blocks, and MCP tool invocations unchanged. create_sdk_mcp_server, @tool-decorated Python tools, and mcp_servers in ClaudeAgentOptions all behave the same as against Anthropic direct.

What about the new Agent SDK metering that starts June 15, 2026?

Anthropic is metering Agent SDK usage on subscription plans separately from interactive Claude Code as of 2026-06-15. Routing your Agent SDK loops through ZenMux means each call is billed at the per-token rate of the model you chose (Claude or otherwise) under your ZenMux account, independent of Anthropic's subscription Agent SDK credits. For teams running heavy agent workloads, this typically gives a more predictable cost profile than absorbing per-token overages on the subscription side.

Can I still use the lower-level `anthropic` Python client alongside the Agent SDK?

Yes. The anthropic client (pip install anthropic) accepts base_url and api_key directly:

Python

from anthropic import Anthropic
client = Anthropic(
    base_url="https://zenmux.ai/api/anthropic",
    api_key="sk-ai-v1-your-key-here",
)

This is useful when you want a one-shot, tightly-scoped call (e.g. a summarization step) on a specific model, outside the Agent SDK's autonomous loop. Both SDKs can coexist in the same project, hitting the same ZenMux endpoint.

Start in Five Minutes

Pointing the Claude Agent SDK at ZenMux is a one-time, two-line environment-variable change. Your existing agent code runs unchanged. You get 200+ models behind one API key, automatic fallback for provider degradation, and the ability to route different phases of a loop to the right-cost model, turning what used to be an $87 loop into a $4 loop with no rewrite.

Get a free ZenMux API key → (no credit card required)
