Claude API Pricing (2026)

Model	Input $/1M	Output $/1M	Cache Read $/1M
Claude 3.5 Sonnet	$3.00	$15.00	$0.30
Claude 3.5 Haiku	$0.80	$4.00	$0.08
Claude 3 Opus	$15.00	$75.00	$1.50
Claude 3 Sonnet	$3.00	$15.00	$0.30
Claude 3 Haiku	$0.25	$1.25	$0.03

Prompt Caching — Claude's Biggest Cost Lever

Claude supports prompt caching for system prompts and long context. After the first request, cached tokens cost 90% less (cache read = 10% of full input price). For apps with large, repeated system prompts, this alone can cut total API costs by 50-80%.

Cache write costs 25% more than normal input. Cache entries expire after 5 minutes by default. The minimum cacheable block is 1,024 tokens for Claude 3 Sonnet/Opus and 2,048 tokens for Claude 3.5 Sonnet.

Claude 3.5 Sonnet vs Haiku: When to Use Each

Haiku is 3.75× cheaper on input and 3.75× cheaper on output than Sonnet. For straightforward tasks — extraction, classification, summarization — Haiku delivers excellent results. Use Sonnet when you need complex reasoning, nuanced writing, or coding assistance.

FAQ

How does Claude count tokens?

Claude uses a similar tokenization scheme to GPT models — roughly 4 characters or 0.75 words per token. Anthropic recommends their anthropic.tokenizer Python library for exact counts. XML-style tags used in Claude prompts also count as tokens.

What's the context window for each Claude model?

All current Claude 3 and 3.5 models support 200,000 token context windows. Claude's long context makes it well-suited for document analysis, but remember — all tokens in context are billed as input on each request.

Is there a free tier for the Anthropic API?

Anthropic offers a small free tier with rate limits through their API console. After the free tier is exhausted, all usage is billed per token. Claude.ai (the web product) has a separate free plan that doesn't apply to API usage.

Cache-hit pricing check

In 2026, a common budgeting mistake is treating all prompt tokens as full-price input. For Claude workloads with stable system prompts, policy blocks, tool schemas, or long retrieved context, prompt caching can dramatically change effective cost—but only if your calculator models cache writes and cache reads separately. Before trusting the estimate, verify whether llmcost.xyz lets you split first-request cost from repeated-request cost. If your app reuses the same 20k-token prefix across sessions, the difference between uncached and cache-hit pricing can be the biggest line item in your monthly total.

Claude API Cost Calculator

Select Claude Model

Your Usage

Cost Estimate

Claude API Pricing (2026)

Prompt Caching — Claude's Biggest Cost Lever

Claude 3.5 Sonnet vs Haiku: When to Use Each

FAQ