Claude API Pricing (2026)
| Model | Input $/1M | Output $/1M | Cache Read $/1M |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00 | $15.00 | $0.30 |
| Claude 3.5 Haiku | $0.80 | $4.00 | $0.08 |
| Claude 3 Opus | $15.00 | $75.00 | $1.50 |
| Claude 3 Sonnet | $3.00 | $15.00 | $0.30 |
| Claude 3 Haiku | $0.25 | $1.25 | $0.03 |
Prompt Caching — Claude's Biggest Cost Lever
Claude supports prompt caching for system prompts and long context. After the first request, cached tokens cost 90% less (cache read = 10% of full input price). For apps with large, repeated system prompts, this alone can cut total API costs by 50-80%.
Cache write costs 25% more than normal input. Cache entries expire after 5 minutes by default. The minimum cacheable block is 1,024 tokens for Claude 3 Sonnet/Opus and 2,048 tokens for Claude 3.5 Sonnet.
Claude 3.5 Sonnet vs Haiku: When to Use Each
Haiku is 3.75× cheaper on input and 3.75× cheaper on output than Sonnet. For straightforward tasks — extraction, classification, summarization — Haiku delivers excellent results. Use Sonnet when you need complex reasoning, nuanced writing, or coding assistance.
FAQ
How does Claude count tokens?
Claude uses a similar tokenization scheme to GPT models — roughly 4 characters or 0.75 words per token. Anthropic recommends their anthropic.tokenizer Python library for exact counts. XML-style tags used in Claude prompts also count as tokens.
What's the context window for each Claude model?
All current Claude 3 and 3.5 models support 200,000 token context windows. Claude's long context makes it well-suited for document analysis, but remember — all tokens in context are billed as input on each request.
Is there a free tier for the Anthropic API?
Anthropic offers a small free tier with rate limits through their API console. After the free tier is exhausted, all usage is billed per token. Claude.ai (the web product) has a separate free plan that doesn't apply to API usage.