## What Is Prompt Caching?
Prompt caching stores processed prompt prefixes so they don't need to be re-computed on subsequent requests, dramatically reducing costs and latency.
### Why Caching Matters
AI API costs scale with token count. Common patterns like system prompts, few-shot examples, and document context are repeated across requests — caching eliminates this redundancy.
### How It Works
- First request: Full prompt is processed and cached
- Subsequent requests: Cached prefix is reused, only new tokens are processed
- Cache hit: Matching prefix found → reduced cost and latency
- Cache miss: No match → full processing (cache is created)
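The hit/miss flow above can be sketched as a toy prefix cache. This is an illustration of the mechanism, not any provider's implementation; the `PrefixCache` class and its token accounting are hypothetical.

```python
import hashlib

class PrefixCache:
    """Toy model of prompt-prefix caching (illustration only)."""

    def __init__(self):
        self._cache = set()  # hashes of prefixes already processed

    def _key(self, tokens):
        return hashlib.sha256("\x00".join(tokens).encode()).hexdigest()

    def process(self, tokens, prefix_len):
        """Process a prompt whose first `prefix_len` tokens are cacheable."""
        key = self._key(tokens[:prefix_len])
        if key in self._cache:
            # Cache hit: only the new suffix needs fresh processing.
            return {"hit": True, "processed": len(tokens) - prefix_len}
        # Cache miss: process the full prompt, then write the prefix
        # to the cache so later requests can reuse it.
        self._cache.add(key)
        return {"hit": False, "processed": len(tokens)}
```

For example, a first request with a 10-token system prompt plus 2 new tokens processes all 12; a second request sharing that prefix processes only its new suffix.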
### Provider Comparison
| Provider | Cache Type | Min Tokens | TTL | Savings |
|-----------|--------------------|-------------|--------------------|------------------|
| Anthropic | Explicit (opt-in) | 1,024-2,048 | 5 min (refreshing) | Up to 90% |
| OpenAI | Automatic | 1,024 | ~5-10 min | 50% on cached |
| Google | Explicit | 32,768 | Custom | Varies by model |
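To see how the discounts play out, here is a rough cost model for the Anthropic-style case. The base price is an assumed placeholder, and the 1.25x write / 0.1x read multipliers reflect Anthropic's published rates for the 5-minute cache at the time of writing; verify against current pricing before relying on them.

```python
# Assumed base input price; multipliers follow Anthropic's published
# 1.25x cache-write / 0.1x cache-read rates (check current pricing).
BASE_PER_M = 3.00    # $ per 1M input tokens (placeholder value)
WRITE_MULT = 1.25
READ_MULT = 0.10

def request_cost(cached_tokens, new_tokens, first_request):
    """Estimated cost of one request with a cacheable prefix."""
    per_tok = BASE_PER_M / 1_000_000
    if first_request:
        # Cache write: the cached portion costs slightly more than
        # normal input, a one-time overhead.
        return cached_tokens * per_tok * WRITE_MULT + new_tokens * per_tok
    # Cache read: the cached portion is heavily discounted.
    return cached_tokens * per_tok * READ_MULT + new_tokens * per_tok
```

With a 100,000-token cached prefix and 500 new tokens, the first request costs about $0.38 and each subsequent hit about $0.03, so the write overhead pays for itself after a single reuse.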
### Key Concepts
- Prefix matching: Cache only works for exact prefix matches
- Cache writes: First-time caching has a small overhead cost
- Cache reads: Subsequent hits are significantly cheaper
- Ordering matters: Rearranging content breaks the cache
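The exact-prefix and ordering rules can be demonstrated with a simple hash over ordered prompt segments. The `prefix_key` helper is hypothetical; real providers match on processed token prefixes, but the consequence is the same: identical order matches, reordered content does not.

```python
import hashlib

def prefix_key(segments):
    """Cache key over an ordered sequence of prompt segments."""
    return hashlib.sha256("\x00".join(segments).encode()).hexdigest()

system = "You are a helpful assistant."
docs = "<long document context>"

a = prefix_key([system, docs])
b = prefix_key([system, docs])
c = prefix_key([docs, system])  # same content, different order

assert a == b   # exact prefix match -> cache hit
assert a != c   # reordering breaks the prefix -> cache miss
```

This is why stable content (system prompt, few-shot examples, documents) should come first, with variable content appended at the end.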