Understanding what drives AI costs is essential for building a cost optimization strategy.
## Cost Components
### LLM API Costs

- Input tokens: cost per 1K tokens of prompt/context
- Output tokens: usually 2-4x more expensive than input tokens
- Requests: some providers charge per API call
- Compute type: GPU type and usage duration
### Infrastructure Costs

| Resource | Cost Driver | Optimization Lever |
|----------|-------------|--------------------|
| GPU compute | Utilization %, hours | Right-sizing, spot instances |
| GPU memory | Model size, batch size | Quantization, model sharing |
| Network | Data transfer volume | Caching, regional deployment |
| Storage | Dataset and model size | Compression, lifecycle policies |
| Monitoring | Log volume, metrics | Sampling, retention limits |
### Hidden Costs

- Inference over-provisioning (idle GPUs)
- Failed requests and retries
- Development and testing inference
- Data preprocessing compute
## Cost Benchmarks
### LLM API Pricing (approximate)

| Model | Input $/M tokens | Output $/M tokens |
|-------|------------------|-------------------|
| GPT-4o mini | $0.15 | $0.60 |
| Claude 3 Haiku | $0.25 | $1.25 |
| GPT-4o | $2.50 | $10.00 |
| Claude 3.5 Sonnet | $3.00 | $15.00 |
| GPT-4 Turbo | $10.00 | $30.00 |
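The per-request cost implied by these rates can be computed directly. A minimal sketch in Python, using the approximate table prices above (the model keys and token counts are illustrative, not provider API identifiers):

```python
# Approximate (input, output) prices in USD per million tokens,
# taken from the table above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4o": (2.50, 10.00),
    "claude-3.5-sonnet": (3.00, 15.00),
    "gpt-4-turbo": (10.00, 30.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# A 400-token prompt with a 100-token reply on GPT-4o mini:
print(f"${request_cost('gpt-4o-mini', 400, 100):.6f}")  # $0.000120
```

Note how strongly the 2-4x output multiplier shows up: here the 100 output tokens cost as much as the 400 input tokens.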
### ROI Framework

```
Cost Per Task = (Input Tokens × Input Price + Output Tokens × Output Price) / Tasks

Target: <$0.01 per task for high-volume tasks, <$0.10 for complex tasks
```

Example: 100K customer support queries/month at ~500 tokens per query (50M tokens total):

- GPT-4o: 50M tokens = $125-500/month (all-input vs. all-output pricing)
- Claude 3 Haiku: 50M tokens = $12.50-62.50/month
- Open source (self-hosted): $2,000-5,000/month in infrastructure, but unlimited requests and no per-token cost
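The self-hosted trade-off above implies a break-even volume: the monthly query count at which fixed infrastructure undercuts per-token API pricing. A quick sketch using the example's figures (the $3,500 infrastructure cost is the midpoint of the $2,000-5,000 range, and the per-query cost assumes a 50/50 input/output token split; both are assumptions for illustration):

```python
def break_even_queries(monthly_infra_cost: float, api_cost_per_query: float) -> float:
    """Monthly query volume above which fixed self-hosted infrastructure
    is cheaper than pay-per-query API pricing."""
    return monthly_infra_cost / api_cost_per_query

# GPT-4o at ~500 tokens/query, assuming half input, half output:
# (2.50 + 10.00) / 2 = $6.25 per M tokens -> $0.003125 per query.
cost_per_query = 500 * 6.25 / 1_000_000

print(round(break_even_queries(3500, cost_per_query)))  # 1120000
```

Under these assumptions, self-hosting only pays off above roughly 1.1M queries/month; at the example's 100K queries/month, the managed API is an order of magnitude cheaper.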