## Why Serverless for AI?
Serverless computing scales automatically, charges only for what you use, and eliminates infrastructure management, which makes it a good fit for AI applications with variable traffic.
### Serverless Options for AI
| Platform | Best For | Cold Start | Max Runtime |
|----------|----------|------------|-------------|
| AWS Lambda | API proxying, light inference | 1-10s | 15 min |
| Supabase Edge Functions | API proxying, streaming | ~50ms | 60s |
| Vercel Functions | Next.js AI apps | ~100ms | 5-60s |
| Cloudflare Workers | Ultra-low latency | ~0ms | 30s |
| Modal | GPU inference | 1-30s | Unlimited |
### Architecture Patterns
API Proxy Pattern:

```
Client → Edge Function → AI Provider API → Client
```

Best for: hiding API keys, adding auth, rate limiting
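One job the proxy layer often takes on is per-user rate limiting before the request ever reaches the AI provider. A minimal sketch of a fixed-window limiter (the `allow` helper, limits, and window size are all illustrative assumptions, not a specific platform API):

```typescript
// Hypothetical fixed-window rate limiter an edge proxy might apply per user.
// In-memory only; a real deployment would back this with Redis or a KV store,
// since edge function instances don't share memory.
const hits = new Map<string, { count: number; windowStart: number }>();

function allow(
  userId: string,
  limit = 10,          // max requests per window
  windowMs = 60_000,   // window length in milliseconds
  now = Date.now(),    // injectable clock for testing
): boolean {
  const entry = hits.get(userId);
  // No entry yet, or the previous window has expired: start a fresh window.
  if (!entry || now - entry.windowStart >= windowMs) {
    hits.set(userId, { count: 1, windowStart: now });
    return true;
  }
  entry.count++;
  return entry.count <= limit;
}
```

The edge function would call `allow(userId)` before forwarding to the AI API and return a 429 response when it yields `false`.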
Queue-Based Pattern:

```
Client → API → Queue → Worker → Storage → Client (poll)
```

Best for: long-running AI tasks (image generation, batch processing)
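The queue flow above can be sketched end to end with an in-memory job store (the `Job` shape, `enqueue`/`poll`/`runWorker` names, and synchronous worker are illustrative assumptions; production would use a real queue such as SQS or a Postgres table, with the worker as a separate process):

```typescript
// Minimal in-memory sketch of the queue-based pattern.
type Job = {
  id: string;
  prompt: string;
  status: "queued" | "done";
  result?: string;
};

const jobs = new Map<string, Job>();

// POST /generate — enqueue a long-running AI task, return a job id immediately
// so the client never waits on the slow work inside the request.
function enqueue(prompt: string): string {
  const id = Math.random().toString(36).slice(2);
  jobs.set(id, { id, prompt, status: "queued" });
  return id;
}

// Worker loop — in production this runs out-of-band (e.g. on Modal or a
// Lambda triggered by the queue) and writes results to durable storage.
function runWorker(generate: (prompt: string) => string): void {
  for (const job of jobs.values()) {
    if (job.status !== "queued") continue;
    job.result = generate(job.prompt);
    job.status = "done";
  }
}

// GET /status/:id — the client polls until the job reports "done".
function poll(id: string): Job | undefined {
  return jobs.get(id);
}
```

The key property is that the enqueue request returns in milliseconds regardless of how long the AI task takes, sidestepping the max-runtime limits in the table above.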
Streaming Pattern:

```
Client → Edge Function → AI API (streaming) → Client (SSE)
```

Best for: chat applications, real-time responses
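Because OpenAI's streaming API already emits Server-Sent Events (`data: {...}` lines), an edge function can often relay the upstream body unchanged. When the client does need to unpack those events, a small parser suffices; this helper is a sketch (the `extractDeltas` name is an assumption), built around the `choices[0].delta.content` field that OpenAI streaming chunks carry:

```typescript
// Pull text deltas out of a chunk of OpenAI-style SSE data.
// Each event arrives as a "data: <json>" line; the stream ends with "data: [DONE]".
function extractDeltas(sseChunk: string): string[] {
  const out: string[] = [];
  for (const line of sseChunk.split("\n")) {
    if (!line.startsWith("data: ")) continue;
    const payload = line.slice("data: ".length).trim();
    if (payload === "[DONE]") continue; // end-of-stream sentinel, not JSON
    const delta = JSON.parse(payload)?.choices?.[0]?.delta?.content;
    if (typeof delta === "string") out.push(delta);
  }
  return out;
}
```

Note that a chunk boundary can split a JSON line in two; a production parser buffers partial lines across chunks rather than assuming each chunk is whole.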
### Edge Function Example
```typescript
// Supabase Edge Function: proxy a chat request to OpenAI so the
// API key stays server-side (read from the function's environment).
Deno.serve(async (req) => {
  const { prompt } = await req.json();

  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${Deno.env.get("OPENAI_API_KEY")}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [{ role: "user", content: prompt }],
    }),
  });

  // Relay the upstream body directly; no need to buffer the full response.
  return new Response(response.body, {
    headers: { "Content-Type": "application/json" },
  });
});
```