Systematic approaches to identifying and fixing prompt-related issues in AI applications.
## Common Prompt Failures
### Failure Pattern Analysis

```
HALLUCINATION:
- Model invents facts not in context
- Fix: Add "Only use information provided" + citation requirements

INSTRUCTION DRIFT:
- Model ignores parts of complex prompts
- Fix: Use structured sections, numbered steps, XML tags

FORMAT VIOLATIONS:
- Output doesn't match expected schema
- Fix: Provide examples, use JSON mode, add validation

CONTEXT OVERFLOW:
- Important info pushed out of context window
- Fix: Summarization, RAG chunking, priority ordering
```
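Of these, format violations are the easiest to catch mechanically before downstream code consumes the output. A minimal validation sketch (the `required_keys` and the flat JSON-object schema are assumptions for illustration, not a fixed convention):

```python
import json

def validate_output(raw: str, required_keys: set) -> tuple:
    """Check that a model response is valid JSON with the expected keys.

    Returns (ok, reason) so failures can be logged with a cause.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as e:
        return False, f"not valid JSON: {e}"
    if not isinstance(data, dict):
        return False, "top-level value is not an object"
    missing = required_keys - data.keys()
    if missing:
        return False, f"missing keys: {sorted(missing)}"
    return True, "ok"
```

A failed check is a good trigger for an automatic retry with the violation fed back into the prompt.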
## Debugging Workflow
### 1. Isolate the Failure

Log all inputs and outputs for each model call so a failure can be tied to the exact prompt that produced it. Note that the system prompt is passed in explicitly rather than read from an outer scope:

```python
import logging

def debug_prompt(system_prompt, prompt, response):
    """Log a single model call with basic diagnostics.

    count_tokens, check_format, and verify_facts are application-specific
    helpers: token counting should use the model's tokenizer, and the two
    checks encode your output schema and fact-verification logic.
    """
    logging.info(f"""
=== PROMPT DEBUG ===
Input tokens: {count_tokens(prompt)}
System prompt: {system_prompt[:200]}...
User prompt: {prompt[:500]}...
=== RESPONSE ===
Output tokens: {count_tokens(response)}
Response: {response[:1000]}...
=== ANALYSIS ===
Contains expected format: {check_format(response)}
Factual accuracy: {verify_facts(response)}
""")
```
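The helpers referenced above are left undefined. Two possible stand-ins, hedged accordingly: a character-based token estimate (a real count needs the model's tokenizer, e.g. `tiktoken` for OpenAI models), and a format check that assumes the expected output is a fenced JSON block, which you would adapt to your own schema:

```python
import re

def count_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English text.
    Swap in the model's actual tokenizer for exact counts."""
    return max(1, len(text) // 4)

def check_format(response: str) -> bool:
    """Example check: response contains a fenced JSON object.
    The expected format here is an assumption; match your schema."""
    return bool(re.search(r"```json\s*\{.*\}\s*```", response, re.DOTALL))
```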
### 2. A/B Test Prompt Variations

```python
prompt_variants = [
    {"name": "baseline", "prompt": original_prompt},
    {"name": "structured", "prompt": add_xml_tags(original_prompt)},
    {"name": "few_shot", "prompt": add_examples(original_prompt)},
    {"name": "chain_of_thought", "prompt": add_cot(original_prompt)},
]

results = []
for variant in prompt_variants:
    response = call_model(variant["prompt"])
    score = evaluate_response(response)
    results.append({"variant": variant["name"], "score": score})
```
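Once scores are collected, picking a winner is a sort over `results`. A self-contained sketch, with hard-coded dummy scores standing in for real `evaluate_response` output:

```python
# Dummy scores standing in for evaluate_response() output.
results = [
    {"variant": "baseline", "score": 0.61},
    {"variant": "structured", "score": 0.74},
    {"variant": "few_shot", "score": 0.82},
    {"variant": "chain_of_thought", "score": 0.79},
]

# Rank variants best-first and report the winner.
ranked = sorted(results, key=lambda r: r["score"], reverse=True)
best = ranked[0]
print(f"best variant: {best['variant']} (score {best['score']:.2f})")
```

In practice, run each variant against many test inputs and compare mean scores; a single call per variant is too noisy to decide anything.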
## Tools for Prompt Debugging

- LangSmith: Trace visualization, prompt versioning
- Weights & Biases Prompts: A/B testing, evaluation
- Braintrust: Production prompt monitoring
- PromptLayer: Request logging and analytics