AI systems face a fundamentally different threat model than traditional software. Instead of exploiting code vulnerabilities, attackers manipulate the AI's reasoning through carefully crafted inputs.
Why AI Security Is Different
Traditional software security relies on clear boundaries between code and data. LLMs blur this boundary — user input (data) directly influences behavior (code). This creates a new attack surface with no direct equivalent in classical security.
OWASP Top 10 for LLM Applications
The OWASP Foundation identified the top security risks for LLM applications:
- Prompt Injection: Manipulating model behavior through crafted inputs
- Insecure Output Handling: Trusting LLM outputs without validation
- Training Data Poisoning: Corrupting training data to influence model behavior
- Model Denial of Service: Crafting inputs that consume excessive resources
- Supply Chain Vulnerabilities: Compromised models, plugins, or dependencies
- Sensitive Information Disclosure: Model revealing training data or system details
- Insecure Plugin Design: Plugins with excessive permissions or poor input validation
- Excessive Agency: Giving AI too much autonomous action capability
- Overreliance: Trusting AI outputs without human verification
- Model Theft: Extracting model weights or capabilities through API probing
Attack Motivations
| Attacker Type | Goal | Methods | |--------------|------|---------| | Curious users | Bypass restrictions, have fun | Jailbreaks, role-playing tricks | | Competitors | Extract training data, system prompts | Prompt injection, model probing | | Malicious actors | Generate harmful content at scale | Automated jailbreaking, API abuse | | Researchers | Find vulnerabilities, publish papers | Systematic adversarial testing | | Insiders | Exfiltrate data via AI systems | Indirect prompt injection |
The Fundamental Challenge
There is currently no provably secure defense against prompt injection. Every defense is a heuristic that can potentially be bypassed with sufficient effort. Security is therefore about raising the cost of attack and detecting/responding to breaches, not eliminating all risk.
This mirrors traditional security: no system is unhackable, but well-defended systems aren't worth the effort for most attackers.