Fine-tuning is the process of taking a pre-trained model and continuing its training on a smaller, domain-specific dataset. It's the bridge between a general-purpose model and one that excels at your exact task.
## When to Fine-Tune vs. Prompt Engineer
Prompt engineering is your first tool — it's fast, cheap, and requires no infrastructure. But it has limits:
- Consistent formatting: If you need every response in a precise JSON schema or specific tone, fine-tuning bakes this into the model's weights rather than relying on instructions.
- Domain knowledge: Medical, legal, or highly technical domains where the base model lacks depth benefit enormously from fine-tuning on curated examples.
- Latency & cost: Fine-tuned models can produce correct outputs with shorter prompts, reducing token costs and response time.
- Proprietary style: If you need outputs that match a specific brand voice across thousands of interactions, fine-tuning learns patterns that prompts can only approximate.
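To make the consistent-formatting case concrete, here is a minimal sketch of the kind of prompt/completion training pairs, serialized as JSONL, that fine-tuning pipelines commonly consume. The prompts, the entity-extraction task, and the output schema are all made-up illustrations, not a format required by any particular tool.

```python
import json

# Hypothetical examples pairing a short prompt with the exact JSON
# output we want the model to learn to emit consistently.
examples = [
    {"prompt": "Extract entities: Acme hired Jo in 2021.",
     "completion": json.dumps({"org": "Acme", "person": "Jo", "year": 2021})},
    {"prompt": "Extract entities: Bolt acquired Nut in 2019.",
     "completion": json.dumps({"org": "Bolt", "person": "Nut", "year": 2019})},
]

# Serialize to JSONL: one training example per line, a common input
# format for fine-tuning pipelines.
jsonl = "\n".join(json.dumps(ex) for ex in examples)
print(jsonl)
```

A few dozen such pairs demonstrate the schema far more reliably than restating it in every prompt, because the format ends up encoded in the weights rather than in instructions the model may ignore.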
## The Fine-Tuning Spectrum
Not all fine-tuning is equal. The field has evolved from full fine-tuning (updating every parameter) to increasingly efficient methods:
| Method | Parameters Updated | GPU Memory | Training Time |
|--------|--------------------|------------|---------------|
| Full fine-tuning | 100% | Very high | Hours-days |
| LoRA | 0.1-1% | Moderate | Minutes-hours |
| QLoRA | 0.1-1% (quantized) | Low | Minutes-hours |
| Prompt tuning | <0.01% | Very low | Minutes |
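The LoRA row can be sketched concretely: freeze a weight matrix W and train only two small matrices A and B whose product is a low-rank update to it. The dimensions and rank below are illustrative assumptions (a 4096-wide projection, rank 8), not values from any particular model, but they show why the trainable fraction lands well under 1%.

```python
import numpy as np

d_in, d_out, rank = 4096, 4096, 8  # illustrative hidden size and LoRA rank

# Frozen base weight (stand-in for one attention projection matrix).
W = np.random.randn(d_out, d_in) * 0.01

# LoRA adapter: two small trainable matrices whose product B @ A is a
# rank-8 update to W.  B starts at zero, so training begins exactly
# from the base model's behavior.
A = np.random.randn(rank, d_in) * 0.01
B = np.zeros((d_out, rank))

def forward(x):
    # Effective weight is W + B @ A, but we never materialize it:
    # apply the low-rank path separately and add the results.
    return W @ x + B @ (A @ x)

trainable = A.size + B.size
total = W.size + trainable
print(f"trainable fraction: {trainable / total:.4%}")  # → 0.3891%
```

Only A and B receive gradients, which is where the memory savings in the table come from; QLoRA additionally stores the frozen W in a quantized format.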
## Key Terminology
- Base model: The pre-trained model you start from (e.g., Llama 3, Mistral, Gemma)
- Adapter: Small trainable matrices added to the frozen base model
- Epoch: One complete pass through your training dataset
- Loss: A number measuring how wrong the model's predictions are — lower is better
- Overfitting: When the model memorizes training data instead of learning patterns
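The terms epoch, loss, and overfitting can be made concrete with a toy training loop. This is a linear-regression stand-in with synthetic data, not an actual LLM: each full pass over the training set is one epoch, mean-squared error is the loss being driven down, and comparing training loss against a held-out validation loss is how overfitting is detected in practice.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic task: learn w so that x @ w approximates y.
true_w = np.array([1.0, -2.0, 0.5, 3.0])
x_train = rng.normal(size=(64, 4))
y_train = x_train @ true_w + rng.normal(scale=0.1, size=64)
x_val = rng.normal(size=(32, 4))   # held-out validation split
y_val = x_val @ true_w + rng.normal(scale=0.1, size=32)

w = np.zeros(4)   # trainable parameters, starting from scratch
lr = 0.05

for epoch in range(100):           # one epoch = one full pass over the data
    pred = x_train @ w
    train_loss = np.mean((pred - y_train) ** 2)   # lower is better
    grad = 2 * x_train.T @ (pred - y_train) / len(y_train)
    w -= lr * grad

# If train_loss kept falling while val_loss rose, the model would be
# overfitting: memorizing the training set instead of the pattern.
val_loss = np.mean((x_val @ w - y_val) ** 2)
print(f"train loss {train_loss:.3f}, val loss {val_loss:.3f}")
```

Real fine-tuning runs track exactly these two curves; early stopping when validation loss stops improving is the standard guard against overfitting on a small domain dataset.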