LoRA and QLoRA: Efficient Fine-Tuning
The Problem
Full fine-tuning of a 7B-parameter model requires roughly 28GB of GPU memory just for the model weights in 32-bit precision (4 bytes per parameter), before counting gradients and optimizer states. A 70B model? Forget about it on consumer hardware.
LoRA (Low-Rank Adaptation)
Instead of updating all model weights, LoRA freezes the original weights and adds small trainable "adapter" matrices:
- Original weight matrix: W (d × d, frozen)
- LoRA adapters: A (d × r) and B (r × d), where r << d
- Effective weight: W + A×B (see the code sketch below)
- Typical rank r = 8-64, reducing trainable parameters by 100-1000x
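To make the mechanics concrete, here is a minimal sketch of a LoRA-adapted linear layer in plain PyTorch, using the A (d × r) / B (r × d) convention from the list above. It is illustrative only, not the PEFT library's actual implementation.

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen weight W plus a trainable low-rank update A @ B, scaled by alpha / r."""
    def __init__(self, d_in, d_out, r=16, alpha=32):
        super().__init__()
        self.W = nn.Linear(d_in, d_out, bias=False)
        self.W.weight.requires_grad = False                  # original weights stay frozen
        self.A = nn.Parameter(torch.randn(d_in, r) * 0.01)   # d x r, small random init
        self.B = nn.Parameter(torch.zeros(r, d_out))         # r x d, zero init so W is unchanged at step 0
        self.scaling = alpha / r

    def forward(self, x):
        # Same result as multiplying by the effective weight W + A @ B,
        # but without ever materializing the full d x d update.
        return self.W(x) + (x @ self.A @ self.B) * self.scaling

Only A and B receive gradients during training, and because A @ B can be merged into W afterwards, the adapted model adds no inference latency.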
QLoRA (Quantized LoRA)
Combines LoRA with 4-bit quantization:
- Base model loaded in 4-bit precision (NF4 quantization)
- LoRA adapters trained in 16-bit precision
- Fine-tune a 65B model on a single 48GB GPU!
- Virtually no quality loss: the QLoRA paper reports results matching 16-bit full fine-tuning on its benchmarks
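In code, the 4-bit base model can be loaded with Hugging Face transformers and bitsandbytes before the adapters are attached. A minimal sketch, assuming a Llama-style checkpoint (the model id is a placeholder); the result is what gets passed as base_model in the example below.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization
    bnb_4bit_use_double_quant=True,         # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,  # matmuls run in 16-bit
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
base_model = prepare_model_for_kbit_training(base_model)  # casts norms to fp32, enables input grads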
Practical Example with Hugging Face
from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                                  # Rank
    lora_alpha=32,                         # Scaling factor
    target_modules=["q_proj", "v_proj"],   # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# → trainable params: 8,388,608 || all params: 6,746,804,224 || trainable%: 0.124
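After training, only the adapter weights need to be saved, and they can later be merged back into the base model for deployment. A brief sketch with the PEFT API (the directory name is a placeholder, and merging assumes the base model is loaded in full or half precision rather than 4-bit):

model.save_pretrained("lora-adapter")   # writes only the adapter weights, typically a few MB

from peft import PeftModel
reloaded = PeftModel.from_pretrained(base_model, "lora-adapter")
merged = reloaded.merge_and_unload()    # folds A @ B into W, returning a plain model with no extra latency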
Key Hyperparameters
- Rank (r): Higher = more capacity but more parameters. Start with 16.
- Alpha: Scaling factor. Common rule: alpha = 2 × rank
- Target modules: Usually attention layers (q_proj, v_proj, k_proj, o_proj)
- Learning rate: Higher than full fine-tuning (1e-4 to 3e-4); see the training sketch below
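Putting the rules of thumb into context, here is a minimal training sketch with the Hugging Face Trainer, reusing the PEFT-wrapped model from the earlier example. The output path, batch sizes, and train_dataset are assumptions, and tokenization/collation details are omitted.

from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="lora-out",              # placeholder path
    learning_rate=2e-4,                 # higher than typical full fine-tuning rates
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,      # effective batch size of 16
    num_train_epochs=1,
    bf16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,                        # the PEFT-wrapped model from above
    args=training_args,
    train_dataset=train_dataset,        # assumed: a tokenized causal-LM dataset
)
trainer.train()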
Daisy+ in Action: Future Fine-Tuning Ready
While Daisy+ currently relies on prompt engineering rather than fine-tuning, the architecture supports future fine-tuning workflows: conversation logs from digital employees could be used to train specialized LoRA adapters for industry-specific language patterns. Imagine a LoRA adapter trained on thousands of successful customer support conversations, making responses faster and more domain-accurate.