LoRA and QLoRA: Efficient Fine-Tuning

The Problem

Full fine-tuning of a 7B parameter model requires ~28GB of GPU memory just for the model weights in 32-bit precision, plus gradients and Adam optimizer states that can roughly triple that figure. A 70B model? Forget about it on consumer hardware.

LoRA (Low-Rank Adaptation)

Instead of updating all model weights, LoRA freezes the original weights and adds small trainable "adapter" matrices (sketched in code after this list):

  • Original weight matrix: W (d × d, frozen)
  • LoRA adapters: A (d × r) and B (r × d), where r << d
  • Effective weight: W + (alpha/r)·A×B, where alpha is a scaling factor
  • Typical rank r = 8-64, reducing trainable parameters by 100-1000x
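
To make the mechanics concrete, here is a minimal PyTorch sketch of a LoRA-adapted linear layer. This is illustrative only, not how the peft library implements it internally; the initialization follows the LoRA paper (A random, B zero, so training starts exactly at W):

import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 16, alpha: int = 32):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                          # W stays frozen
        d_in, d_out = base.in_features, base.out_features
        self.A = nn.Parameter(torch.randn(d_in, r) * 0.01)   # A: (d × r), small random init
        self.B = nn.Parameter(torch.zeros(r, d_out))         # B: (r × d), zero init
        self.scale = alpha / r

    def forward(self, x):
        # Effective weight W + (alpha/r)·A×B, applied without materializing A×B
        return self.base(x) + (x @ self.A @ self.B) * self.scale

Only A and B receive gradients; the frozen base layer is shared, which is why adapter checkpoints are tiny.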

QLoRA (Quantized LoRA)

Combines LoRA with 4-bit quantization (see the loading sketch after this list):

  • Base model loaded in 4-bit precision (NF4 quantization)
  • LoRA adapters trained in 16-bit precision
  • Fine-tune a 65B model on a single 48GB GPU!
  • Performance on par with 16-bit full fine-tuning, per the QLoRA paper
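
In practice, the 4-bit base model is loaded with bitsandbytes via Hugging Face transformers. A typical setup looks like the sketch below; the model name is just an example:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NF4 quantization from the QLoRA paper
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16, # 16-bit compute for the LoRA path
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",            # example model
    quantization_config=bnb_config,
    device_map="auto",
)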

Practical Example with Hugging Face

from peft import LoraConfig, get_peft_model

lora_config = LoraConfig(
    r=16,                    # Rank
    lora_alpha=32,           # Scaling factor
    target_modules=["q_proj", "v_proj"],  # Which layers to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
# → trainable params: 8,388,608 || all params: 6,746,804,224 || trainable%: 0.1243
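
After training, only the adapter weights need to be saved, typically a few dozen megabytes rather than the full model. One way this can look (paths are placeholders; merging assumes a non-quantized, 16-bit base model):

model.save_pretrained("lora-adapter")  # saves only the adapter, not the 7B base

# Later: attach the adapter to the base model, optionally merging it into
# the weights so inference carries no extra latency.
from peft import PeftModel
model = PeftModel.from_pretrained(base_model, "lora-adapter")
model = model.merge_and_unload()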

Key Hyperparameters

  • Rank (r): Higher = more capacity but more parameters. Start with 16 (see the sketch after this list).
  • Alpha: Scaling factor. Common rule: alpha = 2 × rank
  • Target modules: Usually attention layers (q_proj, v_proj, k_proj, o_proj)
  • Learning rate: Higher than for full fine-tuning (1e-4 to 3e-4)
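
To see how rank drives parameter count, here is a back-of-the-envelope helper. Dimensions assume a Llama-2-7B-style model with q_proj and v_proj adapted; this is a sketch, not a peft utility:

def lora_param_count(d_model=4096, r=16, n_layers=32, modules_per_layer=2):
    # Each adapted d_model × d_model projection adds A (d_model × r) and B (r × d_model)
    return n_layers * modules_per_layer * 2 * d_model * r

print(f"{lora_param_count(r=8):,}")    # 4,194,304
print(f"{lora_param_count(r=16):,}")   # 8,388,608, matching the printout above
print(f"{lora_param_count(r=64):,}")   # 33,554,432, still ~0.5% of a 7B model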

🌼 Daisy+ in Action: Future Fine-Tuning Ready

While Daisy+ currently relies on prompt engineering rather than fine-tuning, the architecture supports future fine-tuning workflows: conversation logs from digital employees could be used to train specialized LoRA adapters for industry-specific language patterns. Imagine an adapter trained on thousands of successful customer support conversations, making responses more domain-accurate.
