Training

LoRA / QLoRA

Parameter-efficient fine-tuning methods that only update a small fraction of weights.

LoRA (Low-Rank Adaptation) keeps the base model frozen and trains a pair of small low-rank matrices whose product is added to selected weight matrices — effectively learning a low-rank update ΔW = BA instead of updating all of the model's weights. Result: vastly cheaper training, smaller artifacts to deploy (megabytes instead of gigabytes), and the ability to swap adapters in and out for different tasks on the same base model. QLoRA is LoRA applied on top of a quantized (typically 4-bit) base model, which makes fine-tuning large open-weight models feasible on a single GPU.
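The core idea fits in a few lines. Below is a minimal NumPy sketch of a single LoRA-adapted linear layer, not any library's API; the names `rank` and `alpha` follow the conventions of the LoRA paper, and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 64, 64   # dimensions of one frozen weight matrix
rank, alpha = 8, 16    # adapter rank and scaling factor (assumed values)

W = rng.standard_normal((d_out, d_in))        # frozen base weights (never updated)
A = rng.standard_normal((rank, d_in)) * 0.01  # trainable down-projection
B = np.zeros((d_out, rank))                   # trainable up-projection, zero-initialized
                                              # so the adapter starts as a no-op

def lora_forward(x):
    # y = x W^T + (alpha / rank) * x A^T B^T
    base = x @ W.T
    adapter = (x @ A.T) @ B.T * (alpha / rank)
    return base + adapter

x = rng.standard_normal((2, d_in))
# With B still zero, the adapter contributes nothing: the output
# matches the frozen base model exactly.
assert np.allclose(lora_forward(x), x @ W.T)
```

During fine-tuning only `A` and `B` receive gradients — here that is rank × (d_in + d_out) = 1,024 trainable values against 4,096 frozen ones, and the gap widens dramatically at real model sizes. The zero init of `B` is why training starts from exactly the base model's behavior.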
