
LoRA

Overview

Direct Answer

LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that trains small, low-rank matrices inserted alongside frozen pretrained model weights, rather than updating all parameters. This approach dramatically reduces the number of trainable parameters whilst maintaining model performance.

How It Works

LoRA decomposes each weight update into the product of two much smaller matrices of rank r, where r is far smaller than the original weight dimensions. During fine-tuning, only these low-rank decomposition matrices are optimised whilst the original pretrained weights remain frozen. The adapted weights are computed as the frozen weights plus the product of the trained low-rank matrices, scaled by a constant factor alpha/r (a hyperparameter, distinct from the learning rate).
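The mechanism above can be sketched in a few lines of NumPy. This is an illustrative toy, not a reference implementation; the names W, A, B, alpha, and r, and all dimensions, are assumptions chosen for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out, r, alpha = 64, 64, 4, 8  # rank r is far smaller than d; alpha scales the update

W = rng.standard_normal((d_out, d_in))     # frozen pretrained weight (never updated)
A = rng.standard_normal((r, d_in)) * 0.01  # trainable low-rank factor, small random init
B = np.zeros((d_out, r))                   # trainable low-rank factor, zero init

def lora_forward(x):
    """Adapted layer output: W x + (alpha / r) * B A x."""
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.standard_normal(d_in)
# Because B starts at zero, the adapted layer initially matches the frozen base layer.
assert np.allclose(lora_forward(x), W @ x)
```

Initialising B to zero means training starts from the pretrained model's behaviour exactly; gradients then flow only into A and B, never into W.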

Why It Matters

This technique reduces memory consumption and computational cost by 10-100 times compared to full fine-tuning, making large model adaptation feasible on modest hardware. Organisations can now customise foundation models for specific tasks without prohibitive infrastructure investment or extended training timelines.
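A back-of-the-envelope count shows where the savings come from. The layer size below is hypothetical, chosen only to make the arithmetic concrete.

```python
# One hypothetical 4096 x 4096 projection matrix, fine-tuned fully vs with LoRA rank 8.
d, r = 4096, 8

full_params = d * d            # full fine-tuning: every weight is trainable
lora_params = r * d + d * r    # LoRA: A (r x d) plus B (d x r)

print(full_params)             # 16777216
print(lora_params)             # 65536
print(full_params // lora_params)  # 256x fewer trainable parameters for this layer
```

Optimiser state (e.g. Adam's moment estimates) shrinks by the same factor, which is where much of the memory saving comes from.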

Common Applications

LoRA is widely deployed in adapting large language models for domain-specific tasks, personalising image generation models for particular artistic styles, and enabling efficient task-specific variants of models like LLaMA and Stable Diffusion. Financial institutions and healthcare organisations use it to specialise models whilst maintaining compliance boundaries.

Key Considerations

The technique introduces a rank hyperparameter that requires tuning: ranks that are too low may limit adaptation capability, whilst higher ranks diminish the parameter efficiency gains. LoRA performs best when the fine-tuning data resembles the pretraining distribution; extreme distribution shifts may still require partial unfreezing of the original weights.
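The rank constraint can be seen directly: because the update is the product B A, its rank can never exceed r, no matter how large the weight matrix is. A small sketch (dimensions are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 32, 4

B = rng.standard_normal((d, r))
A = rng.standard_normal((r, d))
delta_W = B @ A  # a d x d update matrix, but with rank at most r

# The full weight matrix could change in up to d independent directions;
# the LoRA update is confined to at most r of them.
print(np.linalg.matrix_rank(delta_W))  # → 4
```

This is the precise sense in which a low rank "limits adaptation capability": the update lives in an r-dimensional subspace of possible weight changes.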
