Tensor Parallelism

Overview

Tensor parallelism is a distributed training strategy that partitions individual weight matrices and activation tensors across multiple devices along specific dimensions, so that the computation of a single model layer proceeds in parallel across those devices. Unlike data parallelism, which replicates the entire model on every device, this approach reduces the memory footprint per device by distributing the matrix multiplications themselves rather than copies of the model.
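As a rough illustration of the memory saving, a sketch with assumed sizes (a 4096×4096 weight matrix sharded over four devices; all figures are hypothetical, chosen only for the arithmetic):

```python
# Hypothetical sizes for illustration only.
hidden = 4096
devices = 4

full_params = hidden * hidden          # parameters if the matrix is replicated on each device
shard_params = full_params // devices  # parameters held per device under tensor parallelism

print(full_params)   # 16_777_216
print(shard_params)  # 4_194_304 — each device stores a quarter of the matrix
```

The same proportional saving applies to the optimizer state and gradients associated with the sharded weights, which is often where the bulk of training memory goes.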

How It Works

During forward and backward propagation, weight matrices are split column-wise or row-wise across devices. Each device computes a partial result on its assigned tensor slice, and the partial results are then combined through collective operations (e.g. all-gather for column splits, all-reduce for row splits). In transformer blocks the two schemes are commonly paired, with the first linear layer split column-wise and the second row-wise, so a single all-reduce per block suffices. Communication is overlapped with computation where feasible to minimise synchronisation overhead, and the axis and granularity of partitioning depend on the layer type.
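A minimal NumPy sketch of both partitioning schemes, with device boundaries simulated as array splits and a plain sum standing in for the all-reduce (sizes are arbitrary, for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((2, 8))  # activations: batch of 2, hidden size 8
W = rng.standard_normal((8, 8))  # full weight matrix

# Column-wise split: each "device" holds half of the output columns.
# Partial outputs are concatenated, mimicking an all-gather.
W_cols = np.split(W, 2, axis=1)
y_col = np.concatenate([x @ w for w in W_cols], axis=1)

# Row-wise split: each device holds half of the input rows, and receives
# the matching slice of the activations. Partial outputs are summed,
# mimicking an all-reduce.
W_rows = np.split(W, 2, axis=0)
x_parts = np.split(x, 2, axis=1)
y_row = sum(xp @ wr for xp, wr in zip(x_parts, W_rows))

# Both schemes reproduce the unsharded matmul exactly.
assert np.allclose(y_col, x @ W)
assert np.allclose(y_row, x @ W)
```

Pairing the two (column split feeding a row split) is what lets the intermediate activations stay sharded between the two layers, deferring communication to a single collective at the end.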

Why It Matters

This approach enables training of exceptionally large models that would exceed single-device memory constraints, directly impacting capability and cost-efficiency in large language model and vision transformer development. Organisations prioritise it when model scale exceeds practical limits of other parallelism strategies, particularly when batch sizes cannot be increased freely.

Common Applications

Tensor parallelism is widely deployed in training large transformer-based language models and multimodal systems where model dimension is the primary scaling factor. It is frequently combined with pipeline and data parallelism in systems handling billions of parameters.

Key Considerations

Communication bandwidth between devices becomes a critical bottleneck; synchronous all-reduce operations can introduce substantial latency on slower interconnects. The strategy is most effective on high-bandwidth clusters and less suitable for models with small embedding or hidden dimensions relative to device count.
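A back-of-envelope estimate of that overhead, under assumed sizes (batch 8, sequence length 2048, hidden size 4096, fp16 activations, eight devices; all values hypothetical) and the standard ring all-reduce cost of roughly 2(N−1)/N of the tensor per device:

```python
# Hypothetical training configuration, chosen only for the arithmetic.
batch, seq, hidden, bytes_per_elem = 8, 2048, 4096, 2  # fp16 = 2 bytes
devices = 8

# Size of one full activation tensor being reduced.
tensor_bytes = batch * seq * hidden * bytes_per_elem

# A ring all-reduce moves roughly 2*(N-1)/N of the tensor through each device.
per_device_bytes = 2 * (devices - 1) / devices * tensor_bytes

print(tensor_bytes / 2**20)      # 128.0 MiB per activation tensor
print(per_device_bytes / 2**30)  # ≈ 0.22 GiB moved per all-reduce, per device
```

With one such collective per transformer block per forward pass (and again in the backward pass), the aggregate volume across a deep model makes interconnect bandwidth, not compute, the limiting factor on slower links.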
