Deep LearningTraining & Optimisation

Attention Head

Overview

Direct Answer

An attention head is an individual computational unit within a multi-head attention mechanism that applies learned queries, keys, and values to compute weighted relevance scores across input sequences. Each head independently learns to attend to different positional and semantic relationships, with outputs concatenated to form richer contextual representations.

How It Works

Each attention head performs scaled dot-product attention by computing compatibility scores between query vectors and key vectors, normalising these scores via softmax, and using them to weight value vectors. Multiple heads operate in parallel with separate learned parameter matrices, allowing the model to simultaneously capture syntactic patterns, long-range dependencies, and semantic features from different representation subspaces.

Why It Matters

Multiple independent heads improve model capacity and interpretability whilst maintaining computational efficiency through parallelisation. This architecture has become foundational for state-of-the-art performance in language understanding, translation, and sequence modelling tasks, directly impacting accuracy and convergence speed in production NLP systems.

Common Applications

Attention heads are integral to transformer models used in machine translation systems, large language models for text generation, and multimodal systems combining vision and language. They enable models like BERT and GPT to achieve superior performance on classification, question-answering, and summarisation tasks across enterprise applications.

Key Considerations

Practitioners must balance the number of heads against computational cost and memory requirements; too few heads may limit representational capacity whilst excessive heads introduce redundancy without proportional performance gains. Attention patterns across heads often show correlation, suggesting some redundancy is inherent to the design.

Cross-References(1)

More in Deep Learning