Overview
Direct Answer
A fully connected layer is a neural network layer in which every neuron receives input from every neuron in the preceding layer. Also termed a dense layer, its connections form a complete bipartite graph between the two adjacent layers.
How It Works
Each neuron in the layer computes a weighted sum of all inputs from the prior layer, adds a bias term, and applies an activation function to produce its output. The weight matrix has one entry per input-output pair, so the parameter count is the product of the input and output neuron counts, and both memory and computation scale as O(n_in × n_out). Stacked with non-linear activations, such layers let the network learn complex non-linear transformations by adjusting weights during backpropagation.
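The forward pass described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a production implementation; the function name `dense_forward` and the choice of tanh as the activation are arbitrary:

```python
import numpy as np

def dense_forward(x, W, b, activation=np.tanh):
    """Fully connected layer: weighted sum of all inputs, plus bias,
    passed through an activation function."""
    return activation(x @ W + b)

rng = np.random.default_rng(0)
x = rng.standard_normal(4)        # 4 input features
W = rng.standard_normal((4, 3))   # weight matrix: 4 inputs x 3 outputs
b = np.zeros(3)                   # one bias per output neuron

y = dense_forward(x, W, b)
print(y.shape)  # (3,): one output per neuron in the layer
```

Note that the weight matrix holds 4 × 3 = 12 parameters, one per input-output connection, which is where the quadratic scaling with layer width comes from.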
Why It Matters
Dense layers serve as the primary mechanism for learning complex feature representations and decision boundaries in neural networks. As general-purpose mappings between feature spaces, they are the standard choice for classification and regression heads, and their size directly affects model accuracy, memory footprint, and inference latency, which are critical factors in production systems handling real-time predictions and large-scale data processing.
Common Applications
Fully connected layers appear in image classification networks (following convolutional feature extraction), natural language processing models for text classification, recommendation systems, and time-series forecasting. They form the output layer in virtually all supervised learning neural networks.
Key Considerations
Fully connected layers introduce significant parameter overhead compared to convolutional or recurrent alternatives, increasing memory consumption and training time. They assume no spatial or temporal structure in data, making them less efficient than specialised layers for structured inputs such as images or sequences.
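The parameter-overhead point can be made concrete with back-of-the-envelope arithmetic (the specific layer sizes below are illustrative assumptions, not from the source):

```python
# Dense layer applied directly to a flattened 224x224 RGB image,
# mapping to 1000 output neurons:
n_in = 224 * 224 * 3                 # 150,528 input values
n_out = 1000
dense_params = n_in * n_out + n_out  # weights + biases

# A 3x3 convolution with 64 filters over the same image shares its
# weights across spatial positions, so it needs only:
conv_params = 3 * 3 * 3 * 64 + 64    # kernel weights + biases

print(dense_params)  # 150,529,000 parameters
print(conv_params)   # 1,792 parameters
```

The dense layer needs roughly 150 million parameters where the convolution needs under two thousand, which is why dense layers are usually applied only after a specialised feature extractor has reduced the input's dimensionality.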
Cross-References

More in Deep Learning:

- Capsule Network (Architectures): A neural network architecture that groups neurons into capsules to better capture spatial hierarchies and part-whole relationships.
- Fine-Tuning (Language Models): The process of adapting a pre-trained model to a specific task by continuing training on a smaller task-specific dataset, transferring learned representations to new domains.
- Mamba Architecture (Architectures): A selective state space model that achieves transformer-level performance with linear-time complexity by incorporating input-dependent selection mechanisms into the recurrence.
- ReLU (Training & Optimisation): Rectified Linear Unit, an activation function that outputs the input directly if positive, otherwise outputs zero.
- Dropout (Training & Optimisation): A regularisation technique that randomly deactivates neurons during training to prevent co-adaptation and reduce overfitting.
- Residual Connection (Training & Optimisation): A skip connection that adds a layer's input directly to its output, enabling gradient flow through deep networks and allowing training of architectures with hundreds of layers.
- Gradient Clipping (Training & Optimisation): A technique that caps gradient values during training to prevent the exploding gradient problem.
- Self-Attention (Training & Optimisation): An attention mechanism where each element in a sequence attends to all other elements to compute its representation.