Overview
Direct Answer
t-SNE (t-Distributed Stochastic Neighbour Embedding) is a non-linear dimensionality-reduction algorithm that maps high-dimensional data into a two- or three-dimensional space while preserving local neighbourhood structure. Unlike linear techniques such as PCA, it excels at revealing cluster separation and hidden patterns in complex datasets.
How It Works
The algorithm converts high-dimensional Euclidean distances into conditional probabilities that represent neighbourhood relationships, then uses gradient descent to minimise the Kullback-Leibler divergence between these probabilities in the original and low-dimensional spaces. It employs a Student's t-distribution in the low-dimensional space, whose heavier tails (relative to a Gaussian) let dissimilar points repel effectively, mitigating crowding and producing clearer cluster separation in the visualisation.
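The procedure above can be sketched with scikit-learn's `TSNE` estimator; this is a minimal illustration, assuming scikit-learn is installed, and the digits dataset and parameter values are chosen purely for demonstration:

```python
# Minimal t-SNE sketch using scikit-learn (illustrative parameter choices).
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Load a small high-dimensional dataset: 64-feature images of handwritten digits.
X, y = load_digits(return_X_y=True)
X = X[:500]  # subsample to keep the run fast

# perplexity controls the effective neighbourhood size used when converting
# distances to probabilities; the Student's t-distribution in the embedding
# space is built into the method.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X)

print(embedding.shape)  # one 2-D point per input sample
```

The resulting `embedding` array is typically passed to a scatter plot, colouring points by label to inspect cluster separation visually.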
Why It Matters
Teams rely on t-SNE for exploratory data analysis when assessing dataset quality, validating clustering outcomes, and identifying outliers before model deployment. The technique accelerates decision-making in data science workflows by enabling rapid visual inspection of unlabelled data, reducing the cost of manual annotation and improving confidence in downstream model selection.
Common Applications
Practitioners use the method to visualise gene expression profiles in genomics research, explore image embeddings in computer vision pipelines, and inspect document similarity in natural language processing. It is standard in single-cell RNA sequencing analysis and helps data scientists validate the separability of classes in classification tasks.
Key Considerations
The algorithm is computationally expensive for large datasets and sensitive to hyperparameters such as perplexity; results can vary significantly across runs because of stochastic initialisation. It preserves local structure but distorts global distances (cluster sizes and inter-cluster distances in the embedding are not meaningful), making it unsuitable for quantitative analysis or as input to downstream models.
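The perplexity sensitivity can be demonstrated directly by embedding the same data at several settings; this is an illustrative sketch, assuming scikit-learn, with arbitrary perplexity values chosen for the example:

```python
# Illustrative sketch: t-SNE embeddings shift with perplexity, so only the
# local neighbourhood structure of any single run should be trusted.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

X, _ = load_digits(return_X_y=True)
X = X[:300]  # small subsample keeps repeated fits cheap

embeddings = {}
for perplexity in (5, 30, 50):
    emb = TSNE(n_components=2, perplexity=perplexity,
               init="pca", random_state=0).fit_transform(X)
    embeddings[perplexity] = emb

# The distance between the same pair of points differs across settings,
# which is why embedded distances are not quantitatively meaningful.
for p, emb in embeddings.items():
    print(p, np.linalg.norm(emb[0] - emb[1]))
```

Comparing several perplexities (and random seeds) side by side is a common sanity check before drawing conclusions from any one embedding.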