t-SNE

Overview

Direct Answer

t-SNE (t-Distributed Stochastic Neighbour Embedding) is a non-linear dimensionality reduction algorithm that maps high-dimensional data into a two- or three-dimensional space while preserving local neighbourhood structure. Unlike linear techniques such as PCA, it excels at revealing cluster separation and hidden patterns in complex datasets.

How It Works

The algorithm converts pairwise Euclidean distances in the high-dimensional space into conditional probabilities that represent neighbourhood relationships, and defines a matching set of similarities between the points in the low-dimensional map. It then iteratively minimises the Kullback-Leibler divergence between the two distributions using gradient descent. In the low-dimensional space it uses a Student's t-distribution, whose tails are heavier than those of a Gaussian. This lets moderately dissimilar points sit farther apart in the map, which relieves crowding and produces clearer cluster visualisations.
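
The quantities described above can be sketched in a few lines of plain Python. This is a minimal illustration rather than a reference implementation: it assumes a single fixed Gaussian bandwidth (`sigma`) for every point instead of the per-point bandwidths that perplexity normally determines, it omits the gradient descent loop, and the function names and toy data are hypothetical. It only shows that a layout which keeps neighbours together has a lower cost than one which separates them.

```python
import math


def conditional_probabilities(points, sigma=1.0):
    """Row i holds p(j|i): Gaussian affinity of each j as a neighbour of i."""
    n = len(points)
    rows = []
    for i in range(n):
        weights = [
            0.0 if i == j
            else math.exp(-math.dist(points[i], points[j]) ** 2 / (2 * sigma ** 2))
            for j in range(n)
        ]
        total = sum(weights)
        rows.append([w / total for w in weights])
    return rows


def symmetrise(cond):
    """Joint probabilities p_ij = (p(j|i) + p(i|j)) / (2n)."""
    n = len(cond)
    return [[(cond[i][j] + cond[j][i]) / (2 * n) for j in range(n)] for i in range(n)]


def student_t_similarities(layout):
    """q_ij from a Student-t kernel (one degree of freedom), normalised over all pairs."""
    n = len(layout)
    weights = [
        [0.0 if i == j else 1.0 / (1.0 + math.dist(layout[i], layout[j]) ** 2)
         for j in range(n)]
        for i in range(n)
    ]
    total = sum(sum(row) for row in weights)
    return [[w / total for w in row] for row in weights]


def kl_divergence(P, Q):
    """KL(P || Q), the cost that is minimised over the low-dimensional layout."""
    return sum(
        p * math.log(p / q)
        for p_row, q_row in zip(P, Q)
        for p, q in zip(p_row, q_row)
        if p > 0.0
    )


# Toy data (assumed for illustration): two tight pairs of points in 3-D.
high_dim = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (5.0, 5.0, 5.0), (5.1, 5.0, 5.0)]
P = symmetrise(conditional_probabilities(high_dim))

# One 2-D layout keeps each pair together; the other splits both pairs.
faithful_layout = [(0.0, 0.0), (0.1, 0.0), (3.0, 3.0), (3.1, 3.0)]
mixed_layout = [(0.0, 0.0), (3.0, 3.0), (0.1, 0.0), (3.1, 3.0)]

kl_faithful = kl_divergence(P, student_t_similarities(faithful_layout))
kl_mixed = kl_divergence(P, student_t_similarities(mixed_layout))
print(f"faithful layout cost: {kl_faithful:.3f}, mixed layout cost: {kl_mixed:.3f}")
```

A full implementation would start from a random or PCA-based layout and repeatedly move the low-dimensional points along the negative gradient of this cost.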

Why It Matters

Teams rely on t-SNE for exploratory data analysis when assessing dataset quality, validating clustering outcomes, and identifying outliers before model deployment. The technique accelerates decision-making in data science workflows by enabling rapid visual inspection of unlabelled data, reducing the cost of manual annotation and improving confidence in downstream model selection.

Common Applications

Practitioners use the method to visualise gene expression profiles in genomics research, explore image embeddings in computer vision pipelines, and inspect document similarity in natural language processing. It is standard in single-cell RNA sequencing analysis and helps data scientists validate the separability of classes in classification tasks.

Key Considerations

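The algorithm is computationally expensive for large datasets and sensitive to hyperparameters such as perplexity, which roughly sets how many neighbours each point attends to. Results may vary significantly across runs due to stochastic initialisation, so plots from different runs should be compared with care. The method preserves local structure but distorts global distances: the gaps between clusters and the apparent sizes of clusters in the plot should not be interpreted quantitatively, and the embedding is generally unsuitable as input for downstream models.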
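
To make the role of perplexity concrete, the sketch below shows how a target perplexity can be turned into a Gaussian bandwidth for a single point. In the standard formulation, perplexity is 2 raised to the Shannon entropy (in bits) of the point's neighbour distribution, and the bandwidth is found by a one-dimensional search. The function names, the search bounds, and the toy distances are assumptions made for illustration.

```python
import math


def perplexity(sq_dists, sigma):
    """Perplexity 2**H of one point's Gaussian neighbour distribution."""
    # Shift by the smallest distance so at least one weight equals 1.0;
    # this avoids dividing by zero when sigma is very small.
    d_min = min(sq_dists)
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in sq_dists]
    total = sum(weights)
    probs = [w / total for w in weights]
    entropy_bits = -sum(p * math.log2(p) for p in probs if p > 0.0)
    return 2 ** entropy_bits


def sigma_for_perplexity(sq_dists, target, lo=1e-3, hi=1e3, iterations=60):
    """Bisect on a log scale for the bandwidth whose perplexity matches target."""
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if perplexity(sq_dists, mid) > target:
            hi = mid  # distribution too flat: shrink the bandwidth
        else:
            lo = mid  # distribution too peaked: widen the bandwidth
    return math.sqrt(lo * hi)


# Squared distances from one point to its eight neighbours (toy values).
sq_dists = [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]

sigma_2 = sigma_for_perplexity(sq_dists, target=2.0)
sigma_3 = sigma_for_perplexity(sq_dists, target=3.0)
sigma_5 = sigma_for_perplexity(sq_dists, target=5.0)
print(f"sigma for perplexity 2: {sigma_2:.3f}, 3: {sigma_3:.3f}, 5: {sigma_5:.3f}")
```

A larger target perplexity yields a wider bandwidth, so each point treats more of its surroundings as neighbours. This is one reason the same dataset can look quite different at different perplexity settings.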
