
BERT

Overview

Direct Answer

BERT (Bidirectional Encoder Representations from Transformers) is a transformer-based language model pretrained with a masked language modelling objective, processing text bidirectionally to produce contextualised word embeddings. Released by Google in 2018, it marked a fundamental shift from unidirectional language models by considering both preceding and following tokens simultaneously when encoding meaning.

How It Works

BERT employs a multi-layer transformer encoder architecture. During pretraining, 15% of input tokens are selected for prediction; most of these are replaced with a [MASK] placeholder (the remainder with a random token or left unchanged), and the model learns to recover the originals from surrounding context. During inference, it produces contextualised embeddings in which each token's representation depends on its full sentence context, not just sequential history. The model pretrains on two objectives, masked language modelling and next-sentence prediction, enabling it to capture deep syntactic and semantic relationships.
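The masking procedure described above can be sketched in plain Python. This is a simplified illustration of the selection logic only (the function name, toy vocabulary, and probabilities' split of 80/10/10 follow the original BERT recipe; a real pipeline would operate on tokenizer IDs, not strings):

```python
import random

MASK = "[MASK]"
VOCAB = ["cat", "dog", "ran", "sat", "the", "on", "mat"]  # toy vocabulary

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """BERT-style corruption: select ~15% of positions as prediction
    targets; of those, 80% become [MASK], 10% a random token, and
    10% are left unchanged. Returns the corrupted sequence plus a
    mapping from position to the original token to predict."""
    rng = random.Random(seed)
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                corrupted[i] = MASK   # 80%: replace with the mask symbol
            elif roll < 0.9:
                corrupted[i] = rng.choice(VOCAB)  # 10%: random token
            # else 10%: keep the original token unchanged
    return corrupted, targets
```

Keeping some selected tokens unchanged or randomised (rather than always masking) reduces the mismatch between pretraining, where [MASK] appears, and fine-tuning, where it never does.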

Why It Matters

The model achieved state-of-the-art results across multiple NLP benchmarks upon release, significantly improving accuracy for tasks like sentiment analysis, named entity recognition, and question answering. Organisations leverage it to reduce development time for language understanding systems and improve performance on domain-specific tasks through fine-tuning rather than training from scratch.

Common Applications

Applications include document classification, semantic similarity measurement, and information extraction in legal and financial document processing. Search engines and chatbot platforms utilise bidirectional representations to improve query understanding and relevance ranking.
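Semantic similarity measurement with BERT commonly pools the per-token embeddings into a single sentence vector and compares vectors with cosine similarity. A minimal sketch of that comparison step, using small toy vectors in place of real model outputs (actual embeddings would come from a BERT implementation, e.g. a model library, and have hundreds of dimensions):

```python
import math

def mean_pool(token_embeddings):
    """Average a list of token vectors into one sentence vector
    (mean pooling is one common choice; using the [CLS] vector is another)."""
    n, dim = len(token_embeddings), len(token_embeddings[0])
    return [sum(vec[d] for vec in token_embeddings) / n for d in range(dim)]

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors: 1.0 = same direction,
    0.0 = orthogonal. Scale-invariant, so vector magnitudes don't matter."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy example: two "sentences" as lists of 2-d token vectors.
sent_a = mean_pool([[1.0, 3.0], [3.0, 5.0]])   # -> [2.0, 4.0]
sent_b = mean_pool([[2.0, 4.0]])               # -> [2.0, 4.0]
```

Because cosine similarity ignores magnitude, sentences of different lengths pooled to vectors pointing the same direction score as highly similar.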

Key Considerations

Computational cost during pretraining is substantial, though fine-tuning on task-specific data remains efficient. Bidirectional processing makes the model unsuitable for autoregressive generation tasks; practitioners must select architectures appropriate to their specific use case, whether comprehension or generation-focused.
