Overview
Direct Answer
Coreference resolution is the computational task of identifying and linking all linguistic expressions (pronouns, noun phrases, named entities) within a text that reference the same underlying entity or concept. This process enables systems to understand that "she", "the CEO", and "Jane Smith" may all refer to a single individual.
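The output of such a system is often represented as clusters of mention spans, where each cluster corresponds to one entity. A minimal sketch of that idea (a toy data structure, not a real resolver — the sentence, spans, and helper function are illustrative):

```python
# Toy illustration of coreference output: mentions grouped into
# clusters, where each cluster refers to one underlying entity.
text = "Jane Smith joined Acme in 2019. The CEO said she plans to expand."

# A resolver's output can be represented as clusters of mention strings.
clusters = [
    ["Jane Smith", "The CEO", "she"],  # all refer to the same person
    ["Acme"],                          # singleton cluster (one mention)
]

def canonical(mention, clusters):
    """Return the representative (first) mention of the cluster containing `mention`."""
    for cluster in clusters:
        if mention in cluster:
            return cluster[0]
    return mention  # unseen mentions map to themselves

print(canonical("she", clusters))  # Jane Smith
```

Real systems (e.g. neural end-to-end resolvers) operate on character or token spans rather than raw strings, but the cluster structure is the same.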
How It Works
Systems analyse syntactic structure, semantic similarity, and discourse context to determine whether two mentions should be linked. Modern approaches employ neural networks that encode mention representations and compute similarity scores, using features such as grammatical agreement, contextual embeddings, and entity attributes to decide whether mentions corefer.
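The mention-pair scoring described above can be sketched as follows. This is a deliberately simplified illustration with hand-fixed weights and tiny made-up embeddings; in a trained system the weights and representations are learned, and the features are far richer:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def pair_score(m1, m2):
    """Score whether two mentions corefer: contextual similarity
    combined with grammatical agreement features (weights are
    fixed here for illustration; normally they are learned)."""
    sim = cosine(m1["embedding"], m2["embedding"])
    gender_ok = 1.0 if m1["gender"] == m2["gender"] else 0.0
    number_ok = 1.0 if m1["number"] == m2["number"] else 0.0
    return 0.6 * sim + 0.2 * gender_ok + 0.2 * number_ok

# Toy mention representations (embeddings invented for the example).
jane = {"embedding": [0.9, 0.1, 0.30], "gender": "f",  "number": "sg"}
she  = {"embedding": [0.8, 0.2, 0.35], "gender": "f",  "number": "sg"}
they = {"embedding": [0.1, 0.9, 0.20], "gender": None, "number": "pl"}

# A compatible pronoun scores higher against its antecedent.
print(pair_score(jane, she) > pair_score(jane, they))  # True
```

A resolver would compute such scores for each mention against all preceding candidates and link it to the highest-scoring antecedent (or to none, if every score falls below a threshold).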
Why It Matters
Accurate linking of expressions improves downstream NLP tasks including question-answering, information extraction, and knowledge graph construction. For customer support automation and legal document analysis, resolving references reduces ambiguity and ensures critical information is correctly attributed, directly impacting compliance and decision-making accuracy.
Common Applications
Applications include automated summarisation (tracking subjects across sentences), biomedical text mining (linking drug and disease mentions), customer service chatbots (maintaining dialogue context), and financial intelligence systems (connecting references to companies and executives across reports and filings).
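For the chatbot case, one simple way coreference supports dialogue context is rewriting a follow-up turn so pronouns are replaced by their tracked antecedents before the request is processed. A minimal sketch, assuming the system already maintains a pronoun-to-entity mapping (the dialogue state and entity here are invented for the example):

```python
import re

# Assumed dialogue state: the most recent entity each pronoun could
# refer to, maintained by earlier turns (hypothetical example values).
dialogue_state = {"it": "order #1234"}

def resolve_turn(turn, state):
    """Replace known pronouns (whole-word, case-insensitive match)
    with their tracked antecedents."""
    def repl(match):
        return state.get(match.group(0).lower(), match.group(0))
    return re.sub(r"\b[Ii]t\b", repl, turn)

print(resolve_turn("When will it arrive?", dialogue_state))
# When will order #1234 arrive?
```

Production systems use learned resolvers rather than word substitution, but the effect is the same: the downstream intent handler receives an unambiguous reference.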
Key Considerations
The task becomes significantly harder with ambiguous pronouns, long-range dependencies, and texts involving multiple entities of the same type. Domain-specific entity vocabularies and genre variations (formal vs. conversational language) require careful model adaptation and evaluation.
More in Natural Language Processing
Code Generation (Semantics & Representation)
The automated production of source code from natural language specifications or partial code context, powered by large language models trained on programming repositories.

Seq2Seq Model (Core NLP)
A neural network architecture that maps an input sequence to an output sequence, used in translation and summarisation.

Conversational AI (Generation & Translation)
AI systems designed to engage in natural, context-aware dialogue with humans across multiple turns.

Multilingual Model (Semantics & Representation)
A language model trained on text from dozens or hundreds of languages simultaneously, enabling cross-lingual understanding and generation without language-specific fine-tuning.

Aspect-Based Sentiment Analysis (Text Analysis)
A fine-grained sentiment analysis approach that identifies opinions directed at specific aspects or features of an entity, such as a product's price, quality, or design.

Speech Synthesis (Speech & Audio)
The artificial production of human speech from text, also known as text-to-speech.

BERT (Semantics & Representation)
Bidirectional Encoder Representations from Transformers, a language model that understands context by reading text in both directions.

Reranking (Core NLP)
A two-stage retrieval process where an initial set of candidate documents is rescored by a more powerful model to improve the relevance ordering of search results.