Direct Answer
Extractive summarisation is a Natural Language Processing technique that automatically condenses a document by selecting its most salient sentences and retaining them verbatim, without paraphrasing or generating new text.
How It Works
The approach ranks sentences using statistical or machine learning methods, such as term frequency-inverse document frequency (TF-IDF), graph-based algorithms like TextRank, or neural scoring models, to identify those carrying the most important information. The top-ranked sentences are then assembled in their original order to form a shorter document, which stays coherent because the source text's wording and structure are preserved.
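As a concrete illustration, the frequency-based ranking and reassembly steps described above can be sketched in plain Python using only the standard library. This is a minimal sketch, not a production summariser: the sentence splitter, the averaged TF-IDF weighting, and the `extractive_summary` helper name are all illustrative choices.

```python
import math
import re
from collections import Counter

def extractive_summary(text, k=2):
    """Score sentences by the average TF-IDF weight of their words
    (a minimal frequency-based ranker) and return the top k,
    reassembled in their original order."""
    # Naive sentence split on terminal punctuation; real systems
    # would use a proper sentence tokeniser.
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', text) if s.strip()]
    tokenised = [re.findall(r'[a-z]+', s.lower()) for s in sentences]
    n = len(sentences)

    # Document frequency: in how many sentences does each word appear?
    df = Counter(w for words in tokenised for w in set(words))

    def score(words):
        if not words:
            return 0.0
        tf = Counter(words)
        # Average TF-IDF over the sentence, so longer sentences
        # are not automatically favoured.
        return sum(tf[w] * math.log(n / df[w]) for w in tf) / len(words)

    # Rank sentence indices by score, keep the k best,
    # then restore the original document order.
    ranked = sorted(range(n), key=lambda i: score(tokenised[i]), reverse=True)[:k]
    return ' '.join(sentences[i] for i in sorted(ranked))
```

Because the selected sentences are emitted in source order and copied verbatim, every line of the summary is directly traceable back to the original document, which is the property the sections below rely on.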
Why It Matters
Organisations benefit from rapid document processing at scale, particularly where speed and interpretability are critical; since no novel text is generated, output remains fully traceable to source material, supporting compliance, auditability, and stakeholder trust. This approach reduces computational overhead compared to abstractive methods, making it cost-effective for high-volume document workflows.
Common Applications
Applications include legal document review, where key clauses and obligations must be flagged; news aggregation platforms requiring fast headline extraction; customer support ticket prioritisation; and scientific literature filtering in research institutions seeking rapid assessment of publication relevance.
Key Considerations
The technique cannot bridge gaps in the source content or reshape information for clarity, which limits its effectiveness on poorly structured documents or where the context calls for paraphrasing. Output quality depends heavily on the choice of sentence-ranking algorithm, and a generic ranker may miss nuances that matter to a particular audience.