Data Lake — Technology Wiki

Overview

Direct Answer

A data lake is a centralised repository that ingests and stores raw data, whether structured, semi-structured, or unstructured, in its native format, without imposing a predefined schema or transforming it on ingestion. Unlike a data warehouse, a data lake defers decisions about structure and analytical purpose until the data is consumed.

How It Works

Data lakes employ a schema-on-read architecture where data is catalogued with metadata but remains untransformed during ingestion. Storage systems typically distribute data across commodity hardware using distributed file systems or object storage, enabling horizontal scalability. Query engines and analytical tools apply structure and transformation only when data is accessed for specific analysis.
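As a rough illustration of schema-on-read, the sketch below keeps raw JSON records exactly as they arrived and applies a schema only when a consumer reads them. The sample records, the field names, and the `read_with_schema` helper are all hypothetical and chosen for illustration; a real lake would typically delegate this step to a query engine rather than hand-written parsing.

```python
import json
from datetime import datetime

# Raw events stored as-is (hypothetical sample data). Field types and
# field sets are deliberately inconsistent, as raw ingested data often is.
RAW_EVENTS = [
    '{"device": "pump-1", "temp_c": "71.5", "ts": "2024-01-01T00:00:00"}',
    '{"device": "pump-2", "temp_c": 68, "ts": "2024-01-01T00:05:00", "site": "north"}',
    '{"device": "pump-3", "ts": "2024-01-01T00:10:00"}',
]

# Schema chosen by the consumer at read time: field name -> parser.
TEMPERATURE_SCHEMA = {
    "device": str,
    "temp_c": float,
    "ts": datetime.fromisoformat,
}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines, projecting and casting fields per `schema`.

    Fields missing from a record become None; fields that are not in the
    schema are ignored for this read but stay untouched in the raw store.
    """
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        row = {}
        for field, parse in schema.items():
            value = record.get(field)
            row[field] = parse(value) if value is not None else None
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, TEMPERATURE_SCHEMA)
for row in rows:
    print(row)
```

A different analysis could read the same raw records with a different schema (for example, one that includes `site`) without re-ingesting anything, which is the flexibility the schema-on-read approach is meant to provide.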

Why It Matters

Organisations benefit from lower preprocessing costs and greater flexibility to repurpose raw data for unforeseen analytical needs. Deferring schema definition can shorten time-to-insight, and keeping diverse sources (logs, sensor readings, transactions, and unstructured text) in a single system makes them easier to explore together. This agility is particularly valuable for machine learning and exploratory data science initiatives.

Common Applications

Financial institutions use data lakes to consolidate transaction records, market data, and customer behaviour for fraud detection and risk modelling. Healthcare organisations integrate patient records, diagnostic imaging, and genomic data for cohort analysis. Retail and manufacturing sectors leverage sensor and operational data for real-time performance monitoring and predictive maintenance.

Key Considerations

Data lakes can become unmaintained repositories ('data swamps') without disciplined governance, metadata management, and access controls. Organisations must implement cataloguing, retention policies, and quality assurance to realise value and maintain regulatory compliance.
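The governance checks described above can be sketched as a small catalogue audit. This is a minimal illustration under stated assumptions, not a substitute for a real catalogue service: the `CatalogueEntry` fields, the `audit` function, the dataset paths, and the retention periods are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class CatalogueEntry:
    """Hypothetical metadata record for one dataset in the lake."""
    path: str
    owner: str
    description: str
    ingested_on: date
    retention_days: int

    def expires_on(self) -> date:
        # Date after which the dataset is past its retention period.
        return self.ingested_on + timedelta(days=self.retention_days)


def audit(entries, today):
    """Return (expired, undocumented) lists of dataset paths.

    expired: datasets whose retention period ended before `today`.
    undocumented: datasets with no owner or no description, which are
    early warning signs of a data swamp.
    """
    expired = [e.path for e in entries if e.expires_on() < today]
    undocumented = [
        e.path for e in entries
        if not e.owner.strip() or not e.description.strip()
    ]
    return expired, undocumented


# Illustrative entries; all values are made up for the example.
entries = [
    CatalogueEntry("raw/sensors/2023/", "ops", "Pump telemetry",
                   date(2023, 1, 1), 365),
    CatalogueEntry("raw/clickstream/", "", "",
                   date(2024, 5, 1), 90),
    CatalogueEntry("raw/transactions/", "finance", "Card transactions",
                   date(2024, 1, 1), 2555),
]

expired, undocumented = audit(entries, today=date(2024, 6, 1))
print("Past retention:", expired)
print("Missing owner or description:", undocumented)
```

Running a check like this on a schedule is one way to surface datasets that need deletion or documentation before they accumulate.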

Cited Across coldai.org

5 pages mention Data Lake

Industry pages, services, technologies, capabilities, case studies and insights on coldai.org that reference Data Lake — providing applied context for how the concept is used in client engagements.

Case Study

Modern Data Platforms: From Data Lakes to Intelligence Infrastructure

How the data platform landscape is evolving from centralized data lakes to distributed, AI-ready intelligence infrastructure — and what it means for enterprise architecture.

Insight

Chemicals Process Engineers Now Report to the Chief Data Officer — and what comes next

The organizational shift embedding AI agents into reaction pathways is cutting R&D cycle time by 40% and rewriting who controls capex allocation.

Insight

How Hospital Systems Are Replacing EHR Vendors With Federated AI Layers

The fastest-growing IT budget line in healthcare isn't software licenses—it's the middleware that lets clinical AI agents read, write, and route decisions across fragmented data es

Insight

Inside: Drug Developers Are Abandoning Centralized Data Lakes for Federated Ledgers

Pharmaceutical companies now lose less IP to distributed compute than to cloud breaches, reversing two decades of centralization economics.

Insight

Real Estate Valuation Models Break When Built on Third-Party Data Pipelines. Here’s what changed

Institutional investors deploying AI are discovering that data ownership, not algorithm sophistication, determines alpha generation in property markets.

Related in Core ERP

Enterprise Resource Planning

Integrated management software that connects core business processes including finance, HR, manufacturing, supply chain, and procurement.

Oracle ERP Cloud

Oracle's cloud-based enterprise resource planning suite covering financials, procurement, project management, and risk.

Microsoft Dynamics 365

Microsoft's suite of enterprise resource planning and customer relationship management cloud applications.

Supply Chain Management

The coordination and management of all activities involved in sourcing, procurement, conversion, and logistics.

Task Mining

Technology that observes and analyses how employees interact with desktop applications to identify automation opportunities.

Digital Twin

A virtual replica of a physical system, process, or product that simulates its real-world counterpart for analysis and optimisation.

API Gateway

A server that acts as a single entry point for API calls, handling request routing, composition, and protocol translation.

Enterprise Architecture

A strategic framework for aligning an organisation's IT infrastructure and processes with its business objectives.

TOGAF

The Open Group Architecture Framework — a comprehensive framework for enterprise architecture development and governance.

Service-Oriented Architecture

An architectural pattern where services are provided to components via a network communication protocol.

Data Mesh

A decentralised data architecture approach where domain teams own and manage their own data products.

Data Fabric

An architecture that provides a unified, intelligent layer for integrating data management across cloud and on-premises environments.

More in Enterprise Systems & ERP

Disaster Recovery

Core ERP

The policies, tools, and procedures for recovering technology infrastructure and systems after a natural or human-induced disaster.

Workflow Automation

Process Automation

Technology that automates the sequence of tasks, approvals, and handoffs within business processes.

Total Experience

Core ERP

A business strategy that creates superior shared experiences by interlinking customer experience, employee experience, user experience, and multi-experience across all touchpoints.

Business Intelligence

Technologies, practices, and strategies for collecting, integrating, and analysing business data to support decision-making.

Digital Adoption Platform

Core ERP

Software that overlays on enterprise applications to guide users through features and processes in real time.

ELT

CRM & Customer

Extract, Load, Transform — a modern data pipeline approach where raw data is loaded first and transformed within the target system.

Technical Debt

Core ERP

The implied cost of additional rework caused by choosing an easy or limited solution now instead of a better approach.

Enterprise AI Platform

Core ERP

An integrated software platform that provides organisations with tools for building, deploying, and managing AI applications at enterprise scale with governance, security, and compliance controls.