
Data Lakehouse

Overview

Direct Answer

A data lakehouse is a unified architecture that merges the cost-efficiency and schema flexibility of data lakes with the ACID transactions and SQL query performance of data warehouses. It enables organisations to store raw, multi-format data whilst supporting structured analytical queries on the same dataset, without replicating it through separate ETL pipelines.

How It Works

Lakehouses employ metadata layers and open table formats that impose schema constraints on top of object storage, allowing query engines to enforce data consistency and optimise execution. Delta Lake and Apache Iceberg exemplify this approach, adding transactional semantics and versioning to files in distributed storage. This layering permits both exploratory data science access and governed business intelligence workloads to operate on identical underlying datasets.
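The core mechanism can be sketched in miniature: immutable data files plus an append-only transaction log that records which files are "live" at each version. The class below (`TinyTableLog`, a hypothetical toy, not the actual Delta Lake or Iceberg format) shows how replaying such a log yields a consistent snapshot, and how an atomic file rename makes each commit all-or-nothing.

```python
import json
import os
import tempfile

class TinyTableLog:
    """Toy transaction log over a directory of immutable data files.

    Each committed version is one JSON file in _log/; replaying the log
    in version order reconstructs the set of live data files.
    """

    def __init__(self, root):
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self):
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def snapshot(self):
        """Replay the log to get the current set of live data files."""
        files = set()
        for v in self._versions():
            with open(os.path.join(self.log_dir, f"{v}.json")) as fh:
                entry = json.load(fh)
            files |= set(entry.get("add", []))
            files -= set(entry.get("remove", []))
        return files

    def commit(self, add=(), remove=()):
        """Append one log entry -- one atomic 'transaction'."""
        versions = self._versions()
        next_v = versions[-1] + 1 if versions else 0
        tmp = tempfile.NamedTemporaryFile("w", dir=self.log_dir, delete=False)
        json.dump({"add": list(add), "remove": list(remove)}, tmp)
        tmp.close()
        # os.rename is atomic on POSIX filesystems: a concurrent reader
        # sees either the old snapshot or the new one, never a
        # half-written commit.
        os.rename(tmp.name, os.path.join(self.log_dir, f"{next_v}.json"))
        return next_v

with tempfile.TemporaryDirectory() as root:
    table = TinyTableLog(root)
    table.commit(add=["part-000.parquet"])
    # A rewrite (e.g. a compaction or update) adds a new file and
    # retires the old one in a single commit.
    table.commit(add=["part-001.parquet"], remove=["part-000.parquet"])
    print(sorted(table.snapshot()))  # -> ['part-001.parquet']
```

Because readers only ever consult committed log entries, exploratory and governed workloads see the same versioned view of the data; real table formats add checkpointing, schema metadata, and concurrency control on top of this basic idea.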

Why It Matters

Organisations reduce infrastructure costs by eliminating duplicate data copies across separate systems whilst accelerating time-to-insight for analytics teams. The architecture addresses the growing need for real-time analytics, machine learning pipelines, and regulatory compliance where data lineage and atomicity are critical.

Common Applications

Financial services employ lakehouses for fraud detection and risk analysis across transaction data. Retail and e-commerce organisations leverage them for customer analytics and inventory optimisation. Healthcare organisations use them to consolidate patient records with analytical reporting for outcome studies.

Key Considerations

Implementation complexity and operational overhead remain higher than for traditional warehouses; organisations must carefully assess their metadata management and query optimisation expertise. Vendor fragmentation and evolving standardisation mean architectural choices made today may require future migration.
