Data Lakes

A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The files are typically organized into staged zones (raw, cleansed, and curated) so that different users can work with the data at whichever stage of refinement meets their needs.
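The staged zones can be pictured as three directories that data passes through in order. The sketch below is a minimal illustration, not a reference implementation: the folder names `raw`, `cleansed`, and `curated` mirror the zones named above, while the function names, the JSON Lines sample, and the `device`/`temp_c` fields are all hypothetical. A real lake would normally sit on object storage rather than a local folder.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Hypothetical sensor feed: one good row, one with messy formatting,
# one that is not JSON at all, and one missing a field.
SAMPLE = (
    '{"device": " Sensor-A ", "temp_c": "20.0"}\n'
    '{"device": "sensor-a", "temp_c": 22}\n'
    'not json\n'
    '{"device": "sensor-b"}\n'
)


def land_raw(lake: Path, name: str, payload: str) -> Path:
    """Raw zone: store the payload exactly as it arrived."""
    path = lake / "raw" / name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(payload, encoding="utf-8")
    return path


def cleanse(lake: Path, raw_path: Path) -> Path:
    """Cleansed zone: parse each line, normalize types, drop rows that do not fit."""
    rows = []
    for line in raw_path.read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
            rows.append({
                "device": str(rec["device"]).strip().lower(),
                "temp_c": float(rec["temp_c"]),
            })
        except (ValueError, KeyError, TypeError):
            continue  # the rejected line is still available in the raw zone
    out = lake / "cleansed" / raw_path.name
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
    return out


def curate(lake: Path, cleansed_path: Path) -> Path:
    """Curated zone: aggregate cleansed rows into a reporting-friendly summary."""
    readings = defaultdict(list)
    for line in cleansed_path.read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        readings[rec["device"]].append(rec["temp_c"])
    summary = {dev: sum(vals) / len(vals) for dev, vals in readings.items()}
    out = lake / "curated" / "avg_temp_by_device.json"
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(json.dumps(summary, sort_keys=True), encoding="utf-8")
    return out


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lake = Path(tmp)
        raw = land_raw(lake, "readings.jsonl", SAMPLE)
        curated = curate(lake, cleanse(lake, raw))
        print(curated.read_text(encoding="utf-8"))
```

Because the raw zone is never modified, a data scientist can still inspect the rejected lines, while a business analyst reads only the curated summary.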

Data Lake vs. Data Warehouse vs. Data Lakehouse

	Data lake	Data warehouse	Data lakehouse
Type	Structured, semi-structured, unstructured; relational, non-relational	Structured; relational	Structured, semi-structured, unstructured; relational, non-relational
Schema	Schema on read	Schema on write	Schema on read, schema on write
Format	Raw, unfiltered	Processed, vetted	Raw, unfiltered, processed, curated, delta format files
Sources	Big data, IoT, social media, streaming data	Application, business, transactional data, batch reporting	Big data, IoT, social media, streaming data, application, business, transactional data, batch reporting
Scalability	Easy to scale at a low cost	Difficult and expensive to scale	Easy to scale at a low cost
Users	Data scientists, data engineers	Data warehouse professionals, business analysts	Business analysts, data engineers, data scientists
Use cases	Machine learning, predictive analytics, real-time analytics	Core reporting, BI	Core reporting, BI, machine learning, predictive analytics
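The Schema row of the table is perhaps the easiest difference to see in code. The sketch below is a rough illustration under stated assumptions: an in-memory SQLite table stands in for a warehouse that validates rows at insert time (schema on write), and a list of JSON strings stands in for raw files in a lake that are only interpreted when queried (schema on read). The `orders` table, the sample records, and all function names are hypothetical.

```python
import json
import sqlite3

# Hypothetical records: one valid, one with a bad amount, one with an extra field.
RECORDS = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": "n/a"},
    {"order_id": 3, "amount": 5.0, "coupon": "SPRING"},
]


def load_schema_on_write(records):
    """Warehouse style: rows must satisfy the table schema when they are written."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        " order_id INTEGER PRIMARY KEY,"
        " amount REAL NOT NULL CHECK (typeof(amount) = 'real'))"
    )
    rejected = []
    for rec in records:
        try:
            # Fields outside the schema (such as "coupon") are simply not stored.
            conn.execute(
                "INSERT INTO orders (order_id, amount) VALUES (?, ?)",
                (rec.get("order_id"), rec.get("amount")),
            )
        except sqlite3.IntegrityError:
            rejected.append(rec)  # violates NOT NULL or the CHECK constraint
    stored = conn.execute(
        "SELECT order_id, amount FROM orders ORDER BY order_id"
    ).fetchall()
    conn.close()
    return stored, rejected


def load_schema_on_read(records):
    """Lake style: keep every record unchanged; nothing is validated on write."""
    return [json.dumps(rec) for rec in records]


def read_amounts(raw_lines):
    """Apply a schema chosen by the reader at query time, skipping rows that do not fit."""
    amounts = []
    for line in raw_lines:
        rec = json.loads(line)
        try:
            amounts.append(float(rec["amount"]))
        except (KeyError, TypeError, ValueError):
            continue
    return amounts


if __name__ == "__main__":
    stored, rejected = load_schema_on_write(RECORDS)
    print("warehouse stored:", stored, "rejected:", rejected)
    raw = load_schema_on_read(RECORDS)
    print("lake kept", len(raw), "records; amounts on read:", read_amounts(raw))
```

In this sketch the warehouse path loses the `coupon` field and the malformed row at load time, whereas the lake path keeps all three records so that a later reader can apply a different schema, for example one that includes `coupon`.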
