A data lake is a centralized repository that ingests and stores large volumes of data in its original form. The data files are typically stored in staged zones—raw, cleansed, and curated—so that different types of users may use the data in its various forms to meet their needs.
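The staged zones described above can be sketched as a small pipeline. This is a minimal illustration, not a reference design: it assumes a local temporary directory stands in for lake storage, and the zone folder names, the `readings.jsonl` file, the record fields (`device`, `temp_c`), and every function name are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical zone layout; real lakes usually sit on object storage.
ZONES = ("raw", "cleansed", "curated")


def ingest_raw(lake: Path, name: str, payload: str) -> Path:
    """Store the payload unchanged in the raw zone, with no validation."""
    path = lake / "raw" / name
    path.write_text(payload, encoding="utf-8")
    return path


def cleanse(lake: Path, name: str) -> Path:
    """Parse raw JSON lines, drop malformed or incomplete records, normalize types."""
    rows = []
    for line in (lake / "raw" / name).read_text(encoding="utf-8").splitlines():
        try:
            rec = json.loads(line)
            rows.append({"device": str(rec["device"]), "temp_c": float(rec["temp_c"])})
        except (ValueError, KeyError, TypeError):
            continue  # the bad record remains available in the raw zone only
    out = lake / "cleansed" / name
    out.write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
    return out


def curate(lake: Path, name: str) -> Path:
    """Aggregate cleansed records into a per-device average for reporting."""
    totals = {}
    for line in (lake / "cleansed" / name).read_text(encoding="utf-8").splitlines():
        rec = json.loads(line)
        totals.setdefault(rec["device"], []).append(rec["temp_c"])
    out = lake / "curated" / "avg_temp_by_device.csv"
    with out.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["device", "avg_temp_c"])
        for device, temps in sorted(totals.items()):
            writer.writerow([device, round(sum(temps) / len(temps), 2)])
    return out


def build_demo_lake() -> Path:
    """Create the three zones and push one sample file through them."""
    lake = Path(tempfile.mkdtemp(prefix="lake_"))
    for zone in ZONES:
        (lake / zone).mkdir()
    raw = (
        '{"device": "a", "temp_c": "20.5"}\n'
        "not json\n"
        '{"device": "a", "temp_c": 21.5}\n'
        '{"device": "b"}\n'
    )
    ingest_raw(lake, "readings.jsonl", raw)
    cleanse(lake, "readings.jsonl")
    curate(lake, "readings.jsonl")
    return lake
```

Calling `build_demo_lake()` leaves all four original lines in the raw zone, two validated records in the cleansed zone, and one aggregated CSV in the curated zone, so each kind of user can read the form that suits them.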

Data Lake vs. Data Warehouse

|  | Data lake | Data warehouse | Data lakehouse |
| --- | --- | --- | --- |
| Type | Structured, semi-structured, unstructured; relational, non-relational | Structured; relational | Structured, semi-structured, unstructured; relational, non-relational |
| Schema | Schema on read | Schema on write | Schema on read, schema on write |
| Format | Raw, unfiltered | Processed, vetted | Raw, unfiltered, processed, curated, delta format files |
| Sources | Big data, IoT, social media, streaming data | Application, business, transactional data, batch reporting | Big data, IoT, social media, streaming data, application, business, transactional data, batch reporting |
| Scalability | Easy to scale at a low cost | Difficult and expensive to scale | Easy to scale at a low cost |
| Users | Data scientists, data engineers | Data warehouse professionals, business analysts | Business analysts, data engineers, data scientists |
| Use cases | Machine learning, predictive analytics, real-time analytics | Core reporting, BI | Core reporting, BI, machine learning, predictive analytics |
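
The schema row is the distinction that tends to matter most in practice, and a small sketch may make it concrete. The example below is an illustration under stated assumptions, not any product's API: the `WarehouseTable` and `LakeFile` classes, the `SCHEMA` mapping, and the `order_id` and `amount` fields are all hypothetical, and in-memory lists stand in for real storage.

```python
import json

# Hypothetical schema for illustration: field name -> callable that coerces the value.
SCHEMA = {"order_id": int, "amount": float}


def apply_schema(record):
    """Coerce a record to SCHEMA; raises KeyError, ValueError, or TypeError if it does not fit."""
    return {field: cast(record[field]) for field, cast in SCHEMA.items()}


class WarehouseTable:
    """Schema on write: a record is validated before it is stored."""

    def __init__(self):
        self.rows = []

    def write(self, record):
        self.rows.append(apply_schema(record))  # a bad record is rejected here

    def read(self):
        return list(self.rows)


class LakeFile:
    """Schema on read: anything is stored as-is; structure is applied when queried."""

    def __init__(self):
        self.lines = []

    def write(self, raw_line):
        self.lines.append(raw_line)  # no validation at ingestion

    def read(self):
        rows = []
        for line in self.lines:
            try:
                rows.append(apply_schema(json.loads(line)))
            except (ValueError, KeyError, TypeError):
                continue  # skipped at read time; the raw line is still stored
        return rows
```

With these classes, writing a malformed record to `WarehouseTable` raises immediately, whereas `LakeFile` accepts it and only filters it out when `read()` applies the schema. A lakehouse, per the table above, supports both modes.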