Suggested Data Lake layers
Mar 19, 2019
A typical data lake could have the following folder structure:
- Landing data layer (suggested folder name: landing): Raw events are stored as received for historical reference. Also called the staging layer or landing area.
- Curated data layer (suggested folder name: curated): Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize how files are stored in terms of encoding, format, data types and content (e.g. strings). Also called the conformed layer.
- Application data layer (suggested folder name: application): Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. a data warehouse application or an advanced analysis process). This layer goes by many other names: workspace, trusted, gold, secure, production ready, governed.
- Sandbox data layer (suggested folder name: sandbox): Optional layer for experimenting with data. Also called the exploration layer or data science workspace.
- Temp data layer (suggested folder name: temp): Optional layer for storing temporary files and folders.
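As a rough illustration, the folder structure above could be created with a few lines of Python. This is only a sketch on a local filesystem: the `LAYERS` mapping, the `create_layers` helper and the `datalake` root folder are hypothetical names chosen for the example, and a real data lake would more likely sit on object storage or HDFS.

```python
from pathlib import Path
import tempfile

# Folder names as suggested in the list above; True marks the optional layers.
# The LAYERS mapping itself is a hypothetical helper for this sketch.
LAYERS = {
    "landing": False,
    "curated": False,
    "application": False,
    "sandbox": True,
    "temp": True,
}


def create_layers(root, include_optional=True):
    """Create one folder per layer under root and return the created paths."""
    root = Path(root)
    created = []
    for name, optional in LAYERS.items():
        if optional and not include_optional:
            continue  # skip sandbox/temp when only the core layers are wanted
        path = root / name
        path.mkdir(parents=True, exist_ok=True)  # safe to run repeatedly
        created.append(path)
    return created


if __name__ == "__main__":
    # A throwaway directory stands in for the data lake root (an assumption).
    with tempfile.TemporaryDirectory() as tmp:
        for layer_path in create_layers(Path(tmp) / "datalake"):
            print(layer_path.name)
```

Passing `include_optional=False` creates only the landing, curated and application folders, which may suit a lake that has no need for sandbox or temp areas.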