Creating your Data Lake using Azure Data Lake Gen 2
Mar 19, 2019
This tutorial will explain how to create your Data Lake using Azure Data Lake Gen 2.
More info: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-quickstart-create-account
Data Lake layers
Suggested Data Lake layers:
- Landing data layer (Suggested folder name: landing) — Raw events are stored for historical reference. Also called the staging layer or landing area.
- Curated data layer (Suggested folder name: curated) — Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize how files are stored in terms of encoding, format, data types and content (e.g. strings). Also called the conformed layer.
- Application data layer (Suggested folder name: application) — Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. a DW application, an advanced analysis process, etc.). This layer goes by many other names: workspace, trusted, gold, secure, production ready, governed.
- Sandbox data layer (Suggested folder name: sandbox) — Optional layer for playing and experimenting. Also called the exploration layer or data science workspace.
- Temp data layer (Suggested folder name: temp) — Optional layer to be used to store temporary files and folders.
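The layer layout above can be sketched as a small script that creates one folder per layer. This is only an illustration under stated assumptions: it uses a local directory as a stand-in for the Data Lake file system, and the names LAYERS, create_layers and include_optional are hypothetical helpers, not part of any Azure tooling. Against an actual Gen 2 account, the mkdir call would be replaced by the equivalent directory-creation call of the Azure storage SDK.

```python
from pathlib import Path
import tempfile

# Suggested folder names from the list above.
# True marks a core layer, False marks an optional one (sandbox, temp).
LAYERS = {
    "landing": True,
    "curated": True,
    "application": True,
    "sandbox": False,
    "temp": False,
}


def create_layers(root, include_optional=True):
    """Create one folder per layer under root and return the names created.

    Hypothetical helper: a local folder stands in for the Data Lake
    file system, so this only demonstrates the layout.
    """
    root = Path(root)
    created = []
    for name, is_core in LAYERS.items():
        if is_core or include_optional:
            # exist_ok=True makes the function safe to run more than once
            (root / name).mkdir(parents=True, exist_ok=True)
            created.append(name)
    return created


if __name__ == "__main__":
    # Use a throwaway directory so the sketch leaves nothing behind
    with tempfile.TemporaryDirectory() as tmp:
        for layer in create_layers(tmp):
            print(layer)
```

Passing include_optional=False creates only the landing, curated and application folders, which may be a reasonable starting point if the sandbox and temp layers are not needed yet.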