Creating your Data Lake using Azure Data Lake Gen 2

Caio Moreno
2 min read · Mar 19, 2019

This tutorial explains how to create your Data Lake using Azure Data Lake Storage Gen2.

More info: https://docs.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-quickstart-create-account

Data Lake layers

Suggested Data Lake layers:

  • Landing data layer (Suggested folder name: landing) — Raw events are stored for historical reference. Also called the staging layer or landing area.
  • Curated data layer (Suggested folder name: curated) — Raw events are transformed (cleaned and mastered) into directly consumable data sets. The aim is to standardize how files are stored in terms of encoding, format, data types and content (e.g. strings). Also called the conformed layer.
  • Application data layer (Suggested folder name: application) — Business logic is applied to the cleansed data to produce data ready to be consumed by applications (e.g. a DW application or an advanced analysis process). This layer also goes by many other names: workspace, trusted, gold, secure, production ready, governed.
  • Sandbox data layer (Suggested folder name: sandbox) — Optional layer for experimenting (“playing”) with the data. Also called the exploration layer or data science workspace.
  • Temp data layer (Suggested folder name: temp) — Optional layer for storing temporary files and folders.
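
One way to lay out these layers is as top-level directories inside a single file system (container). The following is a minimal sketch, not the author's method. It assumes the azure-storage-file-datalake Python package and a storage account created with the hierarchical namespace enabled, which ADLS Gen2 requires for real directories. The file system name "datalake" and the helper names layers_to_create and create_layers are hypothetical.

```python
"""Sketch: create the suggested Data Lake layers as top-level directories
in an ADLS Gen2 file system (container)."""

from typing import List

# Suggested layer folder names; sandbox and temp are optional.
LAYERS = ["landing", "curated", "application", "sandbox", "temp"]
OPTIONAL_LAYERS = {"sandbox", "temp"}


def layers_to_create(include_optional: bool = True) -> List[str]:
    """Return the layer folder names, optionally skipping the optional ones."""
    return [
        name for name in LAYERS
        if include_optional or name not in OPTIONAL_LAYERS
    ]


def create_layers(service_client, file_system: str,
                  include_optional: bool = True) -> List[str]:
    """Create the file system if missing, then one directory per layer.

    service_client is assumed to behave like
    azure.storage.filedatalake.DataLakeServiceClient.
    Returns the names of the directories created by this call.
    """
    fs_client = service_client.get_file_system_client(file_system)
    if not fs_client.exists():
        fs_client.create_file_system()

    created = []
    for name in layers_to_create(include_optional):
        dir_client = fs_client.get_directory_client(name)
        # Skip layers that already exist so the script can be rerun safely.
        if not dir_client.exists():
            dir_client.create_directory()
            created.append(name)
    return created


# Usage (assumes `pip install azure-storage-file-datalake azure-identity`;
# "<account-name>" and "datalake" are placeholders):
#
#   from azure.identity import DefaultAzureCredential
#   from azure.storage.filedatalake import DataLakeServiceClient
#
#   service = DataLakeServiceClient(
#       account_url="https://<account-name>.dfs.core.windows.net",
#       credential=DefaultAzureCredential(),
#   )
#   create_layers(service, "datalake")
```

The existence checks make the sketch safe to run more than once, and passing include_optional=False creates only the landing, curated and application layers.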

Written by Caio Moreno

Solutions Architect @databricks | Professor | PhD | Ex-Microsoft | Ex-Avanade/Accenture | Ex-Pentaho/Hitachi | Ex-AOL | Ex-IT4biz CEO. (Opinions are my own)