Azure Data Factory: a powerful cloud ETL tool
One of the most important decisions in an analytics project is the choice of ETL/ELT tool.
I have seen people implement data pipelines in pure Java or Python, and in some cases that may be the only option. In most cases, though, you should consider an ETL/ELT tool to speed up the process, free your time for other parts of the project, and write as little code as possible.
You will need to connect to multiple sources, load data, transform data, and so on, and doing all of this without a tool makes your life very hard.
So, please consider using a powerful ETL tool.
There are several ETL tools on the market, such as Informatica, Pentaho Data Integration, and Trifacta.
Today, we will talk about how to use Azure Data Factory version 2, the cloud ETL/ELT tool from Microsoft Azure.
What is ADFv2 (Azure Data Factory version 2)?
Azure Data Factory (ADF) is a service designed to let developers integrate disparate data sources. It provides access to on-premises data in SQL Server and to cloud data in Azure Storage (Blob and Tables) and Azure SQL Database.
How to start using ADFv2?
First, you do not install anything; you create the service in the Azure portal:
New -> Analytics -> Data Factory
Then set the name and select your subscription, resource group, version (1 or 2), and location.
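Behind the scenes, the portal wizard issues a PUT request against the Azure Resource Manager REST API. As a rough sketch, this is the request it builds; the subscription ID, resource group, factory name, and region below are placeholders, not values from this post:

```python
import json

# Placeholder values -- replace with your own subscription, resource
# group, factory name, and region.
subscription_id = "00000000-0000-0000-0000-000000000000"
resource_group = "my-analytics-rg"
factory_name = "my-data-factory"
api_version = "2018-06-01"  # ADF v2 GA API version

# URL of the ARM PUT request that creates (or updates) the factory.
url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    f"/resourceGroups/{resource_group}"
    f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    f"?api-version={api_version}"
)

# Minimal request body: only the region is required.
body = json.dumps({"location": "eastus"})
```

Sending the request also needs an OAuth bearer token for your Azure AD tenant, which the portal handles for you.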
To start using the cloud ETL tool, click the “Author & Monitor” link.
Use the image below as a reference.
Below you can see the welcome screen, where you can create a pipeline, use the Copy Data wizard, configure the SSIS Integration Runtime, or set up a code repository (VSTS/Git).
I strongly recommend setting up your Git repository using the repository settings.
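When you connect a VSTS (Azure DevOps) Git repository, ADF stores a `repoConfiguration` block on the factory resource. A minimal sketch of that block follows; every name in it is a placeholder, not a real account:

```python
# Sketch of the repoConfiguration ADF persists for a VSTS/Git setup.
# All account, project, and repository names are placeholders.
repo_configuration = {
    "type": "FactoryVSTSConfiguration",   # use FactoryGitHubConfiguration for GitHub
    "accountName": "my-devops-account",
    "projectName": "my-project",
    "repositoryName": "adf-pipelines",
    "collaborationBranch": "master",      # branch the UI publishes from
    "rootFolder": "/",                    # folder holding the JSON definitions
}
```

With this in place, every pipeline, dataset, and linked service you author in the UI is saved as JSON in that repository, which gives you history and code review for free.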
“Create pipeline” opens the authoring UI, where you can build your pipelines and your ETL or ELT process.
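Everything you draw in the authoring UI is serialized to pipeline JSON. As an illustration, here is a sketch of a pipeline with one Copy activity moving data from Blob storage to SQL; the pipeline, activity, and dataset names are hypothetical and assume two datasets were defined beforehand:

```python
# Sketch of the JSON the authoring UI generates for a simple copy
# pipeline. "InputBlobDataset" and "OutputSqlDataset" are assumed to
# be datasets you have already created in the factory.
pipeline = {
    "name": "CopyBlobToSqlPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyFromBlobToSql",
                "type": "Copy",
                "inputs": [
                    {"referenceName": "InputBlobDataset",
                     "type": "DatasetReference"}
                ],
                "outputs": [
                    {"referenceName": "OutputSqlDataset",
                     "type": "DatasetReference"}
                ],
                "typeProperties": {
                    "source": {"type": "BlobSource"},
                    "sink": {"type": "SqlSink"},
                },
            }
        ]
    },
}
```

You rarely write this JSON by hand, but reading it is useful: it is exactly what lands in your Git repository when you save.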
You can run a pipeline by clicking Trigger.
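Besides triggering a run manually, you can attach a schedule trigger so the pipeline runs on its own. A sketch of a daily schedule trigger definition follows; the trigger and pipeline names are placeholders:

```python
# Sketch of a schedule trigger definition that fires a pipeline once a
# day. "CopyBlobToSqlPipeline" is a placeholder pipeline name.
trigger = {
    "name": "DailyTrigger",
    "properties": {
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Day",
                "interval": 1,
                "startTime": "2018-06-01T00:00:00Z",
                "timeZone": "UTC",
            }
        },
        "pipelines": [
            {
                "pipelineReference": {
                    "referenceName": "CopyBlobToSqlPipeline",
                    "type": "PipelineReference",
                }
            }
        ],
    },
}
```

Remember that a trigger only starts firing after you activate it (Publish in the UI).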
I wrote about ADFv2 in another post; click here to read it.