How to connect Azure Data Lake Store with Azure Databricks
This tutorial demonstrates how to connect Azure Data Lake Store with Azure Databricks.
Use case:
Read files from Azure Data Lake Store using Azure Databricks Notebooks.
Assumptions:
- You understand Azure Data Lake Store.
- You understand Azure Databricks and Spark.
- You understand how to create a Service Principal and how to use Azure Portal.
- You have basic knowledge of Linux and Scala.
Tips:
Application ID = Client ID
Credential = Service principal key
dfs.adls.oauth2.refresh.url = Go to Azure Active Directory -> App registrations -> Endpoints -> OAUTH 2.0 TOKEN ENDPOINT
Please replace the parameter values with your own; the code below will not work with the sample values shown here.
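If you prefer not to hardcode the service principal key in the notebook, you can keep it in a Databricks secret scope and read it with dbutils.secrets.get. This is a minimal sketch; the scope name adls-scope, the key name sp-credential, and the placeholder IDs are assumptions you would replace with your own:
// Sketch: pull the service principal key from a Databricks secret scope
// ("adls-scope" and "sp-credential" are hypothetical names you create yourself)
val clientId   = "<your-application-client-id>"   // Application (client) ID of the service principal
val credential = dbutils.secrets.get(scope = "adls-scope", key = "sp-credential")
val refreshUrl = "https://login.microsoftonline.com/<your-tenant-id>/oauth2/token"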
There are two options to read and write Azure Data Lake data from Azure Databricks:
- DBFS mount points;
- Spark configs.
Using DBFS mount points:
val configs = Map(
  "dfs.adls.oauth2.access.token.provider.type" -> "ClientCredential",
  "dfs.adls.oauth2.client.id" -> "b0c9a068-e32e-4636-b50d-7f2d667a00bc",
  "dfs.adls.oauth2.credential" -> "rj+IAKT7TcZSdoVhLf22R0QBJvJqeOtLd3++DuwNdUk=",
  "dfs.adls.oauth2.refresh.url" -> "https://login.microsoftonline.com/16f460a0-7afd-453a-9e41-71cc13e29e52/oauth2/token")

dbutils.fs.mount(
  source = "adl://adlsdemocaio.azuredatalakestore.net/",
  mountPoint = "/mnt/adlsdemocaio2",
  extraConfigs = configs)
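Once the mount is in place, you can read the data through the /mnt path like any other DBFS location, and unmount it when you are done. A short sketch, assuming a hypothetical sample.csv file at the root of the lake:
// Read a file through the mount point (sample.csv is a hypothetical file; use a path that exists in your store)
val df = spark.read
  .option("header", "true")
  .csv("/mnt/adlsdemocaio2/sample.csv")
df.show(5)

// Remove the mount point when it is no longer needed
dbutils.fs.unmount("/mnt/adlsdemocaio2")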
Using Spark configs:
spark.conf.set("dfs.adls.oauth2.access.token.provider.type", "ClientCredential")
spark.conf.set("dfs.adls.oauth2.client.id", "b0c9a023-e32e-4636-b50d-7f2d667a00bc")
spark.conf.set("dfs.adls.oauth2.credential", "rj+IAKT8TcZSdoVhLfi2R1QBJvJqeOtLd3++DuwNdUk=")
spark.conf.set("dfs.adls.oauth2.refresh.url", "https://login.microsoftonline.com/16f460a0-7ffc-453a-9e41-71cc13e29e52/oauth2/token")
List files
%fs ls /mnt/adlsdemocaio2/
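The same listing can be done from Scala with dbutils.fs.ls, which is handy when you want to work with the results programmatically:
// Scala equivalent of %fs ls: returns a sequence of FileInfo objects you can filter or map over
dbutils.fs.ls("/mnt/adlsdemocaio2/").foreach(f => println(f.path))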
Hints:
To avoid an error similar to the following:
org.apache.hadoop.security.AccessControlException: LISTSTATUS failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [5bf9b6af-eddd-4913-aaea-9492a5816cc1] failed with error 0x83090aa2 (Forbidden. ACL verification failed. Either the resource does not exist or the user is not authorized to perform the requested operation.). [5bf9b6af-eddd-4913-aaea-9492a5816cc1][2019-03-04T06:39:47.3347844-08:00] [ServerRequestId:5bf9b6af-eddd-4913-aaea-9492a5816cc1]
Make sure you have all the required privileges:
1. When you create your app registration, make sure you are listed as an owner of the app. If you do not appear as an owner, click Add owner and add your account.
2. In your Azure Data Lake Store, make sure you grant permissions to your app.
In my case, my app is called adlsgen1databricks.
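After granting the permissions, a quick sanity check from the notebook is to wrap a listing in scala.util.Try; if it fails with the AccessControlException shown above, the service principal still lacks access on that folder. This is only a sketch:
import scala.util.{Failure, Success, Try}

// Quick permission check: try to list the lake root using the configured service principal
Try(dbutils.fs.ls("adl://adlsdemocaio.azuredatalakestore.net/")) match {
  case Success(files) => println(s"Access OK, found ${files.size} entries")
  case Failure(e)     => println(s"Access check failed: ${e.getMessage}")
}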
Reference links:
https://docs.microsoft.com/en-us/azure/azure-resource-manager/resource-group-create-service-principal-portal
https://hadoop.apache.org/docs/r2.8.0/hadoop-azure-datalake/index.html