
Introduction
If you’re using Azure Databricks for data analysis, you probably have your data stored in Azure Data Lake. But how do you access that data from Databricks? The answer is simple: mount the storage. In this blog, we’ll explain what mounting is, why it’s useful, and how you can quickly set up a mount for Azure Data Lake in Databricks.
What is a Mount in Databricks?
Think of a “mount” as a shortcut between your Databricks workspace and your Azure Data Lake. It lets Databricks connect to a specific container or folder in your Azure storage so you can read and write files there directly. It’s like mapping a network drive on your computer: once mounted, you can open, read, and write files without setting up a new connection each time.
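To make that concrete, here’s a quick sketch of the difference (the account, container, and file names are placeholders): the same file read through the full abfss:// URI versus through a mount.
Python
# Without a mount, reading needs the full abfss:// URI (and credentials configured on the cluster)
df = spark.read.parquet("abfss://sales@mystorageaccount.dfs.core.windows.net/2024/orders.parquet")

# With a mount at /mnt/mydata, the same file looks like a local path
df = spark.read.parquet("/mnt/mydata/2024/orders.parquet")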
Why Should You Mount Storage in Databricks?
- Simple Access: Once your storage is mounted, you can access it like any other folder in Databricks.
- Single Setup: You only need to mount a folder once, and after that, it stays connected across your Databricks workspace.
- Security and Organization: You can control access to specific folders and make sure all your data is organized and easy to reach.
Step-by-Step Guide to Mount Azure Data Lake in Databricks
Let’s walk through how to mount Azure Data Lake Storage (ADLS) in Databricks.
Step 1: Set Up Your Storage Account and Permissions
Before mounting, ensure you have:
- An Azure Storage Account (the example below uses ADLS Gen2; Blob Storage can also be mounted).
- Access credentials for your storage, such as an Access Key or a Service Principal (recommended for security).
Step 2: Get the Storage Access Details
For Access Key: In your Azure Storage Account, go to Access keys and copy one of the keys (a mount that uses the key is shown after the service-principal example below).
For Service Principal: Set up an app registration in Azure Active Directory and grant it access to the storage (for example, the Storage Blob Data Contributor role on the account or container). You’ll need three values, which are best kept in a secret scope as shown below:
- Client ID (Application ID),
- Tenant ID, and
- Client Secret.
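Rather than pasting these values into a notebook, you can store them in a Databricks secret scope and read them with dbutils.secrets.get. A minimal sketch, assuming a secret scope named "adls-scope" and key names you choose yourself:
Python
# Assumes a secret scope named "adls-scope" containing these (hypothetical) keys
client_id = dbutils.secrets.get(scope="adls-scope", key="sp-client-id")
tenant_id = dbutils.secrets.get(scope="adls-scope", key="sp-tenant-id")
client_secret = dbutils.secrets.get(scope="adls-scope", key="sp-client-secret")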
Step 3: Write the Mount Command in Databricks
Now, let’s write a command to mount the storage in Databricks. Open a Databricks notebook and type in the following command:
For ADLS Gen2 using Service Principal:
Python
# Define your credentials
storage_account_name = "<your-storage-account-name>"
client_id = "<your-service-principal-client-id>"
tenant_id = "<your-tenant-id>"
client_secret = "<your-service-principal-client-secret>"
container_name = "<your-container-name>"

# Set up the configuration
configs = {
    "fs.azure.account.auth.type": "OAuth",
    "fs.azure.account.oauth.provider.type": "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
    "fs.azure.account.oauth2.client.id": client_id,
    "fs.azure.account.oauth2.client.secret": client_secret,
    "fs.azure.account.oauth2.client.endpoint": f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
}

# Mount the storage
dbutils.fs.mount(
    source=f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs=configs
)
Here’s what each part does:
- configs: Sets up OAuth access to your Azure storage using the service principal credentials.
- source: The abfss:// URI of the container you’re mounting.
- mount_point: The Databricks path where the storage will be accessible. You can name it something meaningful, like “/mnt/mydata”.
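The example above uses a service principal. If you chose the access-key route in Step 2, the mount looks slightly different: you pass the account key through extra_configs instead of the OAuth settings. A minimal sketch, assuming the key is stored in a (hypothetical) secret scope named "adls-scope":
Python
# Mount with the storage account access key instead of a service principal
storage_account_name = "<your-storage-account-name>"
container_name = "<your-container-name>"

# The key is assumed to live in a secret scope rather than being hard-coded
access_key = dbutils.secrets.get(scope="adls-scope", key="storage-access-key")

dbutils.fs.mount(
    source=f"abfss://{container_name}@{storage_account_name}.dfs.core.windows.net/",
    mount_point="/mnt/mydata",
    extra_configs={
        f"fs.azure.account.key.{storage_account_name}.dfs.core.windows.net": access_key
    }
)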
Step 4: Verify the Mount
To make sure the mount worked, list the files in the mounted folder:
Python
display(dbutils.fs.ls("/mnt/mydata"))
You should see the files and folders in your Azure storage container appear in Databricks.
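From here, ordinary Spark reads and writes work against the mounted path. For example (the file names below are just placeholders for whatever lives in your container):
Python
# Read a CSV from the mounted container (the file name is illustrative)
df = spark.read.option("header", "true").csv("/mnt/mydata/sample.csv")
df.show(5)

# Write results back through the same mount
df.write.mode("overwrite").parquet("/mnt/mydata/output/sample")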
Unmounting Storage
If you ever need to disconnect the storage, you can unmount it like this:
Python
dbutils.fs.unmount("/mnt/mydata")
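One practical tip: calling dbutils.fs.mount on a path that is already mounted raises an error, and unmounting a path that was never mounted typically does too. Checking dbutils.fs.mounts() first keeps both operations safe to re-run; a small sketch:
Python
# Unmount only if the path is currently mounted; the same check avoids
# "already mounted" errors before calling dbutils.fs.mount
mount_point = "/mnt/mydata"
if any(m.mountPoint == mount_point for m in dbutils.fs.mounts()):
    dbutils.fs.unmount(mount_point)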
Wrapping Up
Mounting Azure Data Lake storage in Databricks simplifies your workflow, letting you access your data as if it were local. Once mounted, you can easily read and write data, making it much simpler to manage your files in the cloud.
Now you’re all set to work with your data in Databricks seamlessly!