Configuration Steps
Before configuring a DIP for Snowflake, you will need to enable Snowflake to access the Sidra DSU stage storage. For more information, check Snowflake DIP requirements.
Setting up a Snowflake Database Data Intake Process involves several key actions. The sections of the input form are organized into the following main steps.
Step 1. Configure Data Intake Process
Please see the common Sidra connector plugins section for the parameters needed to configure a Data Intake Process.
Step 2. Configure Provider
Please see the common Sidra connector plugins section for the parameters needed to create a new Provider.
Step 3. Configure Data Source
A Data Source abstracts the details of creating a Linked Service in Azure Data Factory, and represents the connection to the source database.
Some key aspects to consider in this section are:
- The following fields are mandatory: Account name, default Database to connect to, Warehouse, authentication type, and credentials.
- The Sidra Snowflake connector will register this new data source in Sidra's metadata and deploy a Linked Service in Azure Data Factory with this connection.
- The connection string to the Snowflake data source, including the credentials, will be stored securely in Sidra's Core Key Vault.
For more details on Linked Services, check the Data Factory documentation.
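Before filling in the form, it can be useful to confirm that the account, database, warehouse, and role you plan to use are reachable and correctly privileged. The following is a minimal sketch using the snowflake-connector-python package; all names and values are illustrative placeholders, not values required by Sidra.

```python
# Minimal connectivity check with snowflake-connector-python (pip install snowflake-connector-python).
# Every value below is a placeholder; use the same values you plan to enter in the form.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account123",    # Account name, without ".snowflakecomputing.com"
    user="SIDRA_INTAKE_USER",      # Username for the connection
    password="<password>",         # Basic authentication
    warehouse="SIDRA_WH",          # Warehouse; must exist and be usable by the role
    database="SALES_DB",           # Default database to connect to
    role="SIDRA_INTAKE_ROLE",      # Role with privileges on the database and warehouse
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_DATABASE(), CURRENT_WAREHOUSE(), CURRENT_ROLE()")
    print(cur.fetchone())
finally:
    conn.close()
```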
The following information needs to be provided:
Account name
: The full name of your Snowflake account (e.g., myorg-account123).

Database
: The default database to connect to. It must already exist and the specified role must have privileges on it. Specific tables or views to load will be configured later in the Metadata Extractor section.

Warehouse
: The virtual warehouse to connect to. It must already exist and the specified role must have privileges on it.

Authentication Type
: Type of authentication used to connect to the Snowflake service. Allowed values are: Basic (Default) and Key Pair.

Username
: The login name of the user for the connection.
If the Authentication Type is Basic:
Password
: The user's password.
If the password is stored as a secret in the DSU Key Vault, select 'Use existing secrets'. This will replace the Password field with:
'Password' Secret Name
: The name of the secret in the Key Vault that holds the user's password.
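If you prefer to create that secret programmatically instead of through the Azure portal, the following is a minimal sketch using the azure-identity and azure-keyvault-secrets packages; the vault URL and secret name are placeholders you would replace with your DSU Key Vault name and the value you will enter in the 'Password' Secret Name field.

```python
# Store the Snowflake user's password as a secret in the DSU Key Vault.
# Vault URL and secret name below are illustrative placeholders.
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

vault_url = "https://<DSUKeyvaultName>.vault.azure.net"
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# The secret name is what you will later reference in the 'Password' Secret Name field.
client.set_secret("snowflake-intake-user-password", "<the user's password>")
```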
If the selected Authentication Type is Key Pair, provide the following:
'Private Key' Secret Name
: The name of the secret in the Key Vault that contains the private key for the connection. The private key can be encrypted; in this case, you must also configure the passphrase used for decryption.

'Private Key Passphrase' Secret Name
: The name of the secret in the Key Vault containing the passphrase used to decrypt the private key. Only required if the private key is encrypted.
To ensure the correct handling of multi-line private keys, we recommend storing the key in the Azure Key Vault using the Azure CLI:
az keyvault secret set --vault-name <DSUKeyvaultName> --name <privateKeysecretName> --file <privateKeyFilePath>
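For reference, the sketch below illustrates how a key pair connection can be established with the Snowflake Python connector and the cryptography package. It only illustrates the mechanism the deployed pipeline relies on; it is not Sidra's internal implementation, and all names, paths, and values are placeholders.

```python
# Connect to Snowflake with key pair authentication
# (pip install snowflake-connector-python cryptography).
# File path, passphrase, and connection values are illustrative placeholders.
import snowflake.connector
from cryptography.hazmat.primitives import serialization

# PEM-encoded private key, e.g. the same content stored in the Key Vault secret above.
with open("rsa_key.p8", "rb") as key_file:
    pem_private_key = key_file.read()

passphrase = b"<passphrase>"  # Use None if the private key is not encrypted

# Decrypt the key (if needed) and convert it to the DER format expected by the connector.
private_key = serialization.load_pem_private_key(pem_private_key, password=passphrase)
private_key_der = private_key.private_bytes(
    encoding=serialization.Encoding.DER,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption(),
)

conn = snowflake.connector.connect(
    account="myorg-account123",
    user="SIDRA_INTAKE_USER",
    private_key=private_key_der,   # DER-encoded, decrypted private key bytes
    warehouse="SIDRA_WH",
    database="SALES_DB",
)
```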
Step 4. Configure Metadata Extraction
The Sidra metadata extraction process retrieves structural information about your data from Snowflake, mapping it into Sidra's internal metadata model. This is achieved by consuming metadata from the Snowflake database - querying the TABLES system View - using the information you provide here.
The extracted schema is replicated to the Sidra Metadata model hierarchy. Tables in the Snowflake Database source system are replicated in Sidra as Entities, and columns of these tables, as Attributes.
Under the covers, the entity table in Sidra's metadata model contains data about the format of the Entity generated, as well as information about how this Entity should be handled by the system. Attributes contain data about the columns of the tables. Sidra adds additional system-level Attributes to convey system-handling information at the level of the Entity (or table).
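To get a feel for the structural information this step works with, you can preview the candidate objects yourself. The sketch below lists the tables and views of the configured database through its INFORMATION_SCHEMA.TABLES view; it is an illustration only, not the exact query Sidra issues, and the connection values are placeholders.

```python
# Preview the tables and views that the metadata extractor could turn into Entities.
# Connection values are illustrative placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="myorg-account123", user="SIDRA_INTAKE_USER", password="<password>",
    warehouse="SIDRA_WH", database="SALES_DB", role="SIDRA_INTAKE_ROLE",
)
cur = conn.cursor()
cur.execute(
    """
    SELECT table_schema, table_name, table_type
    FROM information_schema.tables
    WHERE table_schema <> 'INFORMATION_SCHEMA'
    ORDER BY table_schema, table_name
    """
)
for schema, name, table_type in cur:   # table_type is 'BASE TABLE' or 'VIEW'
    print(f"{schema}.{name} ({table_type})")
conn.close()
```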
The metadata extraction step collects the necessary information for Sidra to:
- Automatically create and populate Sidra's Core metadata structures (Entities and Attributes).
- Create, deploy, and execute an Azure Data Factory pipeline for the metadata extraction steps (see steps below).
- Execute the table creation and DSU ingestion code, including all the steps for storing the optimized data in the data lake storage.
You can optionally configure the following parameters to control the metadata extraction process:
Number of Tables per Batch
: Determines the batch size for metadata retrieval, allowing optimization of pipeline execution.

Types of Objects to Load
: Specify whether you want to load tables, views, or both.

Object Restriction Mode
: Controls the scope of extracted objects. Available options are:
    - Include all objects: Loads all available tables and/or views.
    - Include some objects: Loads only the tables or views explicitly listed by the user.
    - Exclude some objects: Imports all tables or views except those explicitly listed by the user.

You can also enable the metadata to be refreshed automatically with each data extraction.
Once the metadata extractor pipeline is created, Sidra automatically executes it. The execution typically takes between 5 and 15 minutes, depending on the current load on Azure Data Factory and the Databricks cluster performance.
Upon successful completion, the new Entities will appear in Sidra’s Data Catalog automatically. Sidra will also notify you through the Notifications widget in the Sidra Web UI.
Step 5. Advanced Configuration
In this section, you can specify the consolidation mode to be used during data intake into the Data Storage Unit (DSU). The default setting (Merge mode) is recommended for most scenarios. You can learn more about consolidation modes in Sidra’s documentation.
Step 6. Configure Trigger
This step allows you to configure the frequency and timing for data ingestion from Snowflake.
For detailed guidance on configuring triggers, including options for scheduling or manually triggering data ingestion, refer to the Sidra Connector Plugins Configuration section of the documentation.