
Deploy a Data Product

Deploying a Data Product requires integrating it into a VNet; this guide uses the existing Sidra VNet. The two main actions are configuring the subnets and filling in the Sidra Web Manager form.

1. Configure subnets in the existing Sidra VNet

A Sidra installation has a Virtual Network (VNet) resource in its primary Resource Group (RG). The resource is named using the format <deploymentPrefix>-<environmentId>-VNet (for example, sds-test-VNet).
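As a quick way to work out which VNet to look for in the Azure Portal, the naming format can be sketched in Python. The function name and parameter names are illustrative assumptions; only the format string and the sds-test-VNet example come from the text above.

```python
def sidra_vnet_name(deployment_prefix: str, environment_id: str) -> str:
    """Build the VNet name in the <deploymentPrefix>-<environmentId>-VNet format."""
    return f"{deployment_prefix}-{environment_id}-VNet"


# Example from the text: prefix "sds", environment "test".
print(sidra_vnet_name("sds", "test"))
```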

In the Settings section, the Subnets panel lists all the subnets of the VNet. Clicking + Subnet opens an Azure form to create a new subnet. The example below shows the two subnets required for the Data Product deployment, already created.

[Image: subnet configuration]

The following sections list the required subnets and their configurations.

1.1 Databricks private and public subnets

The Data Product requires two new subnets on the VNet:

  • The Databricks cluster private subnet. The name may be <dataproductname>_databricks_cluster.
  • The Databricks cluster public subnet. The name may be <dataproductname>_databricks_host.

Both subnets must be configured with:

  1. Service Endpoints:

    • Microsoft.KeyVault
    • Microsoft.Storage
  2. Subnet Delegation:

    • Microsoft.Databricks/workspaces
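The subnet requirements above can be captured as data, which is handy for reviewing the settings before entering them in the Azure form. The following is a minimal sketch that builds two dictionaries shaped roughly like an ARM template subnet definition (`addressPrefix`, `serviceEndpoints`, `delegations`). The address prefixes, the delegation name, and the helper function names are assumptions made for illustration; only the subnet name suffixes, the service endpoints, and the delegation service come from this guide.

```python
import json

# Values taken from the guide.
SERVICE_ENDPOINTS = ["Microsoft.KeyVault", "Microsoft.Storage"]
DELEGATION_SERVICE = "Microsoft.Databricks/workspaces"


def databricks_subnet(name: str, address_prefix: str) -> dict:
    """Describe one Databricks subnet with the required endpoints and delegation."""
    return {
        "name": name,
        "properties": {
            "addressPrefix": address_prefix,  # assumption: pick a free range in your VNet
            "serviceEndpoints": [{"service": s} for s in SERVICE_ENDPOINTS],
            "delegations": [
                {
                    "name": "databricks-delegation",  # hypothetical delegation name
                    "properties": {"serviceName": DELEGATION_SERVICE},
                }
            ],
        },
    }


def databricks_subnets(data_product_name: str, private_prefix: str, public_prefix: str) -> list:
    """Return the private (cluster) and public (host) subnet definitions."""
    return [
        databricks_subnet(f"{data_product_name}_databricks_cluster", private_prefix),
        databricks_subnet(f"{data_product_name}_databricks_host", public_prefix),
    ]


# Hypothetical Data Product name and address ranges.
subnets = databricks_subnets("sales", "10.0.10.0/24", "10.0.11.0/24")
print(json.dumps(subnets, indent=2))
```

Keeping both subnets behind one helper makes it harder to forget an endpoint or the delegation on one of them.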

1.2 Associate subnets to the Sidra Network Security Groups

Open each of the Sidra NSGs and, in the Settings section, select the Subnets option. From there, associate the NSG with the private and public subnets created previously.

[Image: associate subnet with NSG]
[Image: subnet configuration]
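In template terms, associating an NSG with a subnet amounts to setting a `networkSecurityGroup` reference on the subnet. The sketch below illustrates that idea on a plain dictionary; it does not call Azure. The subscription ID, resource group, NSG name, and function names are all hypothetical placeholders, and the actual Sidra NSG names should be read from the Azure Portal.

```python
import copy


def nsg_resource_id(subscription_id: str, resource_group: str, nsg_name: str) -> str:
    """Build an Azure resource ID for a Network Security Group."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Network/networkSecurityGroups/{nsg_name}"
    )


def associate_nsg(subnet: dict, nsg_id: str) -> dict:
    """Return a copy of the subnet definition that references the given NSG."""
    updated = copy.deepcopy(subnet)  # leave the caller's definition untouched
    updated.setdefault("properties", {})["networkSecurityGroup"] = {"id": nsg_id}
    return updated


# All values below are hypothetical placeholders.
nsg_id = nsg_resource_id("00000000-0000-0000-0000-000000000000", "my-sidra-rg", "my-sidra-nsg")
subnet = {"name": "sales_databricks_cluster", "properties": {}}
print(associate_nsg(subnet, nsg_id))
```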

1.3 Check existing VNet subnets

The rest of the Data Product's resources connect to the VNet through the existing Sidra subnets.

2. Create the Data Product in Sidra UI

  1. In Sidra Web Manager, go to the Data Products section and create a new Data Product.
  2. The Configure DataProduct section contains the following fields:
    • Data Product name. Required field. It must be a name that represents the purpose of the Data Product. Length must be between 1 and 15 characters.
    • Data Product description. Optional field for the description of the Data Product. Length must be between 1 and 200 characters.
    • Resource name prefix. Required field. It is used to name the resources created for this Data Product (<prefix>-<environment>-<resourceTypeSufix>). Do not start the value with numbers or special characters. Length is limited to 4 characters. For example, DP01.
    • Resource deployment environment. Required field. Environment in which the Data Product will be deployed, for example: qa, dev, test, prod, etc. Length is limited to 4 characters.
    • Data Product resource group name. Required field. It is the new resource group that will be created during the Data Product deployment to hold all its resources. Length must be between 1 and 15 characters. Although the name in our example is Data.Domain.DP01, the recommended naming convention is Sidra.DP.<DPName>, which makes the resource group easier to identify.
  3. In the next section, Data Product VNet, select the Databricks private subnet and the Databricks public subnet created in the steps above.

  4. Check your new Data Product in the Data Products section of Sidra Web.

    [Image: Data Product created]
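The field constraints listed in step 2 can be checked before filling in the form. The following is a minimal sketch, not Sidra's own validation: the class, field, and function names are hypothetical, "limited to 4 characters" is read as "at most 4", and "do not start with numbers or special characters" is read as "the first character must be a letter".

```python
from dataclasses import dataclass


@dataclass
class DataProductForm:
    name: str
    resource_prefix: str
    environment: str
    resource_group: str
    description: str = ""  # optional field


def validate(form: DataProductForm) -> list:
    """Return a list of problems; an empty list means the values fit the stated limits."""
    errors = []
    if not 1 <= len(form.name) <= 15:
        errors.append("Data Product name must be 1 to 15 characters.")
    if len(form.description) > 200:
        errors.append("Description must be at most 200 characters.")
    if not 1 <= len(form.resource_prefix) <= 4:
        errors.append("Resource name prefix must be 1 to 4 characters.")
    elif not form.resource_prefix[0].isalpha():
        errors.append("Resource name prefix must start with a letter.")
    if not 1 <= len(form.environment) <= 4:
        errors.append("Environment must be 1 to 4 characters.")
    if not 1 <= len(form.resource_group) <= 15:
        errors.append("Resource group name must be 1 to 15 characters.")
    return errors


# Hypothetical values; the resource group follows the Sidra.DP.<DPName> convention.
form = DataProductForm(
    name="SalesReports",
    resource_prefix="DP01",
    environment="qa",
    resource_group="Sidra.DP.DP01",
)
print(validate(form))
```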

3. Check your Data Product in the Azure Portal

After completing the steps above, check your resource groups. Approximately 15 minutes after the Data Product is created in Sidra Web, all resources are fully provisioned in their respective resource groups. The structure is as follows:

  • Data.Domain.DP01

    [Image: Data Product resource group]

  • Data.Domain.DP01-dp01-qa-dbr (the resource group managed by the Azure Databricks resource)
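To know which resource groups to look for, the two names above can be derived from the form values. This sketch infers the managed resource group pattern (<resourceGroup>-<prefix>-<environment>-dbr, with prefix and environment in lowercase) from the single example shown here, assuming the example used prefix DP01 and environment qa; treat the pattern as an assumption rather than a documented rule. The function name is hypothetical.

```python
def expected_resource_groups(resource_group: str, prefix: str, environment: str) -> list:
    """Return the Data Product resource group and the inferred Databricks managed one."""
    # Pattern inferred from the example "Data.Domain.DP01-dp01-qa-dbr".
    managed = f"{resource_group}-{prefix.lower()}-{environment.lower()}-dbr"
    return [resource_group, managed]


for rg_name in expected_resource_groups("Data.Domain.DP01", "DP01", "qa"):
    print(rg_name)
```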