Connecting to new data sources through linked services¶
Linked services are one of the main actors in Data Factory. They behave as connection strings, allowing Data Factory to connect with different data sources: SQL servers, REST endpoints, blob folders... It is common that adding a new data source requires adding a new linked service to Data Factory so it can access to the data in the source.
Sidra creates several linked services required to the normal functioning of the platform. They are created along with the Data Factory using the following ARM templates:
- datafactory.datalake.jsonfor the Data Factory associated to a particular Sidra Service Data Storage Unit.
- datafactory.client.jsonfor the Data Factory associated to a particular Data Product.
Those ARM templates are included in the PlainConcepts.SIDRA.DotNetCore.Deployment NuGet package and referenced by the deployment project.
The ARM templates provided with the NuGet package must not be modified since any change will be lost when a new version of the package is released.
Adding new linked services¶
In order to configure a new linked service it is necessary to have a new ARM template that defines the linked service and a new script that uses the ARM template to create the resource in Azure.
Linked service ARM template¶
The new ARM template for the linked service must be created in a CustomTemplates folder in the root of the deployment project. The template will depend on the requirements:
- The type of data source: a SQL Server, a REST endpoint...
- The way to authenticate into the data source: connection string, username and password...
- The configuration of the linked service, e.g. if it is going to connect using a Integration Runtime...
- The integration with other services, e.g. if it has to retrieve some data from Azure KeyVault -like a connection string or a password-...
Some of those parameters can be hard-coded in the template, e.g. the type of linked service: AzureStorage, AzureSqlDatabase..., and some others can be received as parameters.
No password, client secret or any other sensitive information should be stored hard-coded in the ARM template.
Here it is an example of an ARM template -stored in a file named datafactory.linkedservices.json- to connect to a SQL server using a connection string and username as parameters and retrieving the password from Azure KeyVault:
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "dataFactoryName": {
            "type": "string"
        },
        "linkedServiceName": {
            "type": "string"
        },
        "linkedServiceDescription": {
            "type": "string"
        },
        "dataSourceConnectionString": {
            "type": "string"
        },
        "dataSourceUsername": {
            "type": "string"
        },
        "keyVaultSecretNameForDataSourcePassword": {
            "type": "string"
        },
        "keyVaultLinkedServiceName": {
            "type": "string"
        }
    },
    "variables": {
        "dataFactoryAPIVersion": "2018-06-01"
    },
    "resources": [{
            "name": "[concat(parameters('dataFactoryName'), '/', parameters('linkedServiceName'))]",
            "type": "Microsoft.DataFactory/factories/linkedservices",
            "dependsOn": [],
            "apiVersion": "[variables('dataFactoryAPIVersion')]",
            "properties": {
                "type": "SqlServer",
                "description": "[parameters('linkedServiceDescription')]",
                "typeProperties": {
                    "connectionString": "[parameters('dataSourceConnectionString')]",
                    "userName": "[parameters('dataSourceUsername')]",
                    "password": {
                        "type": "AzureKeyVaultSecret",
                        "store": {
                            "referenceName": "[variables('keyVaultLinkedServiceName')]",
                            "type": "LinkedServiceReference"
                        },
                        "secretName": "[variables('keyVaultSecretNameForDataSourcePassword')]"
                    }
                }
            }
        }
    ],
    "outputs": {}
}
How to deploy linked services using the ARM template¶
The ARM templates are used to define the resources to deploy in Azure, but deployment itself is executed by a PowerShell script that uses those ARM templates. All the deployment scripts use the same PowerShell function DeployTemplate.ps1 which actually implements the deployment.
function DeployTemplate
{
    Param(
        [Parameter(Mandatory=$true)]
        [string]$templateFile,
        [Parameter(Mandatory=$true)]
        [object]$templateParametersObject,
        [Parameter(Mandatory=$true)]
        [string]$resourceGroupName,
        [Parameter(Mandatory=$false)]
        [string]$suffix = ""
    )
    ...
}
The function receives the following parameters:
- templateFile is the name of the ARM template file
- templateParameterObject is a PowerShell object with all the parameter for the ARM template
- resourceGroupName is the name of the Azure resource group where it will be deployed
- suffix is used in the name of the deployment
So a new script must be created -named for example LinkedServicesDeploy.ps1- and stored in the Scripts folder of the deployment project, and it will call the DeployTemplate.ps1 function this way:
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$functionsDir  = Join-Path -Path $scriptDir -ChildPath "\Functions"
$customTemplatesDir = Join-Path -Path $scriptDir -ChildPath "..\CustomTemplates"
. "$functionsDir\DeployTemplate.ps1"
# Prepare parameters for the DeployTemplate
$linkedServiceTemplatePath = "$customTemplatesDir\datafactory.linkedservices.json"
# Create PS object with the ARM template parameters
$templateParameters = @{}
$templateParameters.Add("dataFactoryName", "datafactory")
$templateParameters.Add("linkedServiceName", "MySqlServerLinkedService")
$templateParameters.Add("linkedServiceDescription", "Linked service to connect to MySqlServer")
$templateParameters.Add("dataSourceConnectionString", "Connection string to the Sql Server")
$templateParameters.Add("keyVaultSecretNameForDataSourcePassword", "MySqlServerPassword")
$templateParameters.Add("keyVaultLinkedServiceName", "CoreAzureKeyVaultLinkedService")
DeployTemplate -templateFile $linkedServiceTemplatePath `
    -templateParametersObject $templateParameters `
    -resourceGroupName "Name of the resource group where the Data Factory is deployed" `
    -ErrorAction Stop
New deployment script with the variables used by the linked service¶
In order for the new linked service to work properly, it will need some information, such as the connection string for a SQL Server or the endpoint of an API. That information will be passed to the datafactory.linkedservice.json as parameters by the deployment script. The script can receive those values in two different ways:
- via a environment data file
- via script parameters
Environment data file¶
A new variable can be defined in the environment data file and fulfilled with the value that the linked service will receive. This via is usually used for values that are going to change from one environment to another and that are not secrets, since the environment data files are stored in the control version repository.
# Environment data file
@{
    DataFactoryName = "project-dft-dev"
    ResourceGroupName = "PROJECT-UKSOUTH-DSU-DEV"   
}
Then, in the deployment script, the environment data file must be imported and these values can be read and store in different variables to be used as parameters to the ARM template as shown below:
Param(
    [Parameter(Mandatory=$true)]
    [string]$environment)
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$functionsDir  = Join-Path -Path $scriptDir -ChildPath "\Functions"
$customTemplatesDir = Join-Path -Path $scriptDir -ChildPath "..\CustomTemplates"
. "$functionsDir\DeployTemplate.ps1"
$dataScriptPath = $environment + "Data.psd1"
Import-LocalizedData -BaseDirectory $scriptDir -FileName $dataScriptPath -BindingVariable environmentDataFile -ErrorAction Stop
# Prepare parameters for the DeployTemplate
$linkedServiceTemplatePath = "$customTemplatesDir\datafactory.linkedservices.json"
$dataFactoryName = $environmentDataFile.DataFactoryName
$resourceGroupName = $environmentDataFile.ResourceGroupName
# Create PS object with the ARM template parameters
$templateParameters = @{}
$templateParameters.Add("dataFactoryName", $dataFactoryName)
$templateParameters.Add("linkedServiceName", "MySqlServerLinkedService")
$templateParameters.Add("linkedServiceDescription", "Linked service to connect to MySqlServer")
$templateParameters.Add("dataSourceConnectionString", "Connection string to the Sql Server")
$templateParameters.Add("keyVaultSecretNameForDataSourcePassword", "MySqlServerPassword")
$templateParameters.Add("keyVaultLinkedServiceName", "CoreAzureKeyVaultLinkedService")
DeployTemplate -templateFile $linkedServiceTemplatePath `
    -templateParametersObject $templateParameters `
    -resourceGroupName $resourceGroupName `
    -ErrorAction Stop
Deployment script parameters¶
The deployment script can define a parameter that will be fulfilled by the Azure DevOps when it executes the task that invokes the deployment script. This via is usually used for secrets that will be stored in a Library in Azure DevOps. This method can be combined with the previous one:
Param(
    [Parameter(Mandatory=$true)]
    [string]$environment,
    [Parameter(Mandatory=$true)]
    [string]$sqlServerConnectionString,)
$scriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path
$functionsDir  = Join-Path -Path $scriptDir -ChildPath "\Functions"
$customTemplatesDir = Join-Path -Path $scriptDir -ChildPath "..\CustomTemplates"
. "$functionsDir\DeployTemplate.ps1"
$dataScriptPath = $environment + "Data.psd1"
Import-LocalizedData -BaseDirectory $scriptDir -FileName $dataScriptPath -BindingVariable environmentDataFile -ErrorAction Stop
# Prepare parameters for the DeployTemplate
$linkedServiceTemplatePath = "$customTemplatesDir\datafactory.linkedservices.json"
$dataFactoryName = $environmentDataFile.DataFactoryName
$resourceGroupName = $environmentDataFile.ResourceGroupName
# Create PS object with the ARM template parameters
$templateParameters = @{}
$templateParameters.Add("dataFactoryName", $dataFactoryName)
$templateParameters.Add("linkedServiceName", "MySqlServerLinkedService")
$templateParameters.Add("linkedServiceDescription", "Linked service to connect to MySqlServer")
$templateParameters.Add("dataSourceConnectionString", $sqlServerConnectionString)
$templateParameters.Add("keyVaultSecretNameForDataSourcePassword", "MySqlServerPassword")
$templateParameters.Add("keyVaultLinkedServiceName", "CoreAzureKeyVaultLinkedService")
DeployTemplate -templateFile $linkedServiceTemplatePath `
    -templateParametersObject $templateParameters `
    -resourceGroupName $resourceGroupName `
    -ErrorAction Stop
Add secrets or sensitive information in the Azure DevOps library¶
Sometimes, the information needed to configure the linked service is sensitive. In this case, storing the information as plain text inside the environment data file is not an option.
Those values can be added inside the Azure DevOps library, e.g. the Dev Environment Secrets library with the sqlServerConnectionString variable:
Once defined in the library, the variables can be used in the Release pipeline by adding them in the Azure PowerShell execution task.
