Metadata & Data Factory association

On the one hand, the Sidra platform keeps metadata about the files ingested in the Data Lake, including the association of every file with an entity. On the other hand, as described in the Data Storage Unit tables and Data Factory tables sections, each entity is associated with a pipeline through the EntityPipeline table. Finally, the Pipeline table contains a reference to the Data Factory in which the pipeline is executed.

Following this chain of associations, each file and its metadata are associated with a specific Data Factory.

[Figure: asset-data-factory-association]
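
The chain of associations can be illustrated with a minimal Python sketch. The rows and column names below (IdEntity, IdPipeline, IdDataFactory and the sample values) are hypothetical simplifications used only for illustration; the real Sidra tables contain more columns and may name them differently.

```python
# Hypothetical, simplified rows. Column names and values are assumptions,
# not the actual Sidra schema.
files = [{"Id": 1, "Name": "sales_2020.csv", "IdEntity": 10}]
entity_pipeline = [{"IdEntity": 10, "IdPipeline": 100}]
pipelines = [{"Id": 100, "Name": "LoadSales", "IdDataFactory": 7}]
data_factories = [{"Id": 7, "Name": "MyDataFactory"}]


def data_factories_for_file(file_id):
    """Follow file -> entity -> pipeline -> Data Factory and return factory names."""
    file_row = next(f for f in files if f["Id"] == file_id)
    # An entity can be linked to several pipelines through EntityPipeline.
    pipeline_ids = {ep["IdPipeline"] for ep in entity_pipeline
                    if ep["IdEntity"] == file_row["IdEntity"]}
    factory_ids = {p["IdDataFactory"] for p in pipelines if p["Id"] in pipeline_ids}
    return sorted(df["Name"] for df in data_factories if df["Id"] in factory_ids)


print(data_factories_for_file(1))
```

The lookup goes through three hops, one per association described above, so a file never references a Data Factory directly.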

Linked Services

Linked services are one of the main components of Azure Data Factory. They define a connection from the Data Factory to an external resource. Sidra stores no metadata about linked services. Because they are closely tied to resources, linked services are managed in the deployment project, where the resources themselves are created.

Linked services are created using ARM templates. The following sample shows two ARM template resources that create two linked services for the Data Factory "MyDataFactory": one to an Azure Key Vault, "MyAzureKeyVaultLinkedService", and one to an Azure Storage account, "MyAzureStorageLinkedService". The storage linked service reads its connection string from a Key Vault secret through the Key Vault linked service, which is why it depends on it.

{
    "type": "linkedservices",
    "name": "MyAzureKeyVaultLinkedService",
    "dependsOn": [
        "MyDataFactory"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "type": "AzureKeyVault",
        "typeProperties": {
            "baseUrl": "[concat('https://', 'myAzureKeyVault', '.vault.azure.net')]"
        }
    }
},
{
    "type": "linkedservices",
    "name": "MyAzureStorageLinkedService",
    "dependsOn": [
        "MyDataFactory", 
        "MyAzureKeyVaultLinkedService"
    ],
    "apiVersion": "2017-09-01-preview",
    "properties": {
        "type": "AzureStorage",
        "typeProperties": {
            "connectionString": {
                "type": "AzureKeyVaultSecret",
                "secretName": "MyAzureStorageLinkedService",
                "store": {
                    "referenceName": "MyAzureKeyVaultLinkedService",
                    "type": "LinkedServiceReference"
                }
            }
        }
    }
},
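
The effect of the dependsOn lists in the sample can be sketched in Python. ARM resolves the deployment order itself; this snippet is only an illustration that takes the three resource names from the sample and uses the standard-library graphlib module (Python 3.9 or later) to show an order that satisfies the declared dependencies.

```python
from graphlib import TopologicalSorter

# Resource name -> names it depends on, copied from the dependsOn lists
# in the sample. The Data Factory itself has no dependency in this sketch.
depends_on = {
    "MyDataFactory": [],
    "MyAzureKeyVaultLinkedService": ["MyDataFactory"],
    "MyAzureStorageLinkedService": ["MyDataFactory", "MyAzureKeyVaultLinkedService"],
}

# static_order() yields every resource after all of its dependencies.
order = list(TopologicalSorter(depends_on).static_order())
print(order)
```

The storage linked service comes last because its connection string is resolved through the Key Vault linked service.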

The names of the linked services provide the values for the placeholders in the templates of datasets, activities and other Data Factory components. In particular, they are commonly used in the DefaultValue column of the ActivityTemplate and DatasetTemplate tables.

For example, the name of the Azure Storage linked service, "MyAzureStorageLinkedService", can replace the placeholder ##linkedService## in the LandingZoneFileDataset template. After the replacement, the dataset definition looks like this:

{
    "name": "LandingZoneFileDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "MyAzureStorageLinkedService",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "@dataset().folderPath",
            "fileName": "@dataset().fileName",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n"
            }
        },
        "parameters": {
            "folderPath": {
                "type": "String",
                "defaultValue": ""
            },
            "fileName": {
                "type": "String",
                "defaultValue": ""
            }
        }
    }
}
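
The placeholder replacement can be sketched as follows. This is a minimal illustration, not Sidra's implementation: the abbreviated template text, the fill_placeholders helper and the way the value is passed in are assumptions; only the ##linkedService## placeholder and the names come from the example above.

```python
import json

# Abbreviated template text containing the placeholder. How Sidra stores
# the template is an assumption made for this sketch.
DATASET_TEMPLATE = """
{
    "name": "LandingZoneFileDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "##linkedService##",
            "type": "LinkedServiceReference"
        }
    }
}
"""


def fill_placeholders(template, values):
    """Replace every ##name## placeholder with its value and parse the JSON."""
    for name, value in values.items():
        template = template.replace(f"##{name}##", value)
    return json.loads(template)


# The value would typically come from the DefaultValue column.
dataset = fill_placeholders(
    DATASET_TEMPLATE, {"linkedService": "MyAzureStorageLinkedService"}
)
print(dataset["properties"]["linkedServiceName"]["referenceName"])
```

Parsing the result with json.loads is a quick check that the substituted text is still valid JSON before it is sent to the Data Factory.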