Generic artifacts created in DataFactory

The following triggers, datasets and pipelines are predefined in Sidra, so they are created automatically in the Data Factories in Core.

Trigger for landing zone

There is a trigger named "Core Storage Blob Created" defined for each Data Lake. This trigger is based on the "EventTrigger" template and it is configured to execute the ingest-from-landing pipeline of the corresponding Data Factory.

The Azure Blob to watch is configured through placeholders in the trigger parameters. The placeholders basePath and scope from the trigger template are replaced by these values:

{
    "basePath": "##landingZone/1/BasePath##",
    "scope": "##landingZone/1/StorageResourceId##"
}

In turn, those values contain the placeholders ##landingZone/1/BasePath## and ##landingZone/1/StorageResourceId##, which are replaced in the Global placeholder resolution round.

Those placeholders follow these conventions:

##landingZone/id/BasePath##

##landingZone/id/StorageResourceId##

Here, id is the identifier of the landing zone. With that information, the Data Factory Manager can obtain the values that those placeholders represent:

  • The BasePath of the LandingZone identified by id.
  • The Id of the Storage related to the LandingZone identified by id.

In summary, the trigger watches the Azure Blob defined as the Landing Zone identified by the id in the placeholders, and it executes the ingest-from-landing pipeline when a new file is stored in it.
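The placeholder convention described above can be illustrated with a minimal Python sketch. It is not Sidra's implementation: the LANDING_ZONES dictionary, its sample values, and the function name resolve_landing_zone_placeholders are all hypothetical stand-ins for whatever the Data Factory Manager reads from its own metadata.

```python
import re

# Hypothetical stand-in for the landing zone metadata; the values are
# invented for illustration and do not come from a real deployment.
LANDING_ZONES = {
    1: {
        "BasePath": "/landing/blobs/",
        "StorageResourceId": "/subscriptions/000/resourceGroups/rg"
                             "/providers/Microsoft.Storage/storageAccounts/example",
    }
}

# Matches ##landingZone/<id>/BasePath## and ##landingZone/<id>/StorageResourceId##.
PLACEHOLDER = re.compile(r"##landingZone/(\d+)/(BasePath|StorageResourceId)##")


def resolve_landing_zone_placeholders(text, landing_zones):
    """Replace every landing zone placeholder in text with its stored value."""
    def lookup(match):
        zone_id = int(match.group(1))
        field = match.group(2)
        return landing_zones[zone_id][field]
    return PLACEHOLDER.sub(lookup, text)


# Parameter values of the trigger, as shown in the section above.
trigger_parameters = {
    "basePath": "##landingZone/1/BasePath##",
    "scope": "##landingZone/1/StorageResourceId##",
}

resolved = {
    name: resolve_landing_zone_placeholders(value, LANDING_ZONES)
    for name, value in trigger_parameters.items()
}
```

After this step, resolved holds the concrete base path and storage resource id for landing zone 1, which is what the trigger needs in order to watch the right Azure Blob.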

Datasets for landing zone

There are two dataset templates defined for the landing zones:

  • LandingZoneDataset. It references a folder of the landing zone. It has one parameter: folderPath.

  • LandingZoneFileDataset. It references a file in a folder of the landing zone. It has two parameters: folderPath and fileName.

The datasets reference resources by means of linked services. To reference the appropriate resource, the template contains a placeholder and a DefaultValue supplies the name of the linked service. This is the template of LandingZoneFileDataset:

{
    "name": "LandingZoneFileDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "##linkedService##",
            "type": "LinkedServiceReference"
        },
        "typeProperties": {
            "folderPath": "@dataset().folderPath",
            "fileName": "@dataset().fileName",
            "format": {
                "type": "TextFormat",
                "rowDelimiter": "\n"
            }
        },
        "parameters": {
            "folderPath": {
                "type": "String",
                "defaultValue": ""
            },
            "fileName": {
                "type": "String",
                "defaultValue": ""
            }
        }
    }
}

This is the DefaultValue of LandingZoneFileDataset:

{"linkedService": "LocalAzureStorageLinkedService" }

So the placeholder ##linkedService## is replaced by LocalAzureStorageLinkedService, which is the name of the linked service to the Azure Storage Blob account of the Landing Zone. That linked service was created using an ARM template in the deployment project. Linked services are therefore associated with the rest of the Data Factory objects by name: the name given to the linked service in the deployment project must match the name used in the DefaultValues that replace the placeholders referencing it.
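As a rough illustration of that substitution, the following Python sketch applies a DefaultValue to a shortened copy of the dataset template. The function name apply_default_values and the choice to leave unknown placeholders untouched are assumptions made for the example, not a description of how Sidra performs the replacement.

```python
import json
import re

# Shortened copy of the LandingZoneFileDataset template shown above.
template = """{
    "name": "LandingZoneFileDataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": {
            "referenceName": "##linkedService##",
            "type": "LinkedServiceReference"
        }
    }
}"""

# DefaultValue of LandingZoneFileDataset, as shown above.
default_value = '{"linkedService": "LocalAzureStorageLinkedService" }'


def apply_default_values(template_text, default_value_json):
    """Replace each ##name## placeholder with the matching DefaultValue entry.

    Placeholders with no entry are left as-is (an assumption of this sketch),
    so that a later resolution round could still handle them.
    """
    values = json.loads(default_value_json)

    def lookup(match):
        return values.get(match.group(1), match.group(0))

    return re.sub(r"##(\w+)##", lookup, template_text)


dataset = json.loads(apply_default_values(template, default_value))
linked_service = dataset["properties"]["linkedServiceName"]["referenceName"]
```

Here linked_service ends up as LocalAzureStorageLinkedService, the same name that the deployment project gives to the linked service.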

Unlike triggers, Sidra does not store datasets themselves; it only stores dataset templates. When a dataset template is used in an activity template, the parameters of that dataset must be known so that the activity template can define values for them.

For example, the LandingZoneFileDataset template is used in the ImportFile activity template. When the activity references the dataset (in this case through a ##dataSetName## placeholder that will be resolved to LandingZoneFileDataset), it must also define the values for the folderPath and fileName parameters (in this case through the placeholders ##folderPath## and ##fileName##).

"datasets": [
    {
        "referenceName": "##dataSetName##",
        "type": "DataSetReference",
        "parameters": {
            "folderPath": "##folderPath##",
            "fileName": "##fileName##"
        }
    }
]
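Because the activity has to supply a value for every parameter that the dataset template declares, a simple consistency check can catch a misspelled parameter name. The Python sketch below is only an illustration under that assumption; missing_parameters is a hypothetical helper and is not part of Sidra.

```python
# Parameters section of the LandingZoneFileDataset template shown earlier.
dataset_template = {
    "name": "LandingZoneFileDataset",
    "properties": {
        "parameters": {
            "folderPath": {"type": "String", "defaultValue": ""},
            "fileName": {"type": "String", "defaultValue": ""},
        }
    },
}

# Dataset reference taken from the activity template above.
dataset_reference = {
    "referenceName": "##dataSetName##",
    "type": "DataSetReference",
    "parameters": {
        "folderPath": "##folderPath##",
        "fileName": "##fileName##",
    },
}


def missing_parameters(template, reference):
    """Return the declared dataset parameters that the reference does not set."""
    declared = set(template["properties"]["parameters"])
    supplied = set(reference.get("parameters", {}))
    return sorted(declared - supplied)


missing = missing_parameters(dataset_template, dataset_reference)

# A reference that supplies folderName instead of fileName would be flagged.
wrong_reference = {"parameters": {"folderPath": "x", "folderName": "y"}}
missing_in_wrong = missing_parameters(dataset_template, wrong_reference)
```

For the reference shown in the activity template, missing is an empty list; for the deliberately wrong reference, missing_in_wrong contains fileName.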