Pipelines: How they work
Sidra defines pipelines based on pipeline templates. Each pipeline is associated with the Data Factory in which it will be created, and it provides a set of parameters that are used to replace the placeholders of the pipeline template. These parameters follow the same structure as those for triggers: a JSON structure of pairs, each made up of a placeholder name and the value used to replace it.
As with triggers, parameters are resolved in a series of rounds, because placeholders have to be replaced in:
- The pipeline template associated with the pipeline.
- All the dataset templates associated with the pipeline template.
- All the activity templates associated with the pipeline template.
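The idea of resolution rounds can be sketched in a few lines of Python. This is a minimal sketch, not Sidra's implementation: it assumes every placeholder uses the ##Name## delimiters seen in the Entity placeholders (for example ##Entity.TableName##), and resolve_rounds is a hypothetical helper. In this sketch a placeholder replaced in one round no longer exists when the next round runs, so earlier rounds take precedence over later ones.

```python
import json
import re

# Assumption: placeholders look like ##Name##, as in ##Entity.TableName##.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]+)##")


def resolve_rounds(template: str, rounds: list[dict[str, str]]) -> str:
    """Replace placeholders using each round's name/value pairs, in order."""
    result = template
    for values in rounds:
        # Unknown placeholders are left as-is so a later round can fill them.
        result = PLACEHOLDER.sub(
            lambda match: values.get(match.group(1), match.group(0)), result
        )
    return result


# Parameters are JSON structures of placeholder name -> replacement value.
first_round = json.loads('{"tableName": "Orders"}')
second_round = json.loads('{"tableName": "Ignored", "schema": "dbo"}')

template = '{"table": "##tableName##", "schema": "##schema##"}'
print(resolve_rounds(template, [first_round, second_round]))
# {"table": "Orders", "schema": "dbo"}
```

Here tableName is filled by the first round, so the second round's value for it is never used, while schema is only available in the second round.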
The parameter resolution process can be divided into two parts:
- Generate the dataset JSON files and create the datasets in Data Factory.
- Generate the pipeline JSON files, which include the activities, and create the pipeline and its activities in Data Factory.
Pipeline templates build on the previously defined dataset and activity templates to configure the pipeline. The definition of a pipeline template therefore contains:
- The template of the pipeline.
- The default values for the template of the pipeline.
- A list of dataset templates along with the parameters used to replace the placeholders of each dataset template.
- An ordered list of activity templates along with the parameters used to replace the placeholders of each activity template.
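One way to picture this definition is as a small data model. The class and field names below are hypothetical and chosen for illustration only; they are not Sidra's schema. The point the sketch makes is that dataset associations are a plain list, while the position of each activity association matters.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateAssociation:
    """A dataset or activity template plus the parameters that fill its placeholders."""

    template_name: str
    parameters: dict[str, str] = field(default_factory=dict)


@dataclass
class PipelineTemplate:
    """Hypothetical shape of a pipeline template definition."""

    template: str  # pipeline JSON containing ##placeholders##
    default_values: dict[str, str] = field(default_factory=dict)
    datasets: list[TemplateAssociation] = field(default_factory=list)
    # Ordered: the position in this list is the activity's position in the pipeline.
    activities: list[TemplateAssociation] = field(default_factory=list)


pipeline_template = PipelineTemplate(
    template='{"name": "##pipelineName##"}',
    default_values={"pipelineName": "LoadOrders"},
    datasets=[TemplateAssociation("SqlTableDataset", {"tableName": "Orders"})],
    activities=[
        TemplateAssociation("LookupActivity"),
        TemplateAssociation("StoredProcedureActivity", {"procedure": "LoadOrders"}),
    ],
)
print([a.template_name for a in pipeline_template.activities])
# ['LookupActivity', 'StoredProcedureActivity']
```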
Parameters
Like datasets and activities, pipelines can define parameters, so each execution of the pipeline can be triggered with a different set of values for those parameters. A pipeline also defines a list of activities to be executed sequentially:
```json
{
  "name": "PipelineName",
  "description": "Pipeline description",
  "properties": {
    "parameters": {
      "parameterName01": {
        "type": "DateTime"
      },
      "parameterName02": {
        "type": "String"
      },
      "parameterName03": {
        "type": "Int"
      },
      ...
    },
    "activities": [
      {
        "name": "Activity01",
        "type": "Lookup",
        ...
      },
      {
        "name": "Activity02",
        "type": "SqlServerStoredProcedure",
        ...
      },
      ...
    ]
  }
}
```
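Because the structure above elides most properties, the following sketch fills in a small hypothetical instance and reads it back with Python's json module. It lists the activities in their declared order and uses a hypothetical helper, check_unknown_values, to flag run values whose names the pipeline does not declare. Neither the sample nor the helper comes from Sidra or Data Factory.

```python
import json

# A complete, hypothetical instance of the pipeline structure.
definition = json.loads("""
{
  "name": "PipelineName",
  "description": "Pipeline description",
  "properties": {
    "parameters": {
      "parameterName01": {"type": "DateTime"},
      "parameterName02": {"type": "String"},
      "parameterName03": {"type": "Int"}
    },
    "activities": [
      {"name": "Activity01", "type": "Lookup"},
      {"name": "Activity02", "type": "SqlServerStoredProcedure"}
    ]
  }
}
""")


def check_unknown_values(definition: dict, values: dict) -> list[str]:
    """Return the names in a run's values that the pipeline does not declare."""
    declared = definition["properties"]["parameters"]
    return [name for name in values if name not in declared]


# The activities list keeps the order in which they are executed.
activity_order = [a["name"] for a in definition["properties"]["activities"]]
print(activity_order)  # ['Activity01', 'Activity02']
print(check_unknown_values(definition, {"parameterName02": "abc", "typo": 1}))  # ['typo']
```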
Dataset generation
For each dataset template associated with the pipeline template of the pipeline, parameters are resolved in the following order:
Round | Parameters source |
---|---|
1 | From the association between the pipeline template and the dataset template |
2 | From the pipeline |
3 | Default values included in the dataset template |
4 | Default values included in the pipeline template |
5 | Global placeholders that can be used in different types of templates |
6 | Configuration placeholders with their corresponding values. They are stored in the [Management].[Configuration] table in the Sidra database. |
7 | Entity placeholders. If there is a single Entity associated with the pipeline, the Entity placeholders listed below are replaced by the corresponding values of that Entity. |
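The seven rounds can be expressed as an ordered list of name/value maps. This sketch assumes the same ##Name## placeholder syntax; dataset_rounds and its argument names are hypothetical, and each source is passed in as a plain dictionary rather than read from the Sidra database. Round 7 is only added when exactly one Entity is associated with the pipeline.

```python
import re

# Assumption: placeholders look like ##Name##.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]+)##")


def resolve(text: str, rounds: list[dict[str, str]]) -> str:
    # Earlier rounds win: a replaced placeholder cannot be matched again.
    for values in rounds:
        text = PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)
    return text


def dataset_rounds(association, pipeline, dataset_defaults, pipeline_defaults,
                   global_values, configuration, entities):
    """Build the rounds in the order given by the table."""
    rounds = [
        association,        # 1. pipeline template / dataset template association
        pipeline,           # 2. the pipeline
        dataset_defaults,   # 3. dataset template defaults
        pipeline_defaults,  # 4. pipeline template defaults
        global_values,      # 5. global placeholders
        configuration,      # 6. [Management].[Configuration] values
    ]
    if len(entities) == 1:  # 7. only for a single associated Entity
        rounds.append(entities[0])
    return rounds


template = '{"table": "##tableName##", "delimiter": "##Entity.FieldDelimiter##"}'
rounds = dataset_rounds(
    association={"tableName": "Orders"},
    pipeline={"tableName": "Ignored"},
    dataset_defaults={}, pipeline_defaults={}, global_values={}, configuration={},
    entities=[{"Entity.FieldDelimiter": ","}],
)
print(resolve(template, rounds))
# {"table": "Orders", "delimiter": ","}
```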
This is the list of Entity placeholders:
- ##Entity.TableName##
- ##Entity.Provider.Id##
- ##Entity.Provider##
- ##Entity.Provider.DatabaseName##
- ##Entity.Id##
- ##Entity.ProviderWithoutDashes##
- ##Entity.DataDelay##
- ##Entity.SourcePath##
- ##Entity.RowDelimiter##
- ##Entity.NullText##
- ##Entity.FieldDelimiter##
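The Entity round can be sketched as a function that turns one Entity record into placeholder name/value pairs. The Entity class and its field names are hypothetical, not Sidra's data model, and deriving ProviderWithoutDashes by removing "-" from the provider name is an assumption based only on the placeholder's name.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical Entity record; field names are illustrative only."""

    id: int
    table_name: str
    provider_id: int
    provider: str
    provider_database_name: str
    data_delay: str
    source_path: str
    row_delimiter: str
    null_text: str
    field_delimiter: str


def entity_placeholders(entities: list[Entity]) -> dict[str, str]:
    """Map Entity placeholders to values; return {} unless there is exactly one Entity."""
    if len(entities) != 1:
        return {}
    e = entities[0]
    return {
        "Entity.TableName": e.table_name,
        "Entity.Provider.Id": str(e.provider_id),
        "Entity.Provider": e.provider,
        "Entity.Provider.DatabaseName": e.provider_database_name,
        "Entity.Id": str(e.id),
        # Assumption: the provider name with "-" characters removed.
        "Entity.ProviderWithoutDashes": e.provider.replace("-", ""),
        "Entity.DataDelay": e.data_delay,
        "Entity.SourcePath": e.source_path,
        "Entity.RowDelimiter": e.row_delimiter,
        "Entity.NullText": e.null_text,
        "Entity.FieldDelimiter": e.field_delimiter,
    }


orders = Entity(
    id=7, table_name="Orders", provider_id=3, provider="sales-erp",
    provider_database_name="SalesDb", data_delay="1", source_path="/landing/orders",
    row_delimiter="\n", null_text="NULL", field_delimiter=",",
)
values = entity_placeholders([orders])
print(values["Entity.ProviderWithoutDashes"])  # saleserp
print(entity_placeholders([orders, orders]))   # {}
```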
After all these rounds, the resulting JSON file is used to create the dataset in Data Factory.
Pipeline and activities generation
For each activity template associated with the pipeline template of the pipeline, parameters are resolved in the following order:
Round | Parameters source |
---|---|
1 | From the association between the pipeline template and the activity template |
2 | From the pipeline |
3 | Default values included in the activity template |
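A minimal sketch of these three rounds, assuming the same ##Name## placeholder syntax; resolve_activities is a hypothetical helper. It keeps the activities in their original order and, as the second activity shows, leaves placeholders such as ##Entity.TableName## untouched because none of these three rounds supplies them.

```python
import re

# Assumption: placeholders look like ##Name##.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]+)##")


def resolve(text: str, rounds: list[dict[str, str]]) -> str:
    # Earlier rounds win: a replaced placeholder cannot be matched again.
    for values in rounds:
        text = PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)
    return text


def resolve_activities(associations, pipeline_values):
    """Resolve each activity template, preserving order.

    associations: ordered (template, association_values, template_defaults) tuples.
    """
    return [
        resolve(template, [assoc_values, pipeline_values, defaults])
        for template, assoc_values, defaults in associations
    ]


activities = resolve_activities(
    [
        ('{"name": "Lookup", "query": "##query##"}', {"query": "SELECT 1"}, {}),
        ('{"name": "Load", "procedure": "##procedure##", "table": "##Entity.TableName##"}',
         {}, {"procedure": "usp_Load"}),
    ],
    pipeline_values={"procedure": "usp_LoadOrders"},
)
for activity in activities:
    print(activity)
# {"name": "Lookup", "query": "SELECT 1"}
# {"name": "Load", "procedure": "usp_LoadOrders", "table": "##Entity.TableName##"}
```

The pipeline's value for procedure wins over the activity template's default because round 2 runs before round 3.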
Once these rounds are complete, the resulting JSON structure is included in the pipeline template, and the following rounds are then applied:
Round | Parameters source |
---|---|
1 | From the pipeline. Pipeline parameters are resolved a second time because the earlier rounds applied them only to the dataset and activity templates; this round applies them to the pipeline template as well. |
2 | Default values included in the pipeline template |
3 | Global placeholders that can be used in different types of templates |
4 | Configuration placeholders with their corresponding values. They are stored in the [Management].[Configuration] table in the Sidra database. |
5 | Entity placeholders. If there is a single Entity associated with the pipeline, the Entity placeholders are replaced by the corresponding values of that Entity. |
After all these rounds, the resulting JSON structure is used to create the pipeline, including its activities, in Data Factory.
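The assembly step can be sketched as follows. It is an illustration under assumptions, not Sidra's code: build_pipeline is hypothetical, the resolved activities are assumed to be appended in order to properties.activities, placeholders are assumed to sit inside JSON string values, and replacement values are assumed not to need JSON escaping. The example shows why the later rounds matter: the activity's ##Entity.TableName## placeholder survives the activity rounds and is only filled by the Entity round here.

```python
import json
import re

# Assumption: placeholders look like ##Name##.
PLACEHOLDER = re.compile(r"##([A-Za-z0-9_.]+)##")


def resolve(text: str, rounds: list[dict[str, str]]) -> str:
    # Earlier rounds win: a replaced placeholder cannot be matched again.
    for values in rounds:
        text = PLACEHOLDER.sub(lambda m: values.get(m.group(1), m.group(0)), text)
    return text


def build_pipeline(pipeline_template, activities, pipeline_values, pipeline_defaults,
                   global_values, configuration, entities):
    definition = json.loads(pipeline_template)
    # Assumption: resolved activities are placed, in order, under properties.activities.
    definition["properties"]["activities"] = [json.loads(a) for a in activities]
    rounds = [pipeline_values, pipeline_defaults, global_values, configuration]
    if len(entities) == 1:  # Round 5 only applies to a single Entity.
        rounds.append(entities[0])
    return json.loads(resolve(json.dumps(definition), rounds))


pipeline = build_pipeline(
    '{"name": "##pipelineName##", "properties": {"activities": []}}',
    ['{"name": "Lookup ##Entity.TableName##", "type": "Lookup"}'],
    pipeline_values={"pipelineName": "LoadOrders"},
    pipeline_defaults={"pipelineName": "Default"},
    global_values={}, configuration={},
    entities=[{"Entity.TableName": "Orders"}],
)
print(pipeline["name"], pipeline["properties"]["activities"][0]["name"])
# LoadOrders Lookup Orders
```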
Triggers association
Pipelines can also be associated with any number of triggers, including none. In Data Factory, this relationship is implemented through the trigger creation table. For this reason, the Data Factory Manager creates the pipelines first and then the triggers, since each trigger needs its pipelines to already exist in order to implement the relationship.
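The ordering constraint can be sketched as a creation plan that lists every pipeline before any trigger and rejects a trigger that references a pipeline that is not being created. creation_plan and its step labels are hypothetical and do not reflect the Data Factory Manager's actual code.

```python
def creation_plan(pipelines: list[str], triggers: dict[str, list[str]]) -> list[str]:
    """Return creation steps: all pipelines first, then all triggers.

    Each trigger maps to the pipelines it is associated with; that list may
    be empty, and a pipeline may have no trigger at all.
    """
    known = set(pipelines)
    for trigger, referenced in triggers.items():
        missing = [name for name in referenced if name not in known]
        if missing:
            raise ValueError(f"trigger {trigger!r} references unknown pipelines {missing}")
    return [f"pipeline:{p}" for p in pipelines] + [f"trigger:{t}" for t in triggers]


plan = creation_plan(
    ["LoadOrders", "LoadCustomers"],
    {"Daily": ["LoadOrders", "LoadCustomers"], "Manual": []},
)
print(plan)
# ['pipeline:LoadOrders', 'pipeline:LoadCustomers', 'trigger:Daily', 'trigger:Manual']
```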