Add a new pipeline for a Client Application¶
Data Factory pipelines can be defined in JSON format. Using that JSON, they can be created programmatically through the Data Factory API. A key component in Sidra, the Data Factory Manager for Client, composes the pipeline JSON from the templates and parameters stored in the metadata database and uses it to create the pipeline in Data Factory.
Steps to create a Client Application pipeline¶
The steps to create a Client Application pipeline are similar to those needed for Sidra Core and can be summarized as follows:
- Choose the PipelineSyncBehaviour required for the pipeline. Please refer to Client Applications pipeline sync behaviours for details on sync behaviour types.
- Get the Pipeline template id to use.
- Create the ItemId for the new pipeline to be created and store this as a variable to be used when creating the pipeline.
- Create an entry in the Pipeline table with the selected IdPipelineSyncBehaviour, providing information on
- Associate the Entities with the pipeline by adding the entity-pipeline relationship in the EntityPipeline table with the IdPipeline created in the previous step and the IdEntity of each one of the entities required for the pipeline workflow.
- Put the newly created pipeline script inside the DatabaseBuilder Client Application project.
- Raise a deployment if the current environment is not configured to raise one automatically. DatabaseBuilder will execute the SQL script to put the content in the database, and DataFactoryManager for client will then update the artifacts in Data Factory from that content.
Tutorial: adding a basic pipeline for extracting selected entities into staging.¶
Before starting this tutorial, note that Client Application pipelines, like the Core ones, must be associated with Entities in the EntityPipeline table in order to be raised by the Sync webjob with the right configuration.
If the pipeline is not raised by the Sync webjob, it should be raised with a Trigger, populating the tables TriggerTemplate. Check this document for further information about the trigger model, which works in the same way as in Core.
1. Get the pipeline template to use¶
Take the ItemId of one of the templates described and prepare an SQL statement to get the actual Id of the template.
This Id will be required for the pipeline creation.
See the following example:
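The lookup can be sketched as below. This is a minimal sketch, not the original example: the [DataIngestion] schema, the PipelineTemplate table name, the variable names and the template ItemId value are all assumptions or placeholders and should be replaced with the ones used in your Client Application database.

```sql
-- Placeholder: replace with the ItemId of the template you selected.
DECLARE @TemplateItemId UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';

-- Assumption: templates live in [DataIngestion].[PipelineTemplate]
-- and expose [Id] and [ItemId] columns.
DECLARE @IdTemplate INT =
(
    SELECT [Id]
    FROM [DataIngestion].[PipelineTemplate]
    WHERE [ItemId] = @TemplateItemId
);

-- Show the resolved Id; NULL means no template matched the ItemId.
SELECT @IdTemplate AS IdTemplate;
```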
2. Create the ItemId for the new pipeline¶
Create a GUID that will be used to define the ItemId of the new pipeline. This GUID could be created using Visual Studio, PowerShell or T-SQL.
For example, for T-SQL you could use the following line of code:
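A one-line sketch using the built-in T-SQL function NEWID(), which returns a new uniqueidentifier value on every call (the column alias is only illustrative):

```sql
-- Generates a new GUID; run it once and copy the result into the script.
SELECT NEWID() AS ItemId;
```

Because NEWID() returns a different value on each execution, the generated GUID should be copied into the script as a literal rather than regenerated on every run.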
3. Store the ItemId as a variable to be used¶
Supposing that the previous statement returns 6860B34B-A53F-42EF-8FD7-D2CAF5690CAF as a value, just store this resulting value in a variable:
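A minimal sketch of such a declaration; the variable name @ItemId is an assumption chosen for illustration.

```sql
-- Fixed ItemId of the new pipeline, generated once beforehand.
-- The string literal is implicitly converted to UNIQUEIDENTIFIER.
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';
```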
4. Prepare the rollback section¶
To guarantee the idempotency of the script, the script must begin by deleting everything it is going to create, so that running it several times always leaves the database in the same state.
In this case, the script will be performing the following activities:
- The script will insert a new pipeline based on the templates provided by Sidra.
- The script will associate the new pipeline with several Entities through the EntityPipeline table.
The script should then ensure the following:
- The pipeline that is going to be inserted does not exist.
- The relationship between the pipeline and the Entities does not exist.
To provide these checks, it is required to add the following statements:
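The rollback can be sketched as follows. The [DataIngestion] schema and the variable name @ItemId are assumptions; the Pipeline and EntityPipeline tables and the IdPipeline and ItemId columns are the ones named in this page. The relationship rows are deleted first on the assumption that EntityPipeline references Pipeline through a foreign key.

```sql
-- ItemId chosen for the new pipeline (see the step where it is stored).
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';

-- 1. Remove the Entity-pipeline relationships of this pipeline, if any.
DELETE FROM [DataIngestion].[EntityPipeline]
WHERE [IdPipeline] IN
(
    SELECT [Id]
    FROM [DataIngestion].[Pipeline]
    WHERE [ItemId] = @ItemId
);

-- 2. Remove the pipeline itself, if it exists.
DELETE FROM [DataIngestion].[Pipeline]
WHERE [ItemId] = @ItemId;
```

Both statements affect zero rows when the pipeline does not exist yet, so they are safe on a first run.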
In case more items are added, they should be removed in the same way as well.
5. Add the new pipeline¶
To add the new pipeline, the following values are required:
- ItemId is a unique id for the pipeline in the system. This is the value stored in a variable in step 3.
- IdTemplate is the id of the template used, obtained in step 1.
- IdDataFactory is the Data Factory used to deploy this pipeline. It can be found in the [DataIngestion].[DataFactory] table. A common Client Application normally has only one, so the value 1 will be used, assuming that Id exists in that table.
- IdPipelineSyncBehaviour: This was described in this section and, by default, a value of 1 will be assumed as the standard behaviour.
- ExecutionParameters: The pipeline template used provides parameters to scale the database and to execute a stored procedure. Therefore these parameters should be provided:
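As an illustration only, the insert might look like the sketch below. The [DataIngestion] schema for the Pipeline table, the Name column, the pipeline name and every key inside the ExecutionParameters JSON are hypothetical: use the parameter names that the chosen template actually defines, and add any other columns your Pipeline table requires.

```sql
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';
-- Placeholder: the Id resolved from the template ItemId in step 1.
DECLARE @IdTemplate INT = 1;

-- Hypothetical parameter names: scaling settings plus the stored
-- procedure to execute. Replace with the template's real parameters.
DECLARE @ExecutionParameters NVARCHAR(MAX) = N'{
    "storedProcedureName": "[Staging].[LoadSelectedEntities]",
    "scaleUpTierName": "S2",
    "scaleDownTierName": "S0"
}';

INSERT INTO [DataIngestion].[Pipeline]
    ([ItemId], [Name], [IdTemplate], [IdDataFactory],
     [IdPipelineSyncBehaviour], [ExecutionParameters])
VALUES
    (@ItemId,
     N'ExtractSelectedEntitiesToStaging', -- hypothetical pipeline name
     @IdTemplate,
     1,  -- IdDataFactory: assumes Id 1 exists in [DataIngestion].[DataFactory]
     1,  -- IdPipelineSyncBehaviour: the default behaviour assumed in this tutorial
     @ExecutionParameters);
```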
6. Add Entity-pipeline relationship using a SQL script¶
To extract the data from the desired Entities in the DSU, the pipeline must be associated with them. The following example assumes that the Entities with Ids 1, 3 and 5 are the ones to extract data from.
First, get the Id of the pipeline created:
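A minimal sketch of that lookup, assuming the Pipeline table lives in the [DataIngestion] schema and using the illustrative variable names @ItemId and @IdPipeline:

```sql
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';

-- Resolve the database Id of the pipeline from its ItemId.
DECLARE @IdPipeline INT =
(
    SELECT [Id]
    FROM [DataIngestion].[Pipeline]
    WHERE [ItemId] = @ItemId
);

-- NULL here means the pipeline has not been inserted yet.
SELECT @IdPipeline AS IdPipeline;
```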
Second, insert the relationship between the Entity and the Pipeline:
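The association for Entities 1, 3 and 5 can be sketched as below. The [DataIngestion] schema and the variable names are assumptions; IdEntity and IdPipeline are the columns named in this page, and the real EntityPipeline table may have further columns.

```sql
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';

-- Resolve the pipeline Id so the block is self-contained.
DECLARE @IdPipeline INT =
(
    SELECT [Id]
    FROM [DataIngestion].[Pipeline]
    WHERE [ItemId] = @ItemId
);

-- One row per Entity that the pipeline must extract.
INSERT INTO [DataIngestion].[EntityPipeline] ([IdEntity], [IdPipeline])
VALUES
    (1, @IdPipeline),
    (3, @IdPipeline),
    (5, @IdPipeline);
```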
7. Complete the client script¶
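Putting the previous steps together, a complete script could look like the following sketch. It is not the original listing: the [DataIngestion] schema for all tables, the PipelineTemplate table name, the Name column, the pipeline name, the template ItemId and the ExecutionParameters keys are assumptions or placeholders to adapt to your Client Application database.

```sql
-- ItemId of the new pipeline (steps 2 and 3).
DECLARE @ItemId UNIQUEIDENTIFIER = '6860B34B-A53F-42EF-8FD7-D2CAF5690CAF';

-- Step 1: resolve the template Id. Placeholder template ItemId.
DECLARE @TemplateItemId UNIQUEIDENTIFIER = '00000000-0000-0000-0000-000000000000';
DECLARE @IdTemplate INT =
(
    SELECT [Id]
    FROM [DataIngestion].[PipelineTemplate]
    WHERE [ItemId] = @TemplateItemId
);

-- Step 4: rollback section, so the script can be re-run safely.
DELETE FROM [DataIngestion].[EntityPipeline]
WHERE [IdPipeline] IN
(
    SELECT [Id] FROM [DataIngestion].[Pipeline] WHERE [ItemId] = @ItemId
);
DELETE FROM [DataIngestion].[Pipeline]
WHERE [ItemId] = @ItemId;

-- Step 5: add the new pipeline. Parameter names are hypothetical.
DECLARE @ExecutionParameters NVARCHAR(MAX) = N'{
    "storedProcedureName": "[Staging].[LoadSelectedEntities]",
    "scaleUpTierName": "S2",
    "scaleDownTierName": "S0"
}';

INSERT INTO [DataIngestion].[Pipeline]
    ([ItemId], [Name], [IdTemplate], [IdDataFactory],
     [IdPipelineSyncBehaviour], [ExecutionParameters])
VALUES
    (@ItemId, N'ExtractSelectedEntitiesToStaging', @IdTemplate, 1, 1,
     @ExecutionParameters);

-- Step 6: associate the pipeline with Entities 1, 3 and 5.
DECLARE @IdPipeline INT =
(
    SELECT [Id] FROM [DataIngestion].[Pipeline] WHERE [ItemId] = @ItemId
);

INSERT INTO [DataIngestion].[EntityPipeline] ([IdEntity], [IdPipeline])
VALUES
    (1, @IdPipeline),
    (3, @IdPipeline),
    (5, @IdPipeline);
```

Running the script twice should leave exactly one pipeline row and three relationship rows, because the rollback section removes the results of the previous run before inserting again.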
8. Put your pipeline inside DatabaseBuilder¶
Add an SQL script, following the naming conventions, to the Scripts\ClientContext folder or to the location from which DatabaseBuilder is configured to retrieve the scripts.
Ensure that the script runs without errors and is marked with "CopyAlways" so that it is included in the output of the code compilation.
9. Push the changes into the branch¶
Raise a deployment, if the current environment is not configured to raise one automatically, and wait for the result.
The DatabaseBuilder job will execute the SQL script to put the content in the Client Application database, and then DataFactoryManager for client will update the artifacts in the client Data Factory from the content of the Client Application database.