How to associate an entity with a pipeline

Data Factory pipelines can be defined in JSON and, using that JSON, created programmatically through the Data Factory API. A key component in Sidra, the Data Factory Manager, composes the pipeline JSON from the templates and parameters stored in the metadata database and uses it to create the pipeline in Data Factory.

In order to perform the parameter substitution in the templates, Data Factory Manager needs to know which pipeline is related to which entity. This relationship is stored in the metadata database in the EntityPipeline table.

There are two ways to create a new entity pipeline relationship:

  • Create a SQL script to insert the relationship in the database.
  • Use the Sidra API endpoint to associate entities and pipelines.

EntityPipeline information

This is the information about an entity pipeline relationship that must be included when it is added to the metadata database:

Column      Description
IdPipeline  [Required] Identifier of the pipeline
IdEntity    [Required] Identifier of the related entity

Considerations:

  • Every entity must be associated with the FileIngestionDatabricks pipeline. Regardless of the method used to add their assets to the platform, all entities use this pipeline to ingest the raw copy of the asset into Databricks.
  • Pipelines have a Globally Unique Identifier (GUID) that is unique across all Sidra installations. This GUID is stored in the ItemId column of the Pipeline table. It is recommended to use the ItemId to look up the Id of the pipeline.
  • The ItemId of the FileIngestionDatabricks is F8CD4AD9-DCAE-4463-9185-58292D14BE99.

Add entity pipeline relationship using a SQL script

Add a SQL script, following the naming conventions, to the Scripts\CoreContext folder or to the location from which the DatabaseBuilder is configured to retrieve its scripts. A sample script that adds a relationship between an entity and a pipeline is shown below:

-- DECLARE
DECLARE @Id_Entity INT = 100 -- Id of the entity  
DECLARE @Id_Pipeline INT = 20 -- Id of the pipeline. If it is not known, retrieve it using the ItemId of the pipeline

DECLARE @Id_Ingestion_Pipeline INT =
(
    SELECT [Id]
    FROM [DataIngestion].[Pipeline]
    WHERE [ItemId] = 'F8CD4AD9-DCAE-4463-9185-58292D14BE99'
)

-- ROLLBACK
DELETE FROM [DataIngestion].[EntityPipeline]
WHERE [IdEntity] = @Id_Entity AND [IdPipeline] IN (@Id_Pipeline, @Id_Ingestion_Pipeline)

-- SCRIPT
INSERT [DataIngestion].[EntityPipeline] 
     ([IdEntity],   [IdPipeline]) 
VALUES 
     (@Id_Entity,   @Id_Pipeline)
    ,(@Id_Entity,   @Id_Ingestion_Pipeline) 
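
The delete-then-insert pattern in the script above means it can be run more than once without leaving duplicate rows. The following is a minimal sketch of that behaviour, not part of Sidra: it uses an in-memory SQLite database as a stand-in for the metadata database, with simplified table names (no DataIngestion schema), a made-up ItemId for pipeline 20 and a made-up Id (35) for the FileIngestionDatabricks pipeline.

```python
import sqlite3

# ItemId of the FileIngestionDatabricks pipeline, as given in this document.
INGESTION_ITEM_ID = "F8CD4AD9-DCAE-4463-9185-58292D14BE99"


def create_stand_in_schema(conn):
    """Create simplified stand-in tables (assumption: only the columns used here)."""
    conn.execute("CREATE TABLE Pipeline (Id INTEGER PRIMARY KEY, ItemId TEXT UNIQUE)")
    conn.execute("CREATE TABLE EntityPipeline (IdEntity INTEGER, IdPipeline INTEGER)")
    conn.executemany(
        "INSERT INTO Pipeline (Id, ItemId) VALUES (?, ?)",
        [
            (20, "00000000-0000-0000-0000-000000000020"),  # hypothetical pipeline
            (35, INGESTION_ITEM_ID),  # hypothetical Id for the ingestion pipeline
        ],
    )


def associate(conn, id_entity, id_pipeline):
    """Relate an entity to a pipeline and to the ingestion pipeline, idempotently."""
    row = conn.execute(
        "SELECT Id FROM Pipeline WHERE ItemId = ?", (INGESTION_ITEM_ID,)
    ).fetchone()
    if row is None:
        raise LookupError("FileIngestionDatabricks pipeline not found")
    pairs = [(id_entity, id_pipeline), (id_entity, row[0])]
    with conn:  # one transaction: remove any previous rows, then insert
        conn.executemany(
            "DELETE FROM EntityPipeline WHERE IdEntity = ? AND IdPipeline = ?", pairs
        )
        conn.executemany(
            "INSERT INTO EntityPipeline (IdEntity, IdPipeline) VALUES (?, ?)", pairs
        )


def relationships(conn, id_entity):
    """Return the (IdEntity, IdPipeline) rows stored for an entity."""
    return conn.execute(
        "SELECT IdEntity, IdPipeline FROM EntityPipeline "
        "WHERE IdEntity = ? ORDER BY IdPipeline",
        (id_entity,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
create_stand_in_schema(conn)
associate(conn, 100, 20)
associate(conn, 100, 20)  # running it again leaves the same two rows
print(relationships(conn, 100))  # [(100, 20), (100, 35)]
```

Looking up the ingestion pipeline by ItemId rather than hard-coding its Id follows the recommendation in the considerations above, since the Id may differ between installations.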

Add entity pipeline relationship using Sidra API

Sidra API requires requests to be authenticated; the section How to use Sidra API explains how to create an authenticated request. For the rest of this document, Sidra API is assumed to be deployed at the following URL:

https://core-mycompany-dev-wst-api.azurewebsites.net

Before creating a relationship between an entity and a pipeline, the Id of each of them must be known.

Associate an entity to a pipeline

Only the Ids of the entity and the pipeline are required; include them in the URL of the request:

Request

POST https://core-mycompany-dev-wst-api.azurewebsites.net/api/metadata/entities/{idEntity}/pipelines/{idPipeline}?api-version=1.0

The response returns the entity pipeline relationship that was created. A separate request must be made for each relationship, since the Sidra API does not currently support creating them in bulk.
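
As an illustration, the request can be issued from a script. The sketch below uses only the Python standard library and makes one request per entity and pipeline pair. The helper names are hypothetical, and the bearer-token Authorization header is an assumption; the actual authentication scheme is the one described in How to use Sidra API.

```python
import urllib.request

# Deployment URL assumed throughout this document.
BASE_URL = "https://core-mycompany-dev-wst-api.azurewebsites.net"


def association_url(id_entity, id_pipeline, base_url=BASE_URL, api_version="1.0"):
    """Build the endpoint URL that associates an entity with a pipeline."""
    return (
        f"{base_url}/api/metadata/entities/{int(id_entity)}"
        f"/pipelines/{int(id_pipeline)}?api-version={api_version}"
    )


def build_request(id_entity, id_pipeline, token):
    """Prepare the POST request (assumption: bearer-token authentication)."""
    return urllib.request.Request(
        association_url(id_entity, id_pipeline),
        data=b"",  # the Ids travel in the URL, so the body is empty
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


def associate_all(pairs, token):
    """Send one request per (id_entity, id_pipeline) pair and collect the raw bodies."""
    bodies = []
    for id_entity, id_pipeline in pairs:
        with urllib.request.urlopen(build_request(id_entity, id_pipeline, token)) as resp:
            bodies.append(resp.read().decode("utf-8"))
    return bodies


print(association_url(100, 20))
```

Calling associate_all([(100, 20), (100, 35)], token) would send two requests, one for each relationship; the Ids and the token here are placeholders.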

Response

[
    {
        "idEntity": 100,
        "idPipeline": 20,
        "entityName": "MyEntityName",
        "pipelineName": "MyPipelineName"
    }
]
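
A caller will usually want to read the Ids and names back from this response. The following is a small sketch of doing so, assuming the response body has exactly the shape shown above; the EntityPipeline class and parse_relationships function are hypothetical names, not part of any Sidra client library.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityPipeline:
    """One entity pipeline relationship, as returned in the sample response."""
    id_entity: int
    id_pipeline: int
    entity_name: str
    pipeline_name: str


def parse_relationships(body):
    """Convert the JSON array returned by the API into EntityPipeline objects."""
    return [
        EntityPipeline(
            id_entity=item["idEntity"],
            id_pipeline=item["idPipeline"],
            entity_name=item["entityName"],
            pipeline_name=item["pipelineName"],
        )
        for item in json.loads(body)
    ]


sample_body = """
[
    {
        "idEntity": 100,
        "idPipeline": 20,
        "entityName": "MyEntityName",
        "pipelineName": "MyPipelineName"
    }
]
"""

created = parse_relationships(sample_body)
print(created[0].id_entity, created[0].id_pipeline, created[0].pipeline_name)
```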