How to associate an Entity with a pipeline in Data Intakes¶
1. Data Intakes using Landing Zone¶
Data Factory pipelines can be defined in a JSON format. Using that JSON, they can be created programmatically using the Data Factory API.
A key component in Sidra, the Data Factory Manager, composes the pipeline JSON from the templates and parameters stored in the Sidra Core metadata database and uses it to create the pipeline in Data Factory.
To perform the parameter substitution in the templates, Data Factory Manager needs to know which pipeline must be launched for which Entity. This relationship is stored in the EntityPipeline table of the metadata database.
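The idea of composing a pipeline definition from a template plus per-Entity parameters can be sketched as follows. This is only an illustration, not the Data Factory Manager implementation: the template content, the `$name` placeholder syntax, the sample rows and the pipeline names are all assumptions made up for the example.

```python
import json
from string import Template

# Hypothetical pipeline template. The real Sidra templates are stored in the
# Core metadata database and their placeholder syntax may differ.
PIPELINE_TEMPLATE = Template("""
{
  "name": "$pipeline_name",
  "properties": {
    "parameters": {
      "entityId": {"type": "Int", "defaultValue": $entity_id}
    },
    "activities": []
  }
}
""")


def compose_pipeline_json(pipeline_name: str, entity_id: int) -> dict:
    """Fill the template with parameters and parse the result as JSON."""
    rendered = PIPELINE_TEMPLATE.substitute(
        pipeline_name=pipeline_name, entity_id=entity_id
    )
    # json.loads raises an error if the substituted text is not valid JSON.
    return json.loads(rendered)


# Hypothetical Entity-Pipeline pairs: (pipeline identifier, Entity identifier).
entity_pipeline = [(1, 10), (1, 11), (2, 12)]
# Hypothetical pipeline names keyed by pipeline identifier.
pipeline_names = {1: "FileIngestionDatabricks", 2: "OtherPipeline"}

# One composed definition per Entity-Pipeline pair.
definitions = [
    compose_pipeline_json(pipeline_names[id_pipeline], id_entity)
    for id_pipeline, id_entity in entity_pipeline
]
```

Each resulting dictionary could then be serialized and sent to the Data Factory API. Without the Entity-Pipeline pairs, there would be no way to decide which parameters go into which template.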
A new Entity-Pipeline relationship is created through the Sidra API endpoint that associates Entities and pipelines.
EntityPipeline general information¶
This is the information about an Entity-Pipeline relationship that must be included when it is added to the metadata database:
|[Required] Identifier of the pipeline
|[Required] Identifier of the related Entity
- All Entities must be associated with the FileIngestionDatabricks pipeline, since every Entity uses it, no matter the method used to add its Assets to the platform, to ingest the raw copy of the Asset into Databricks.
- Pipelines have a Globally Unique Identifier (GUID) that is unique across all Sidra installations. This GUID is stored in the ItemId column of the Pipeline table. It is recommended to use the ItemId to get the Id of the pipeline. The ItemId of the FileIngestionDatabricks pipeline is
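Resolving the Id of a pipeline from its ItemId could look like the sketch below. It uses an in-memory SQLite database as a stand-in for the metadata database. The GUID is a made-up placeholder (not the real ItemId of any pipeline), and the `Name` column and the helper `get_pipeline_id` are assumptions introduced only for the example.

```python
import sqlite3

# Made-up GUID used only in this sketch. It is NOT the real ItemId.
EXAMPLE_ITEM_ID = "00000000-0000-0000-0000-000000000001"


def get_pipeline_id(conn: sqlite3.Connection, item_id: str) -> int:
    """Return the Id of the pipeline with the given ItemId, or raise LookupError."""
    row = conn.execute(
        "SELECT Id FROM Pipeline WHERE ItemId = ?", (item_id,)
    ).fetchone()
    if row is None:
        raise LookupError(f"No pipeline with ItemId {item_id}")
    return row[0]


# In-memory stand-in for the Pipeline table of the metadata database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Pipeline (Id INTEGER PRIMARY KEY, ItemId TEXT UNIQUE, Name TEXT)"
)
conn.execute(
    "INSERT INTO Pipeline (Id, ItemId, Name) VALUES (?, ?, ?)",
    (7, EXAMPLE_ITEM_ID, "FileIngestionDatabricks"),
)

pipeline_id = get_pipeline_id(conn, EXAMPLE_ITEM_ID)
```

Looking up by ItemId rather than hard-coding the Id keeps the lookup valid across installations, since the Id may differ from one installation to another while the ItemId does not.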
Add Entity-Pipeline relationship using Sidra API¶
Sidra API requires requests to be authenticated; the section How to use Sidra API explains how to create an authenticated request. For the rest of this document, Sidra API is assumed to be deployed at the following URL:
Before creating a relationship between an Entity and a pipeline, the Id of each of them must be known.
Associate an Entity to a pipeline¶
The only required pieces of information are the Ids of the Entity and the pipeline, which are included in the URL of the request:
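As a rough sketch, such a request could be built as shown below. The base URL, the route, the HTTP method and the bearer-token header are assumptions made for illustration and should be checked against the Sidra API reference. The request is only built here, not sent.

```python
from urllib.request import Request

# Hypothetical base URL. Replace it with the URL of your Sidra API deployment.
BASE_URL = "https://sidra-api.example.com"


def build_association_request(entity_id: int, pipeline_id: int, token: str) -> Request:
    """Build (but do not send) a request associating an Entity with a pipeline."""
    # Assumed route and HTTP method; the real ones may differ.
    url = f"{BASE_URL}/api/metadata/entities/{entity_id}/pipelines/{pipeline_id}"
    return Request(
        url,
        method="PUT",
        # Assumed authentication scheme: a bearer token obtained beforehand.
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_association_request(entity_id=10, pipeline_id=7, token="<access-token>")
# To send it: urllib.request.urlopen(request)
```

Both Ids travel in the URL path, so the request needs no body.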
2. Data Intake Processes (connectors)¶
The EntityPipeline relationship for a DIP is created automatically during the execution of the Metadata Pipeline. The intake then uses this relationship at execution time to select only the Entities included for that pipeline.