A great part of the tables in the Sidra Data Product are similar in structure and content to the ones stored in the Sidra Service. Please see the section related to Sidra Service metadata to check this. Even so, there is slight differences in certain tables or even new ones aimed at extracting the maximum functionality for the customer purposes in Data Products. Below, you can see a summary table with the existing tables for Data Products.
Data Products Tables
Description
Type
ActivityTemplate
More information can be checked here in Sidra Service.
Data Factory
ActivityTemplatePipelineTemplate
Data Factory
Asset
More information can be checked here in Sidra Service.
Metadata
AssetStatus
More information can be checked here in Sidra Service.
More information can be checked here in Sidra Service. For Sidra Data Products, this table contains an ExecutionParameters field that, when filled, the pipeline will be executed by the Sync webjob.
Gives the information about if the data transaction between the staging table and the Sidra Data Product has been properly done; if not, it raises an error.
Data loading checking
Trigger
More information can be checked here in Sidra Service.
Data Factory
TriggerPipeline
More information can be checked here in Sidra Service.
Data Factory
TriggerTemplate
More information can be checked here in Sidra Service.
The same extraction pipeline can be used to extract several Assets from different Entities. That can be configured by associating the same pipeline to several Entities using the EntityPipeline table.
Column
Description
Required
IdPipeline
Identifier of the pipeline
IdEntity
Identifier of the related Entity
IsMandatory
If true, Assets associated with the IdEntity need to be as a complete set to run the pipeline. If false, complete data calculation will not take into account this IdEntity. By default: true.
PipelineExecutionProperties
This a place to put a JSON that could be used in the pipeline for different purposes. By default, NULL, empty or "{}".
The pipelineExecutionProperties column in the EntityPipeline table of Data Products is being used to add a ConsolidationMode as an Overwrite parameter which will take action before any data load. A Databricks notebook will be in charge of doing the following functions while checking the parameter status:
When ConsolidationMode is set up as Overwrite, the entire table will be overwritten.
When ConsolidationMode is set up as Merge (default), the data will be merged if there is a Primary Key, otherwise the data will be appended.
When ConsolidationMode is set up as Append, the data is appended.
When other value is set up in this parameter, it will throw an error stating that the ConsolidationMode XXX is not valid.
When the parameter does not exist or it is empty, it will take the default value of Merge.
The ExtractPipelineExecution table is an execution tracking table that is only present in a Data Product.
Column
Description
Id
Id or the record
IdPipeline
Id of the Data Product pipeline
PipelineRunId
GUID with the Id of the pipeline execution in ADF
PipelineExecutionDate
Timestamp of execution of the pipeline
IdEntity
Id of the Entity
IdAsset
Id of the Asset
AssetDate
Business date of the Asset being ingested
AssetOrder
An order number internally used in the stored procedures for loading the data
IdExtractPipelineExecutionStatus
Id of the ExtractPipelineExecutionStatus for the status of the Asset during the pipeline load
If the Sync Behavior (see the Sync Behavior documentation) is LoadUpToLastValidDate, the Asset load status per pipeline will be checked from this table.
The Pipeline table in Data Products has some additional fields compared to the fields that the Pipeline table has in Sidra Service:
ExecutionParameters: This is an additional field used for including values for the parameters of the Data Product pipeline when the pipeline is executed by the Sync webjob.
An association between the Pipeline and Provider can be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated to enabled Providers, i.e. those Providers with false in the IsDisabled field.
Pipeline will be ignored by ignored (not executed)
1
LoadAllNewDates
Sync will provide all new dates for complete set of data not yet imported (with status 2 or 6) when the pipeline is executed
2
LoadFirstNewDate
Sync will provide the older date for the complete set of data not yet imported (with status 2 or 6) when the pipeline is executed
3
LoadLastNewDate
Sync will provide the most recent date for the complete set of data not yet imported (with status 2 or 6) when the pipeline is executed
4
LoadPendingUpToValidDate
Sync will provide all Assets not previously loaded from dates since last execution up to last day with all mandatory Assets available
5
LoadUpToLastValidDate
Sync will provide all Assets not previously loaded by this same pipeline. Assets that failed to load will be included for retrying a limited number of times, and then discarded