
Data Product tables

Most of the tables in a Sidra Data Product are similar in structure and content to the ones stored in Sidra Service; please see the section on Sidra Service metadata for details. Even so, there are slight differences in certain tables, as well as new tables aimed at getting the most functionality out of Data Products for customer purposes. Below is a summary of the existing Data Product tables.

| Data Product tables | Description | Type |
|---|---|---|
| ActivityTemplate | More information can be checked here in Sidra Service. | Data Factory |
| ActivityTemplatePipelineTemplate |  | Data Factory |
| Asset | More information can be checked here in Sidra Service. | Metadata |
| AssetStatus | More information can be checked here in Sidra Service. | Metadata |
| AssetTimeTravel | More information can be checked here. | Performance |
| Attribute | More information can be checked here in Sidra Service. | Metadata |
| DataFactory | More information can be checked here in Sidra Service. | Data Factory |
| DataSetTemplate | More information can be checked here in Sidra Service. | Data Factory |
| DataSetTemplatePipelineTemplate | More information can be checked here in Sidra Service. | Data Factory |
| DataSource | More information can be checked here in Sidra Service. | Metadata |
| DataSourceTemplate | More information can be checked here in Sidra Service. | Metadata |
| DataSourceTemplatePipelineTemplate | More information can be checked here in Sidra Service. | Metadata |
| Entity | More information can be checked here in Sidra Service. | Metadata |
| EntityPipeline | Allows associating the same pipeline with several Entities. | Metadata |
| EntityStagingMapping | Tracks the relationship between an Entity and its corresponding staging table. | Data mapping |
| ExtractPipelineExecution | Tracks the extraction of data from Sidra Service, keeping the information that the stored procedures need to execute the load. | Data extraction |
| ExtractPipelineExecutionStatus | Reference table that enumerates the different statuses involved in a Data Product pipeline execution. | Data extraction |
| LoadProcess | Tracks the status of each load process and reports any error that occurs during it. | Data loading checking |
| LoadProcessTask | Provides information about the tasks of a LoadProcess during a data ingestion, including the destination of the data. | Data loading checking |
| LoadStatus | Details each status, adding a Description field. | Data loading checking |
| LoadStatusEnum | Contains the different statuses of the loading and their definitions. | Data loading checking |
| Pipeline | More information can be checked here in Sidra Service. For Sidra Data Products, this table contains an ExecutionParameters field; when it is filled, the pipeline is executed by the Sync webjob. | Data Factory |
| PipelineProvider | Relationship between a Pipeline and a Provider. | Metadata |
| PipelineSyncBehaviours | For more information, please check this section. | Sync |
| PipelineTemplate | More information can be checked here in Sidra Service. | Data Factory |
| Provider | More information can be checked here in Sidra Service. | Metadata |
| StagingConfiguration | More information can be checked here. | Configuration |
| TableImport | Indicates whether the data transfer between the staging table and the Sidra Data Product completed correctly; if not, an error is raised. | Data loading checking |
| Trigger | More information can be checked here in Sidra Service. | Data Factory |
| TriggerPipeline | More information can be checked here in Sidra Service. | Data Factory |
| TriggerTemplate | More information can be checked here in Sidra Service. | Data Factory |

EntityStagingMapping table

| Column | Description |
|---|---|
| IdEntity | Identifier of the Entity |
| StagingTableName | Name of the staging table |

EntityPipeline table

The same extraction pipeline can be used to extract several Assets from different Entities. That can be configured by associating the same pipeline with several Entities using the EntityPipeline table, as sketched in the example after the table below.

| Column | Description | Required |
|---|---|---|
| IdPipeline | Identifier of the pipeline | |
| IdEntity | Identifier of the related Entity | |
| IsMandatory | If true, the Assets associated with this IdEntity must form a complete set for the pipeline to run. If false, this IdEntity is not taken into account when calculating data completeness. Default: true. | |
| PipelineExecutionProperties | JSON that can be used in the pipeline for different purposes. Default: NULL, empty or "{}". | |
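
As a minimal sketch of such an association (not a definitive script): the identifier values below are placeholders, the schema qualifier is omitted because it depends on the Data Product database, and the exact column types should be checked against the actual table definition.

```sql
-- Illustrative sketch only: associate one pipeline with two Entities.
-- The pipeline GUID and the Entity ids are placeholders; replace them with
-- real values from the Pipeline and Entity tables of the Data Product database.
INSERT INTO EntityPipeline (IdPipeline, IdEntity, IsMandatory, PipelineExecutionProperties)
VALUES
    ('00000000-0000-0000-0000-000000000000', 101, 1, NULL), -- mandatory: counted for the complete set
    ('00000000-0000-0000-0000-000000000000', 102, 0, NULL); -- optional: ignored for completeness calculation
```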

PipelineExecutionProperties column

The PipelineExecutionProperties column in the EntityPipeline table of Data Products is used to add a ConsolidationMode parameter (for example, Overwrite), which takes effect before any data load. A Databricks notebook checks this parameter and behaves as follows (see the sketch after the list):

  • When ConsolidationMode is set to Overwrite, the entire table is overwritten.
  • When ConsolidationMode is set to Merge (default), the data is merged if there is a Primary Key; otherwise, the data is appended.
  • When ConsolidationMode is set to Append, the data is appended.
  • When any other value is set in this parameter, an error is thrown stating that the ConsolidationMode XXX is not valid.
  • When the parameter does not exist or is empty, the default value of Merge is used.
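
As a hedged illustration, assuming the property is stored under a ConsolidationMode key inside the JSON (the exact key casing may differ in a given deployment), an association could be switched to full overwrites like this:

```sql
-- Illustrative sketch only: make the loads for this Entity/pipeline pair
-- overwrite the destination table instead of merging into it.
-- Identifier values are placeholders; the JSON key name is an assumption.
UPDATE EntityPipeline
SET PipelineExecutionProperties = '{"ConsolidationMode": "Overwrite"}'
WHERE IdPipeline = '00000000-0000-0000-0000-000000000000'
  AND IdEntity = 101;
```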

ExtractPipelineExecution table

The ExtractPipelineExecution table is an execution tracking table that is only present in a Data Product.

| Column | Description |
|---|---|
| Id | Id of the record |
| IdPipeline | Id of the Data Product pipeline |
| PipelineRunId | GUID with the Id of the pipeline execution in ADF |
| PipelineExecutionDate | Timestamp of the execution of the pipeline |
| IdEntity | Id of the Entity |
| IdAsset | Id of the Asset |
| AssetDate | Business date of the Asset being ingested |
| AssetOrder | Order number used internally by the stored procedures when loading the data |
| IdExtractPipelineExecutionStatus | Id of the ExtractPipelineExecutionStatus for the status of the Asset during the pipeline load |

If the Sync Behavior (see the Sync Behavior documentation) is LoadUpToLastValidDate, the Asset load status per pipeline will be checked from this table.
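
For instance, a sketch of the kind of check that can be run manually against this table (the pipeline identifier below is a placeholder):

```sql
-- Illustrative sketch only: current status of every Asset tracked for one pipeline.
SELECT epe.IdAsset,
       epe.AssetDate,
       s.Name AS ExecutionStatus
FROM ExtractPipelineExecution AS epe
JOIN ExtractPipelineExecutionStatus AS s
    ON s.Id = epe.IdExtractPipelineExecutionStatus
WHERE epe.IdPipeline = '00000000-0000-0000-0000-000000000000'  -- placeholder pipeline id
ORDER BY epe.AssetDate;
```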

ExtractPipelineExecutionStatus table

The ExtractPipelineExecutionStatus table is a reference table to enumerate the different statuses involved along a Data Product pipeline execution.

| Column | Description |
|---|---|
| Id | Id of the record |
| Name | Name of the execution status |
| Description | Description of the execution status |

ExtractPipelineExecutionStatuses table

| Status Id | Status Name | Status Description |
|---|---|---|
| 0 | Extracting | Asset is being copied from the DSU into the Client Staging area |
| 1 | Staging | Asset already copied into the Staging area |
| 2 | Finished | Asset already copied into the final tables |
| 3 | Error | Issue when copying the Asset |
| 4 | Removed | Asset deleted |

The statuses for this table are updated by some Data Product pipeline activities.

Load tables

Group of tables used to check the loads and their correct operation during the process.

LoadProcess table

| Column | Description |
|---|---|
| Id | LoadProcess identifier |
| LoadStartDate | Start date of the LoadProcess |
| LoadEndDate | End date of the LoadProcess |
| IdLoadStatus | Identifier of the LoadStatus, detailed in the LoadStatusEnum table |
| ErrorMessage | Error message shown in case an error occurs during the LoadProcess |
| PipelineRunId | GUID of the pipeline run used |
| IdPipeline | Identifier of the pipeline used |

LoadProcessTask table

| Column | Description |
|---|---|
| Id | Identifier of the task |
| IdLoadProcess | LoadProcess identifier |
| DestinationTableName | Name of the table in which the data will be inserted or updated |
| StartDate | Start date of the insertion |
| EndDate | End date of the insertion |
| Error | Boolean indicating whether an error occurred |
| ErrorMessage | Error message shown in case an error occurs during the LoadProcessTask |
| RowsInserted | Number of rows inserted |
| RowsUpdated | Number of rows updated |

LoadStatus table

| Column | Description |
|---|---|
| Id | Identifier of the status, detailed in the LoadStatusEnum table |
| StatusName | Name of the status |
| Description | Details of the status |

LoadStatusEnum table

| Identifier | Status |
|---|---|
| 0 | Error |
| 1 | Loading |
| 2 | Loaded |
| 3 | NoFiles |

Pipeline table

The Pipeline table in Data Products has some additional fields compared to the Pipeline table in Sidra Service:

  • ExecutionParameters: This is an additional field used to include values for the parameters of the Data Product pipeline when the pipeline is executed by the Sync webjob.
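
As a rough sketch of how this field could be populated: the parameter names in the JSON below are hypothetical and must match the parameters actually defined in the Data Product pipeline, and the pipeline identifier is a placeholder.

```sql
-- Illustrative sketch only: fixed parameter values passed by the Sync webjob
-- when it triggers this pipeline. Parameter names and the pipeline id are placeholders.
UPDATE Pipeline
SET ExecutionParameters = '{"destinationSchema": "staging", "fullLoad": false}'
WHERE Id = '00000000-0000-0000-0000-000000000000';
```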

PipelineProvider table

An association between a Pipeline and a Provider can be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated with enabled Providers, i.e. those Providers whose IsDisabled field is false.

| Column | Description |
|---|---|
| IdPipeline | Pipeline identifier |
| IdProvider | Provider identifier |
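
A minimal sketch of the kind of selection the Sync webjob performs, assuming the Pipeline and Provider tables keep the Id, Name and IsDisabled columns they have in Sidra Service:

```sql
-- Illustrative sketch only: pipelines associated with enabled Providers,
-- i.e. the candidates that the Sync webjob would consider for execution.
SELECT p.Id, p.Name
FROM Pipeline AS p
JOIN PipelineProvider AS pp ON pp.IdPipeline = p.Id
JOIN Provider AS pr ON pr.Id = pp.IdProvider
WHERE pr.IsDisabled = 0;
```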

PipelineSyncBehaviours table

| Identifier | Status | Description |
|---|---|---|
| 0 | Ignored | Pipeline will be ignored by Sync (not executed) |
| 1 | LoadAllNewDates | Sync will provide all new dates for the complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 2 | LoadFirstNewDate | Sync will provide the oldest date for the complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 3 | LoadLastNewDate | Sync will provide the most recent date for the complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 4 | LoadPendingUpToValidDate | Sync will provide all Assets not previously loaded, from dates since the last execution up to the last day with all mandatory Assets available |
| 5 | LoadUpToLastValidDate | Sync will provide all Assets not previously loaded by this same pipeline. Assets that failed to load will be included for retrying a limited number of times, and then discarded |

For more information, check this section.

TableImport table

| Column | Description |
|---|---|
| Id | Primary key |
| SourceTable | Name of the source table |
| DestinationSchema | Name of the destination schema |
| DestinationTable | Name of the destination table |
| IdAsset | Id of the imported Asset |
| LastUpdated | Timestamp of the last time the Asset was imported |
| ErrorMessage | Error message in case an error occurred on import |

Last update: 2023-09-22