
Data Product tables

Many of the tables in the Sidra Data Product are similar in structure and content to those stored in Sidra Service; please see the section on Sidra Service metadata for details. Even so, there are slight differences in certain tables, as well as new ones, aimed at getting the maximum functionality out of Data Products for customer purposes. Below is a summary of the existing tables for Data Products.

| Data Products Tables | Description | Type |
|---|---|---|
| ActivityTemplate | More information can be checked here in Sidra Service. | Data Factory |
| ActivityTemplatePipelineTemplate | | Data Factory |
| Asset | More information can be checked here in Sidra Service. | Metadata |
| AssetStatus | More information can be checked here in Sidra Service. | Metadata |
| AssetTimeTravel | Populated by Sidra with the Time Travel information provided by Client Databricks when the Asset data is inserted in Client Databricks. | Performance |
| Attribute | More information can be checked here in Sidra Service. | Metadata |
| DataFactory | More information can be checked here in Sidra Service. | Data Factory |
| DataSetTemplate | More information can be checked here in Sidra Service. | Data Factory |
| DataSetTemplatePipelineTemplate | More information can be checked here in Sidra Service. | Data Factory |
| DataSource | More information can be checked here in Sidra Service. | Metadata |
| DataSourceTemplate | More information can be checked here in Sidra Service. | Metadata |
| DataSourceTemplatePipelineTemplate | More information can be checked here in Sidra Service. | Metadata |
| Entity | More information can be checked here in Sidra Service. | Metadata |
| EntityPipeline | Allows associating the same pipeline with several Entities. | Metadata |
| ExtractPipelineExecution | Execution tracking table filled with data from Sidra Service, leaving information for the stored procedures to execute. | Data extraction |
| ExtractPipelineExecutionStatus | Reference table enumerating the different statuses involved in a Data Product pipeline execution. | Data extraction |
| LoadProcess | Checks the status and reports possible errors during the process. | Data loading checking |
| LoadProcessTask | Gives information about the LoadProcess tasks during a data ingestion and their destination. | Data loading checking |
| LoadStatus | Details each status, adding a "Description" field. | Data loading checking |
| LoadStatusEnum | Contains the different loading statuses and their definitions. | Data loading checking |
| Pipeline | More information can be checked here in Sidra Service. For Sidra Data Products, this table contains an ExecutionParameters field; when it is filled, the pipeline is executed by the Sync job. | Data Factory |
| PipelineProvider | Relation between Pipeline and Provider. | Metadata |
| PipelineSyncBehaviours | For more information, please check this section. | Sync |
| PipelineTemplate | More information can be checked here in Sidra Service. | Data Factory |
| Provider | More information can be checked here in Sidra Service. | Metadata |
| StagingConfiguration | Contains the configuration of the new table names that need to be created in Staging and the queries to create these new tables. | Configuration |
| TableImport | Indicates whether the data transfer between the staging table and the Sidra Data Product completed correctly; if not, it raises an error. | Data loading checking |
| Trigger | More information can be checked here in Sidra Service. | Data Factory |
| TriggerPipeline | More information can be checked here in Sidra Service. | Data Factory |
| TriggerTemplate | More information can be checked here in Sidra Service. | Data Factory |
| PipelineTemplateType | More information can be checked here in Sidra Service. | Data Factory |
| PipelineTemplateTypes | Describes the kinds of pipelines used. | Data Factory |
| TabularMetadata | Stores the names of tables, partition columns and views within the tabular model. | Tabular model |
| TabularPartition | Stores information about table partitions. | Tabular model |
| Measure | Stores information related to the measures used in the tabular model. | Tabular |
| RoleMember | Stores the roles by user. | Tabular |

AssetTimeTravel table

This table is populated by Sidra with the Time Travel information provided by Client Databricks when the Asset data is inserted in Client Databricks.

Please check the Databricks documentation for more information on the Time Travel functionality.

| Column | Description |
|---|---|
| Id | Id of the Asset to which the Time Travel definition applies. |
| IdStagingConfiguration | Identifier of the staging configuration used to load this Asset. |
| VersionNumber | Version number of the Time Travel given by Databricks. |
| Timestamp | Timestamp of the Time Travel given by Databricks. |

Step-by-step

How to handle dependent Entities through the Databricks app

For a detailed example of how to use these tables and the general functioning of the PI Data Product, please check this tutorial.

EntityPipeline table

The same extraction pipeline can be used to extract several Assets from different Entities. That can be configured by associating the same pipeline to several Entities using the EntityPipeline table.

| Column | Description | Required |
|---|---|---|
| IdPipeline | Identifier of the pipeline | Yes |
| IdEntity | Identifier of the related Entity | Yes |
| IsMandatory | If true, the Assets associated with the IdEntity must form a complete set to run the pipeline. If false, the complete-data calculation will not take this IdEntity into account. By default: true. | Yes |
| PipelineExecutionProperties | A place to put a JSON that can be used in the pipeline for different purposes. By default: NULL, empty or "{}". | No |
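As an illustration of how IsMandatory affects execution, the sketch below uses a minimal, hypothetical in-memory stand-in for the EntityPipeline and Asset tables (not Sidra code) to check whether a pipeline has a complete set of mandatory Assets:

```python
# Hypothetical stand-in for EntityPipeline rows: (IdPipeline, IdEntity, IsMandatory).
entity_pipeline = [
    (1, 10, True),   # Entity 10 is mandatory for pipeline 1
    (1, 11, True),   # Entity 11 is mandatory for pipeline 1
    (1, 12, False),  # Entity 12 is optional for pipeline 1
]

def pipeline_can_run(id_pipeline, assets_by_entity):
    """Return True when every mandatory Entity of the pipeline has at least
    one Asset available; optional Entities do not block the run."""
    mandatory = [e for (p, e, m) in entity_pipeline if p == id_pipeline and m]
    return all(assets_by_entity.get(e) for e in mandatory)

print(pipeline_can_run(1, {10: ["asset-a"], 11: ["asset-b"]}))  # True (Entity 12 is optional)
print(pipeline_can_run(1, {10: ["asset-a"]}))                   # False (Entity 11 has no Asset)
```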

PipelineExecutionProperties column

The PipelineExecutionProperties column in the EntityPipeline table of Data Products is used to add a ConsolidationMode parameter (for example, Overwrite) that takes effect before any data load. A Databricks notebook checks the parameter and behaves as follows:

  • When ConsolidationMode is set to Overwrite, the entire table is overwritten.
  • When ConsolidationMode is set to Merge (default), the data is merged if there is a Primary Key; otherwise, the data is appended.
  • When ConsolidationMode is set to Append, the data is appended.
  • When any other value is set, an error is thrown stating that the ConsolidationMode XXX is not valid.
  • When the parameter does not exist or is empty, the default value of Merge is used.

Do not confuse this with the consolidation mode options on the DSU.
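The rules above can be sketched as follows; this is an illustrative Python reimplementation of the parameter handling, not the actual Databricks notebook code:

```python
import json

VALID_MODES = ("Overwrite", "Merge", "Append")

def resolve_consolidation_mode(pipeline_execution_properties):
    """Resolve ConsolidationMode from the PipelineExecutionProperties JSON:
    missing, NULL or empty values fall back to Merge; unknown values raise."""
    props = json.loads(pipeline_execution_properties or "{}")
    mode = props.get("ConsolidationMode") or "Merge"
    if mode not in VALID_MODES:
        raise ValueError(f"ConsolidationMode {mode} is not valid")
    return mode

print(resolve_consolidation_mode('{"ConsolidationMode": "Overwrite"}'))  # Overwrite
print(resolve_consolidation_mode(None))                                  # Merge
```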

ExtractPipelineExecution table

The ExtractPipelineExecution table is an execution tracking table that is only present in a Data Product.

| Column | Description |
|---|---|
| Id | Id of the record |
| IdPipeline | Id of the Data Product pipeline |
| PipelineRunId | GUID with the Id of the pipeline execution in ADF |
| PipelineExecutionDate | Timestamp of the execution of the pipeline |
| IdEntity | Id of the Entity |
| IdAsset | Id of the Asset |
| AssetDate | Business date of the Asset being ingested |
| AssetOrder | Order number used internally by the stored procedures when loading the data |
| IdExtractPipelineExecutionStatus | Id of the ExtractPipelineExecutionStatus for the status of the Asset during the pipeline load |

If the Sync Behavior (see the Sync Behavior documentation) is LoadUpToLastValidDate, the Asset load status per pipeline will be checked from this table.

ExtractPipelineExecutionStatus table

The ExtractPipelineExecutionStatus table is a reference table to enumerate the different statuses involved along a Data Product pipeline execution.

| Column | Description |
|---|---|
| Id | Id of the record |
| Name | Name of the execution status |
| Description | Description of the execution status |

ExtractPipelineExecutionStatuses table

| Status Id | Status Name | Status Description |
|---|---|---|
| 0 | Extracting | Asset is being copied from the DSU into the Client Staging area |
| 1 | Staging | Asset already copied into the Staging area |
| 2 | Finished | Asset already copied into the final tables |
| 3 | Error | Issue when copying the Asset |
| 4 | Removed | Asset deleted |

The statuses for this table are updated by some Data Product pipeline activities.

Load tables

Group of tables designated to check the loads and their correct operation during the process.

LoadProcess table

This table tracks and logs the details of data loading processes.

| Column | Description |
|---|---|
| Id | LoadProcess identifier |
| LoadStartDate | Start date of the LoadProcess |
| LoadEndDate | End date of the LoadProcess |
| IdLoadStatus | Identifier of the LoadStatus, detailed in the LoadStatusEnum table |
| ErrorMessage | Error message shown in case some error occurs during the LoadProcess |
| PipelineRunId | Run Id of the pipeline execution |
| IdPipeline | Identifier of the pipeline used |

LoadProcessTask table

This table stores information about monitoring of individual tasks within a data loading process.

| Column | Description |
|---|---|
| Id | Identifier of the task |
| IdLoadProcess | LoadProcess identifier |
| DestinationTableName | Name of the table in which the data will be inserted or updated |
| StartDate | Start date of the insertion |
| EndDate | End date of the insertion |
| Error | Boolean indicating whether an error occurred |
| ErrorMessage | Error message shown in case some error occurs during the LoadProcessTask |
| RowsInserted | Number of rows inserted |
| RowsUpdated | Number of rows updated |

LoadStatus table

| Column | Description |
|---|---|
| Id | Identifier of the status, detailed in the LoadStatusEnum table |
| StatusName | Name of the status |
| Description | Details of the status |

LoadStatusEnum table

| Identifier | Status |
|---|---|
| 0 | Error |
| 1 | Loading |
| 2 | Loaded |
| 3 | NoFiles |

Pipeline table

The Pipeline table in Data Products has some additional fields compared to the Pipeline table in Sidra Service:

  • ExecutionParameters: An additional field used to provide values for the parameters of the Data Product pipeline when the pipeline is executed by the Sync job.

PipelineProvider table

An association between a Pipeline and a Provider can be added using the PipelineProvider table. This association is used by the Sync job to execute only those pipelines associated with enabled Providers, i.e. those Providers with false in the IsDisabled field.

| Column | Description |
|---|---|
| IdPipeline | Pipeline identifier |
| IdProvider | Provider identifier |

PipelineSyncBehaviour table

PipelineSyncBehaviour is a table that defines the different types of behaviour executed from the Sync job's perspective. The Sync job is responsible for synchronizing metadata and executing the pipelines associated with the Entities.

| Identifier | Status | Description |
|---|---|---|
| 0 | Ignored | The pipeline is ignored by the Sync job, so it should probably be executed explicitly using a Trigger |
| 1 | LoadAllNewDates | The Sync job provides all new dates with a complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 2 | LoadFirstNewDate | The Sync job provides the oldest date with a complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 3 | LoadLastNewDate | The Sync job provides the most recent date with a complete set of data not yet imported (with status 2 or 6) when the pipeline is executed |
| 4 | LoadPendingUpToValidDate | The Sync job provides all Assets not previously loaded, from dates since the last execution up to the last day with all mandatory Assets available |
| 5 | LoadUpToLastValidDate | Default. The Sync job provides all Assets not previously loaded by this same pipeline. Assets that failed to load are retried a limited number of times and then discarded |

The LoadUpToLastValidDate Sync behaviour works similarly to LoadPendingUpToValidDate, but it checks the ExtractPipelineExecution table so that it can work with multiple pipelines. It is the mandatory Sync behaviour for loading an Entity with several pipelines, and the recommended Sync behaviour for all new load pipelines. This Sync behaviour retries unfinished Assets per pipeline a number of times (3 by default), which can be changed through the App Service configuration by setting the Sync.AssetRetries key.
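The retry rule can be sketched as below, using hypothetical in-memory rows instead of the real ExtractPipelineExecution table (status codes 2 and 3 are Finished and Error, per the ExtractPipelineExecutionStatuses table):

```python
FINISHED, ERROR = 2, 3  # ExtractPipelineExecutionStatus ids

def assets_to_load(history, asset_retries=3):
    """Return the Assets the Sync job would hand to the pipeline: Assets that
    never reached Finished and have failed fewer than `asset_retries` times
    (the default mirrors the Sync.AssetRetries setting)."""
    state = {}
    for id_asset, id_status in history:
        entry = state.setdefault(id_asset, {"errors": 0, "finished": False})
        if id_status == FINISHED:
            entry["finished"] = True
        elif id_status == ERROR:
            entry["errors"] += 1
    return [a for a, s in state.items()
            if not s["finished"] and s["errors"] < asset_retries]

history = [(1, ERROR), (1, ERROR), (1, ERROR),  # Asset 1 exhausted its retries
           (2, FINISHED),                       # Asset 2 already loaded
           (3, ERROR)]                          # Asset 3 still retryable
print(assets_to_load(history))  # [3]
```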

The behaviors LoadPendingUpToValidDate or LoadUpToLastValidDate are required in order to avoid situations like the following:

  • If there is an error in the Data Intake Process (Sidra Service ingestion, incremental loads), some of the Entities may be loaded while the failed Entities are not. The overall status of the job is still Succeeded, so the iterative batch process in the pipeline that loads all Entities is not blocked. When this issue happens, e.g. with a set of three Entities marked as Mandatory on the Data Product side (EntityPipeline table), the Sync job will not trigger the pipeline until all three Assets have been generated, which could take several days (e.g. three days, assuming that on day 3 the three mandatory Assets are finally available).

Sync behaviour LoadAllNewDates only loads the Assets of complete dates as of the execution date of the Sync job, leaving the already loaded increments in Sidra Service out of the Data Product staging tables. Some Data Product business logic may need to know exactly on which days there were Assets that were not processed, so that the Data Product can retrieve this missing data.

Sync behaviours LoadPendingUpToValidDate and LoadUpToLastValidDate load any Asset up to the most recent day on which Assets are available for every mandatory Entity.

??? Info "Update the Sync behaviour of pipelines"
    Update the Sync behaviour from the endpoint:

        PUT /api/pipeline

    Check here the available endpoints.
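A sketch of a request body for that endpoint, switching a pipeline to LoadUpToLastValidDate (identifier 5 in the PipelineSyncBehaviour table). The payload field names below are illustrative assumptions and should be checked against the Sidra API documentation:

```python
import json

payload = {
    "id": 42,                      # hypothetical pipeline Id
    "idPipelineSyncBehaviour": 5,  # 5 = LoadUpToLastValidDate (assumed field name)
}
body = json.dumps(payload)  # body to send with PUT /api/pipeline
print(body)
```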

StagingConfiguration table

Data Products contain the StagingConfiguration table, which holds the configuration of the new table names that need to be created in Staging, as well as the queries to create these new tables.

| Column | Description |
|---|---|
| IdPipeline | Id of the pipeline that extracts the data from the DSU to the Staging tables (mandatory). |
| Query | Query that defines how to retrieve the data from the Databricks of the Data Product. If it is not defined, an IdEntity must be specified so the content can be retrieved with a default query. IdEntity and Query cannot both be NULL at the same time. |
| IdEntity | Entity identifier used to retrieve the data from the Databricks of the Data Product. By default, all Attributes of the Entity are retrieved. If it is not defined, a Query must be specified to define how to retrieve the data. IdEntity and Query cannot both be NULL at the same time. |
| SchemaName | Schema name used for the staging tables. By default, the Staging schema is used. |
| TableName | Table name for the SQL staging table. |
| TableFieldDefinition | Field definition for the staging table, e.g., [SalesOrderID] [int] NULL, [OrderDate] [datetime] NULL, [SalesOrderNumber] [nvarchar](25) NULL, [UnitPrice] [money] NULL. If it is not defined, the whole list of Attributes of the Entity is used. TableFieldDefinition and IdEntity cannot both be NULL at the same time. |

The StagingConfiguration table is needed in order to parametrize these queries in the Databricks notebook.
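As an illustration of the default behaviour when only IdEntity is set, the sketch below builds a query over all Attributes of an Entity; the attribute names and the exact query shape are assumptions for illustration, not Sidra's actual implementation:

```python
# Hypothetical Attribute names for an Entity; in Sidra these would come
# from the Attribute table.
attributes = ["SalesOrderID", "OrderDate", "SalesOrderNumber", "UnitPrice"]

def default_staging_query(table_name, attribute_names):
    """Build a SELECT over every Attribute of the Entity, i.e. the kind of
    default query used when StagingConfiguration.Query is NULL."""
    columns = ", ".join(f"`{name}`" for name in attribute_names)
    return f"SELECT {columns} FROM {table_name}"

print(default_staging_query("salesorderdetail", attributes))
# SELECT `SalesOrderID`, `OrderDate`, `SalesOrderNumber`, `UnitPrice` FROM salesorderdetail
```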

TableImport table

This table stores the names of the tables imported from the Data Storage Unit.

| Column | Description |
|---|---|
| Id | Primary key |
| SourceTable | Name of the source table |
| DestinationSchema | Name of the destination schema |
| DestinationTable | Name of the destination table |
| IdAsset | Id of the imported Asset |
| LastUpdated | Timestamp of the last time the Asset was imported |
| ErrorMessage | Error message in case an error occurred on import |

PipelineTemplateType table

This table stores the template definition types for Azure Data Factory pipelines.

| Column | Description |
|---|---|
| Id | Primary key |
| Name | Name of the pipeline template type |
| Description | Description of the pipeline template type |

PipelineTemplateTypes table

This table defines all the pipeline template types.

| Identifier | Type | Description |
|---|---|---|
| 0 | System | Pipelines used internally |
| 1 | Data Product Intake | Pipelines used for data load processes |
| 2 | Other | Other kinds of pipelines |

TabularMetadata table

This table stores the metadata related to the tabular model, defining the structure and relationships of the data within it.

| Column | Description |
|---|---|
| Id | Primary key |
| DWTableName | Name of the table in the data warehouse |
| DWPartitionColumn | Partition column stored in the data warehouse |
| DWViewName | View name |
| TabularModelTableName | Table name from the tabular model |

TabularPartition table

This table provides details about the partitions in the tabular model, as well as metadata like partitioning limits or update dates.

| Column | Description | Type |
|---|---|---|
| Id | Primary key | int |
| FactTable | Name of the fact table | string |
| LeftBoundary | Left limit of the partition range | int |
| RightBoundary | Right limit of the partition range | int |
| PartitionNumber | Partition number | int |
| LastUpdated | Last update date | datetime |
| TabularProcessed | Defines whether it has been processed | boolean |
| CreationDate | Creation date | datetime |
| TabularTableName | Name of the tabular table | string |
| PartitionName | Name of the partition | string |

Measure table

This table contains information about measures used in the tabular model.

| Column | Description | Type |
|---|---|---|
| Id | Primary key | int |
| Name | Measure name | string |
| Description | Measure description | string |
| TableName | Name of the table | string |
| Expression | Measure expression in DAX | string |
| DisplayFolder | Folder that contains the measure | string |
| FormatString | Format of the measure results | string |
| Priority | Priority identifier | int |
| IdProvider | Provider identifier | int |
| Removed | Defines whether it is removed | boolean |

RoleMember table

| Column | Description | Type |
|---|---|---|
| Id | Primary key | int |
| Role | Role name | string |
| User | User name | string |