
Sidra Data Products metadata

Whenever content is ingested into the Data Lake inside a Data Storage Unit (DSU), a new Asset is created in the metadata database in Sidra Service. The Asset is associated with the rest of the metadata stored for that content, such as its Entity and Provider.

The Data Product keeps a synchronized copy of the metadata to discover the ingestion of new Assets in the Data Lake.

That means that the Data Product contains a metadata database similar to the one in Sidra Service. In the Data Product database, the Sidra metadata tables are under the Sidra schema.

These are some of the most important tables used in the Data Product metadata database:

  • DataFactory: stores the Azure Data Factory resource group for the Data Product.
  • Provider
  • Entity
  • Attribute
  • EntityPipeline: stores the Entity-Pipeline associations.
  • AttributesFormat
  • AssetStatus
  • Assets
  • Trigger
  • TriggerPipeline
  • TriggerTemplate
  • Pipeline
  • PipelineTemplate
  • PipelineSyncBehavior: described in a separate documentation page.

Although the metadata in the Sidra Service and Data Product databases is synchronized, there are several differences between the two that are worth clarifying:

  • Some fields have been removed from the Data Product metadata tables because they are not used, for example, the access control fields SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Data Product metadata tables. One example is the IsDisabled field, which has been added to Entity and Provider to disable synchronization for that particular Entity or Provider.
  • The state transitions in Sidra Service and in Data Products are different. Therefore, the AssetStatus table in the Data Product contains different states and workflows than the one in Sidra Service. For example, once the ingestion of an Asset finishes in Sidra Service, its status there is MovedToDataLake, but in the Data Product the status continues to evolve until it reaches ImportedFromDataLake.
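The effect of the IsDisabled field and of the differing final statuses can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the classes and the functions should_synchronize and assets_to_synchronize are hypothetical names made up for illustration, not part of Sidra, and the real synchronization is performed by Sidra itself rather than by code like this. The only names taken from the text above are IsDisabled, MovedToDataLake and ImportedFromDataLake.

```python
from dataclasses import dataclass

# Status names come from the documentation text; everything else is illustrative.
SERVICE_FINAL_STATUS = "MovedToDataLake"            # final status in Sidra Service
DATA_PRODUCT_FINAL_STATUS = "ImportedFromDataLake"  # final status in the Data Product


@dataclass
class Provider:
    name: str
    is_disabled: bool = False  # mirrors the IsDisabled field added in the Data Product


@dataclass
class Entity:
    name: str
    provider: Provider
    is_disabled: bool = False  # mirrors the IsDisabled field added in the Data Product


@dataclass
class Asset:
    name: str
    entity: Entity
    status: str = SERVICE_FINAL_STATUS


def should_synchronize(asset: Asset) -> bool:
    """Hypothetical rule: skip Assets whose Entity or Provider is disabled."""
    entity = asset.entity
    return not (entity.is_disabled or entity.provider.is_disabled)


def assets_to_synchronize(assets: list[Asset]) -> list[Asset]:
    """Return only the Assets that this sketch would synchronize."""
    return [asset for asset in assets if should_synchronize(asset)]


# Example data (made up): one active Provider and one disabled Provider.
sales = Provider("Sales")
legacy = Provider("Legacy", is_disabled=True)
orders = Entity("Orders", sales)
returns = Entity("Returns", sales, is_disabled=True)
archive = Entity("Archive", legacy)

assets = [
    Asset("orders.parquet", orders),
    Asset("returns.parquet", returns),
    Asset("archive.parquet", archive),
]

selected = assets_to_synchronize(assets)
for asset in selected:
    # In the Data Product, the status keeps evolving past MovedToDataLake.
    asset.status = DATA_PRODUCT_FINAL_STATUS
```

In this sketch only the Asset of the Orders Entity is selected, because the Returns Entity is disabled directly and the Archive Entity belongs to a disabled Provider. The skipped Assets keep the status they had in Sidra Service.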

You can check the Metadata section, which explains the metadata tables above, all the status values, and the transitions between them.