Sidra Client Applications metadata

This page explains the metadata concepts involved in Client Applications.

Data Storage Unit

Whenever content is ingested into the Data Lake inside a Data Storage Unit (DSU), a new Asset is created in the metadata database in Core. The Asset is associated with the rest of the metadata stored for the content: Entity, Provider, Attributes, and so on.

The Client Application keeps a synchronized copy of the metadata to discover the ingestion of new Assets in the Data Lake.

This means that the Client Application contains its own metadata database, similar to the one in Core. In the Client Application database, the Sidra metadata tables are under the schema Sidra.

These are some of the most important tables used in the Client Application metadata database:

  • DataFactory: this table stores the Azure Data Factory resource group for the Client Application
  • Provider
  • Entity
  • Attribute
  • EntityPipeline: to store the Entity-Pipeline associations.
  • AttributesFormat
  • AssetStatus
  • Assets
  • Trigger
  • TriggerPipeline
  • TriggerTemplate
  • Pipeline
  • PipelineTemplate
  • PipelineSyncBehavior: described in its own documentation page.

Although the metadata in the Core and Client Application databases is synchronized, several differences between the two are worth clarifying:

  • Some fields have been removed from the Client Application metadata tables because they are not used, for example the fields used for access control, such as SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Client Application metadata tables. This is the case for the IsDisabled field, which has been added to Entity and Provider to disable the synchronization of that particular Entity or Provider.
  • The state transitions in Core and in Client Applications are different. Therefore, the AssetStatus table contains different states and workflows from those in Sidra Core. For example, once the ingestion of an Asset is finished in Core, its status there is MovedToDataLake, but in the Client Application the status continues to evolve until it reaches ImportedFromDataLake.

You can check the Metadata section, which explains the metadata tables above, all the status values, and the transitions between them.

Role of Sync and DatabaseBuilder webjobs in Client Applications

Sync webjob

The Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is limited by the permissions granted to the Client Application.

Based on the metadata received from the API, the webjob updates each of the Client Application metadata tables. For the Provider and Entity tables, any entry that is no longer available in Core is marked as disabled in the Client Application using the IsDisabled field described above.
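The disabling rule just described can be illustrated with a minimal sketch. It is not the Sync webjob's actual code: the ClientEntity class, the SyncSketch.Apply method and the in-memory dictionaries are hypothetical stand-ins for the Client Application tables and for the data returned by the Sidra API. The sketch also assumes that rows still present in Core keep their current IsDisabled value, so that a manual disable is not overwritten.

```csharp
using System.Collections.Generic;

// Hypothetical, simplified row of the Client Application Entity (or Provider) table.
public sealed class ClientEntity
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public bool IsDisabled { get; set; }
}

public static class SyncSketch
{
    // local:    rows currently stored in the Client Application, keyed by Id.
    // fromCore: Id -> Name pairs that Core returned for this Client Application.
    public static void Apply(
        IDictionary<int, ClientEntity> local,
        IReadOnlyDictionary<int, string> fromCore)
    {
        // Rows that Core no longer returns are kept but flagged as disabled.
        foreach (var row in local.Values)
        {
            if (!fromCore.ContainsKey(row.Id))
            {
                row.IsDisabled = true;
            }
        }

        // Rows that exist only in Core are added to the local copy.
        foreach (var pair in fromCore)
        {
            if (!local.ContainsKey(pair.Key))
            {
                local[pair.Key] = new ClientEntity { Id = pair.Key, Name = pair.Value };
            }
        }
    }
}
```

Flagging rows instead of deleting them keeps the history of the Entity or Provider in the Client Application while excluding it from further synchronization.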

All this information will be used to decide if there is new content in the Data Lake to be imported to the Client Application.

In addition, the Sync webjob is responsible for executing the defined pipelines, depending on the sync behavior configured for each pipeline (see PipelineSyncBehavior).

Note

Sidra Core has the concept of dummy Assets: zero-length Assets that are created when an incremental load in Core finishes but produces no new increment of data. This concept was introduced to force the presence of a new Asset in the Client Application metadata tables. Without these Assets, the data synchronization would not take place if the Assets are configured as mandatory Assets (see the information on this point below). If these Assets are mandatory but not generated, the data movement would not happen, and this could affect the business logic of the Client Application. Since Sidra version 1.10, the generation of dummy Assets in data ingestion pipelines is optional and defaults to false. Therefore, if the data processing logic of a Client Application needs these Assets to be generated, make sure this parameter is set to true when deploying data intake pipelines.
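To see why a dummy Asset matters, the following sketch models the mandatory-Asset condition as a simple set check. It is an illustration only: MandatoryAssetCheck.CanRun and its parameters are hypothetical names, and the real Sync webjob may evaluate this condition differently.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class MandatoryAssetCheck
{
    // Returns true only when every mandatory Entity has at least one new Asset.
    // A zero-length dummy Asset counts as a new Asset for this purpose.
    public static bool CanRun(
        IEnumerable<int> mandatoryEntityIds,
        IEnumerable<int> entityIdsWithNewAssets)
    {
        var available = new HashSet<int>(entityIdsWithNewAssets);
        return mandatoryEntityIds.All(available.Contains);
    }
}
```

Under this assumption, an incremental load that produces no data and no dummy Asset leaves its Entity out of the second list, so the check fails and the data movement is skipped.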

DatabaseBuilder webjob

The metadata database is populated by the DatabaseBuilder webjob. The project included in the Client Application solution is already configured for this purpose:

static void Main(string[] args)
{
    ...

    // Add log configuration as well if required
    JobExecutor.Execute(
        configuration,
        new Options()
        {
            CreateClientDatabase = true,
            CreateLogDatabase = true
        },
        loggingBuilderAction);

    ...
}

This job creates the database schema specific to Client Applications, with the differences explained in the sections above.

It is also used to add the information about the ADF components (pipelines, datasets, triggers) to the database by means of SQL scripts.

Step-by-step

How to execute SQL scripts using DatabaseBuilder webjob

This section explains how to execute SQL scripts using the DatabaseBuilder webjob.

Extracting the new content from the Data Storage Unit

Sidra Client Applications use ADF for data movement; in particular, they use pipelines to extract new content from the Data Lake (more specifically, from the DSU).

The actions performed by the extraction pipelines depend on what is going to be done with the content after the extraction. This logic is specific to each Client Application and tied to its business-rule transformations. Some examples of actions that the extraction pipelines may execute are:

  • The content may be used to populate a data warehouse inside the Client Application. In this case, the content is first stored in the staging tables of the Client Application after the extraction.

  • The content may optionally be transformed by executing data transformations and business rules within the Client Application.

  • Optionally, the transformed data can be re-ingested as rich data back into the Data Lake. In this case, after the extraction and transformation, the new content is pushed to the landing zone in Core for re-ingestion, just as for any new data Provider configured in Sidra.

Consequently, there is no single design for the Client Application extraction pipeline: it depends on the specifics of the business case implemented by the Client Application.

More information is available in the documentation on Client Application pipelines.

Last update: 2022-11-10