Sidra Client Applications main concepts

Client Applications are the pieces of Sidra that drive business cases. The Client Applications layer can also be described as a transformation and serving layer for Sidra Data Platform.

Client App overview

Some characteristics of the Client Apps are:

  • Client Applications (or Apps) are a set of Azure resources and code, enclosed in a Resource Group, which either access the data from one or multiple Data Storage Units via the secure APIs, or retrieve the data from the Data Lake, applying business transformations if needed.

  • Any actor that needs to access the data stored in the Data Lake for a specific business need is catalogued as a Client Application.

  • Client Apps consume the content from the Data Storage Units, so they need to know when new content has been ingested in order to extract it and incorporate it into the Client Application storage.

  • The architecture allows for Client Applications to be built using any set of tools and components, so they can range from Power BI-enabled analytical workspaces to even full-blown Web apps.

  • All Client Applications must use the same security model, notifications and logging infrastructure, etc., via Sidra APIs.

  • This centralization and templatization of governance and transversal components is what makes Sidra Data Platform an accelerator for innovation on serving data use cases.

Note

In some places the term Consumer apps may still be used to refer to Client Applications. This term is now deprecated.

Overview of the Client Application creation and deployment processes

Sidra Client Application pipelines are designed to perform installation and deployment of the Client Applications in a highly automated way, following a Continuous Integration/Continuous Deployment (CI/CD) process.

The general process is as follows:

  • Step 1: We need to download locally the corresponding Client Application template.

  • Step 2: The source for the actual instance of the Client Application needs to be created from this template.

  • Step 3: Once the .NET solution has been obtained with its code projects, we need to push this code to a specific branch (depending on the environment) on an Azure DevOps repository. The repository and branches need to have been created previously if they did not exist yet.

  • Step 4: Finally, the solution will be deployed with CI/CD pipelines.
    1. For this, the Client Application generates a build+release definition in YAML format.
    2. This YAML definition will be used by Azure DevOps to configure the integration and deployment processes.
    3. A set of artifacts is required for this, which may vary according to the type of Client Application template used. These artifacts mainly contain the parameters required for creating and deploying the infrastructure of the Client Application in Azure. There are two main alternatives:
      1. Data configuration files (.psd1)
      2. Variable groups in DevOps. This is the recommended option when creating new templates, as it is compatible with the plugins approach for Client Applications.

Data movement and orchestration in Client Applications

The Client Applications solutions use Azure Data Factory V2 (ADF) to perform data movements -same as Sidra Core solutions-, but the deployment project creates its own instance of ADF in the Resource Group of the Client Application.

When working with both solutions at the same time -Core and Client Applications- it is important to differentiate the ADF instances. More specifically, there will be:

  • an ADF instance for each Data Storage Unit (DSU) in Core
  • an ADF instance for each Client Application solution

1. Job DataFactoryManager for Client Applications

ADF components (datasets, triggers and pipelines) in Client Applications are managed the same way as in Core, by means of the DataFactoryManager webjob.

The section Data Factory tables explains how it works in Core.

The DataFactoryManager job uses information stored in the metadata database to build and programmatically create the ADF components in Data Factory, which means that Client Applications need a metadata database to store the information about ADF components.

There are some minor differences between DataFactoryManager for Client Applications and for Core:

  • The Core version includes the creation of the landing zones (the Azure Storage containers) for the Data Storage Units.
  • The Pipeline table in Core can store ADF pipelines but also Azure Search pipelines. In Client Applications the Pipeline table only stores ADF pipelines.
  • DataFactoryManager in Client Applications has to filter the pipelines to create only those needed for ADF.

2. Understanding the synchronization between Sidra Core and Client Applications

Sidra provides templates to create Client Applications that can be configured to automatically retrieve the information from a Data Storage Unit (DSU) when it is available.

The process of discovering and extracting this information is called synchronization between a Client Application and Sidra Core.

Sidra provides a component that orchestrates this synchronization, the Sync webjob.

This job is deployed in all Client Applications with the template provided by Sidra. Without any additional configuration, the metadata in the Core database is already synchronized with the metadata in the Client Application database.

The synchronization will take place as long as the Client Application has the right permissions to synchronize with the data in the DSU. These permissions are granted by editing the Balea Authorization permissions tables (Users > Client Application Subject > Permissions).

The synchronization webjob is configured to be executed every 2 minutes.

3. Discovering new content in the Data Storage Unit

Whenever content is ingested in the Data Lake inside a Data Storage Unit (DSU), a new Asset is created in the metadata database in Core. The Asset is associated with the rest of the metadata stored for the content: Entity, Provider, Attributes...

The Client Application keeps a synchronized copy of the metadata to discover the ingestion of new Assets in the Data Lake.

That means that the Client Application contains a metadata database, similar to the one in Core. In the Client Application database, the metadata tables for Sidra are under the schema Sidra.

These are some of the most important tables used in the Client Application metadata database:

  • DataFactory: this table stores the Azure Data Factory resource group for the Client Application
  • Provider
  • Entity
  • Attribute
  • EntityPipeline: to store the Entity-Pipeline associations.
  • AttributesFormat
  • AssetStatus
  • Assets
  • Trigger
  • TriggerPipeline
  • TriggerTemplate
  • Pipeline
  • PipelineTemplate
  • PipelineSyncBehavior: this is described in this documentation page.

Even if we say that the metadata of the Core and the Client Application databases are synchronized, there are several differences between the metadata in Core and the metadata in Client Applications that are worth clarifying:

  • Some fields have been removed from the Client Application metadata tables because they are not used, for example the fields used for access control like SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Client Application metadata tables. This is the case of the field IsDisabled, which has been added to Entity and Provider for disabling the synchronization for that particular Entity or Provider.
  • The state transitions in Core and in Client Applications are different. Therefore, the AssetStatus table in Client Applications contains different states and workflows from those in Sidra Core. For example, once the ingestion of the Asset is finished in Core, the status will be MovedToDataLake in Core, but in the Client Application the status will continue evolving until ImportedFromDataLake.

Note

You can check the Asset metadata section, which explains the above metadata tables, and all those status values and the transitions between them.

4. Role of Sync and DatabaseBuilder webjobs in Client Applications

The Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is limited by the permissions granted to the Client Application.

Based on the metadata received from the API, for each metadata table, the webjob updates its Client Application metadata tables. For Provider and Entity tables, any entry that is no longer available in Core is set as disabled in Client using the abovementioned IsDisabled field.

All this information will be used to decide if there is new content in the Data Lake to be imported to the Client Application.
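The handling of Provider and Entity entries described above can be pictured with a minimal Python sketch. It is not the Sync webjob's actual code: the `Row` class and the `reconcile` function are hypothetical names, and the choice to leave the flag of rows still present in Core untouched is an assumption made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Row:
    """Hypothetical stand-in for a Provider or Entity row in the Client Application database."""
    id: int
    name: str
    is_disabled: bool = False


def reconcile(local_rows, core_rows):
    """Merge rows returned by the API into the local rows.

    local_rows: dict mapping id -> Row (current Client Application content)
    core_rows:  dict mapping id -> name (what the API returned for this Client Application)
    """
    result = {}
    for row_id, name in core_rows.items():
        existing = local_rows.get(row_id)
        # Assumption: rows still present in Core keep their current IsDisabled flag;
        # rows seen for the first time are inserted as enabled.
        flag = existing.is_disabled if existing else False
        result[row_id] = Row(row_id, name, flag)
    for row_id, row in local_rows.items():
        if row_id not in core_rows:
            # Rows no longer available in Core are kept locally but marked as disabled.
            result[row_id] = Row(row_id, row.name, True)
    return result


local = {1: Row(1, "Sales"), 2: Row(2, "Legacy")}
core = {1: "Sales", 3: "Finance"}
synced = reconcile(local, core)
```

In this example, the row with id 2 ends up with `is_disabled` set to `True`, because it is no longer returned by Core, while the row with id 3 is added as enabled.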

In addition to this, the Sync webjob is also responsible for executing the defined pipelines, depending on the sync behavior defined for each pipeline (see PipelineSyncBehavior).

Sidra Core has the concept of dummy Assets, which are Assets of zero length that get created when an incremental load in Core finishes but results in no new increment of data. This concept was introduced in order to force the presence of a new Asset in the Client Application metadata tables. Without these Assets, the data synchronization would not be triggered if the Assets are configured as mandatory Assets (see the Mandatory Assets section below). If these Assets are mandatory but not generated, the data movement would not happen, and this could affect the business logic of the Client Application.

Since Sidra version 1.10, the generation of dummy Assets in data ingestion pipelines is optional, and the default value is false. Therefore, if the data processing logic of a Client Application needs these Assets to be generated, make sure that this parameter is set to true when deploying data intake pipelines.

DatabaseBuilder webjob

The population of the metadata database will be performed by the DatabaseBuilder webjob. The project included in the Client Application solution is already configured for this purpose:

static void Main(string[] args)
{
    ...

    // Add log configuration as well if required
    JobExecutor.Execute(
        configuration,
        new Options()
        {
            CreateClientDatabase = true,
            CreateLogDatabase = true
        },
        loggingBuilderAction);

    ...
}

This job will create the database schema specific for the Client Applications, with the differences explained in the sections above.

It will also be used to include in the database the information of the ADF components (pipelines, datasets, triggers) by using SQL scripts. This section explains how to execute SQL scripts using the DatabaseBuilder webjob.

5. Extracting the new content from the Data Storage Unit

The Sidra Client Applications will use ADF for the data movement, in particular they will use pipelines for the extraction of new content from the Data Lake (more specifically, from the DSU).

The actions performed by the extraction pipelines will depend on what is going to be done with the content after the extraction. This logic is Client Application-specific and tied to business rules transformations. Some examples of the actions possibly executed by the extraction pipelines are:

  • The content may be used to populate a DataWarehouse inside the Client Application. In this case, such content will be first stored into the staging tables in the Client Application after the extraction.

  • The content will be optionally transformed through the execution of data transformations and business rules within the Client Application.

  • Optionally, this transformed data can be re-ingested as rich data back into the Data Lake. In this case, after the extraction and transformation, the new content will be pushed to the landing zone in Core for re-ingestion, in the same way as the data from any other Provider configured in Sidra.

In consequence, there is not a single design for the Client Applications extraction pipeline: it will depend on the specifics of the business case implemented by the Client Application.

Extraction pipelines configuration

The extraction pipeline execution is launched by the Sync webjob based on the information in the Client Application metadata database. Such information includes metadata synchronized from Core, but also the configuration of the data extraction pipelines.

The configuration of the data extraction pipelines from the DSU is done in the metadata database tables, in a very similar way to how the ingestion and extraction pipelines are configured in Sidra Core.

Once a specific extraction pipeline is designed, it can be created and configured by setting it up in the database and allowing the DataFactoryManager to create the actual pipeline infrastructure in ADF.

The basic configuration of the pipeline follows the same steps as the data intake pipelines configuration for Core. Data intake pipelines configuration is described in this section: How to configure a new pipeline based on templates.

6. Particularities on Client Applications extraction pipelines

The extraction pipelines in Client Applications have some particularities, compared to the pipelines in Sidra Core:

Pipeline type

The Pipeline table in Client Applications has some additional fields compared to the fields that the Pipeline table has in Sidra Core:

  • ExecutionParameters: This is an additional field used for including values for the parameters of the Client Application pipeline when the pipeline is executed by the Sync webjob.

Mandatory Assets

The same data extraction pipeline can be used to extract several Assets from different Entities. As in Sidra Core, the EntityPipeline table is used to associate the same Client Application pipeline to several Entities.

For the Sync process to copy the data from the DSU, all Entities must be assigned to their corresponding pipelines. To do so, insert into the EntityPipeline table of the Client App Sidra database all the relationships between the Entities and the Client Application data extraction pipeline to use.

Sidra also provides support for defining mandatory Assets: the extraction pipeline will be executed only if all the Assets marked as mandatory are present in Sidra Core, that is, if there is at least one Asset for each of the mandatory Entities. Entities for which Assets do not need to be loaded in Sidra Core can be marked as optional (IsMandatory=0).

The mandatory Assets can be configured using the field IsMandatory, which is included in the EntityPipeline table and must be taken into account when setting up this relation. Please refer to the tutorial How to associate an Entity with a pipeline.
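As an illustration of the mandatory Assets rule, the following Python sketch checks whether a pipeline would be allowed to run. It is only a sketch under stated assumptions: `EntityPipeline` here is a hypothetical in-memory view of the table row (only the IsMandatory field comes from the text above), and `pipeline_can_run` is an illustrative helper, not part of Sidra.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityPipeline:
    """Hypothetical in-memory view of one EntityPipeline row."""
    id_pipeline: int
    id_entity: int
    is_mandatory: bool


def pipeline_can_run(id_pipeline, associations, entities_with_assets):
    """Return True when every mandatory Entity of the pipeline has at least one Asset."""
    mandatory = {
        a.id_entity
        for a in associations
        if a.id_pipeline == id_pipeline and a.is_mandatory
    }
    # Optional Entities (IsMandatory=0) never block the execution.
    return mandatory <= set(entities_with_assets)


associations = [
    EntityPipeline(10, 1, True),
    EntityPipeline(10, 2, True),
    EntityPipeline(10, 3, False),
]
blocked = pipeline_can_run(10, associations, {1, 3})   # Entity 2 has no Asset yet
allowed = pipeline_can_run(10, associations, {1, 2})   # optional Entity 3 is missing
```

With these sample rows, `blocked` is `False` because the mandatory Entity 2 has no Asset, and `allowed` is `True` even though the optional Entity 3 has none.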

Association with Providers

An association between the Pipeline and Provider can be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated to enabled Providers, i.e. those Providers with false in the IsDisabled field.
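The Provider filter can be sketched in Python as follows. The function name and the input shapes are hypothetical, and the sketch assumes that a pipeline qualifies when at least one of its associated Providers is enabled; the actual rule applied by the Sync webjob may differ.

```python
def pipelines_for_enabled_providers(pipeline_providers, is_disabled_by_provider):
    """Return the ids of pipelines associated to at least one enabled Provider.

    pipeline_providers:      iterable of (id_pipeline, id_provider) pairs,
                             mirroring rows of the PipelineProvider table
    is_disabled_by_provider: dict mapping id_provider -> IsDisabled value
    """
    enabled = {
        provider
        for provider, disabled in is_disabled_by_provider.items()
        if not disabled
    }
    return sorted({
        pipeline
        for pipeline, provider in pipeline_providers
        if provider in enabled
    })


pairs = [(10, 1), (11, 2), (12, 1), (12, 2)]
selected = pipelines_for_enabled_providers(pairs, {1: False, 2: True})
```

Here pipeline 11 is left out because its only Provider (id 2) has IsDisabled set to true.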

Extraction pipelines configuration

Once the specific extraction pipeline is configured, it can be created by setting up the appropriate information in the metadata database and allowing the DataFactoryManager job to create this pipeline in ADF.

The basic configuration of the pipeline in Sidra Core is described in section How to configure a new pipeline based on templates.

However, in Client Applications, some additional configuration is required. In summary:

  • An additional field IsMandatory is included in the association between Pipeline and Entity and must be taken into account when setting up this relation using the tutorial How to associate an Entity with a pipeline.
  • Pipelines in Client Applications have an additional field, ExecutionParameters, which must be configured.

ExtractPipelineExecution table

The ExtractPipelineExecution table is an execution tracking table that is only present in a Client Application.

| Column | Description |
|---|---|
| Id | Id of the record |
| IdPipeline | Id of the Client Application pipeline |
| PipelineRunId | GUID with the RunId of the pipeline execution in ADF |
| PipelineExecutionDate | Timestamp of execution of the pipeline |
| IdEntity | Id of the Entity |
| IdAsset | Id of the Asset |
| AssetDate | Business date of the Asset being ingested |
| AssetOrder | An order number internally used in the stored procedures for loading the data |
| IdExtractPipelineExecutionStatus | Id of the ExtractPipelineExecutionStatus for the status of the Asset during the pipeline load |

If the Sync Behavior (see the Sync Behavior documentation) is LoadUpToLastValidDate, the Asset load status per pipeline will be checked from this table.

ExtractPipelineExecutionStatus

The ExtractPipelineExecutionStatus table is a reference table that enumerates the different statuses involved in a Client Application pipeline execution.

| Column | Description |
|---|---|
| Id | Id of the record |
| Name | Name of the execution status |
| Description | Description of the execution status |

These are the possible statuses:

| Status id | Status name | Status description |
|---|---|---|
| 0 | Extracting | Asset is being copied from the DSU into the Client Staging tables |
| 1 | Staging | Asset already copied into the Client Staging tables |
| 2 | Finished | Asset already copied into the final client tables |
| 3 | Error | Issue when copying the Asset |
| 4 | Removed | Asset has been deleted |

The statuses in this table are updated by some Client Application pipeline activities.
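For code that reads the ExtractPipelineExecution table, the status ids can be mirrored in a small enumeration. The Python sketch below takes the ids and names from the status list above; the `is_loaded` helper is a hypothetical convenience and not part of Sidra.

```python
from enum import IntEnum


class ExtractPipelineExecutionStatus(IntEnum):
    """Status ids as listed in the ExtractPipelineExecutionStatus reference table."""
    EXTRACTING = 0  # Asset is being copied from the DSU into the staging tables
    STAGING = 1     # Asset already copied into the staging tables
    FINISHED = 2    # Asset already copied into the final client tables
    ERROR = 3       # Issue when copying the Asset
    REMOVED = 4     # Asset has been deleted


def is_loaded(status_id):
    """Hypothetical helper: True only when the Asset reached the final client tables."""
    return ExtractPipelineExecutionStatus(status_id) is ExtractPipelineExecutionStatus.FINISHED
```

A value read from the IdExtractPipelineExecutionStatus column can then be checked with `is_loaded(value)` instead of comparing against a bare number.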

How naming conventions for Client Application staging tables work

Sidra supports two different naming conventions for the Databricks and the Client Application staging tables:

  • The Sidra default naming convention. This naming convention is "databasename_schemaname_tablename". So, for example: for a table whose name is "table1" in the origin system, which is under a database "databaseA", and under a schema "schemaA", then the resulting name with this convention will be: "databaseA_schemaA_table1".
  • A custom naming convention through a couple of configuration parameters.

Please find below instructions on how to use these configuration parameters to specify this custom naming convention.

Steps of configuration for naming convention

Depending on the metadata scenario, different configuration options are possible, as shown below:

  1. For existing metadata that do not need to be updated (nor new metadata added): no action is required.

  2. For existing metadata that need to be updated or new metadata added, wanting to maintain the current names (so, not use the Sidra default naming convention): you need to enable the new Sidra custom naming convention.

    For this, you need to set the parameter EnableCustomTableNamePrefix to true, and also set the parameter CustomTableNamePrefix accordingly, with the prefix you want to use. The prefix is anything in the name that goes before the name of the table. For example:

    1. If your staging table name is "tableA", the prefix needs to be null.
    2. If your staging table is "schema_name_table_name", then the prefix needs to be set to the placeholder "{SCHEMA_NAME}_".
    3. If your staging table is "database_name_table_name", then the prefix needs to be set to the placeholder "{DATABASE_NAME}_".

    Examples of use

    Independently of the value set to the parameter CustomTableNamePrefix, the names will be the new default naming convention provided by Sidra: "databasename_schemaname_tablename".

    Then you can use either a fixed value for the naming convention, or a placeholder:

    1. If using a fixed value to concatenate fixed prefixes to the actual table name, then you need to populate the field CustomTableNamePrefix with a string value. This will concatenate the prefix configured in CustomTableNamePrefix to the actual name of the table. For example:

      • If CustomTableNamePrefix is empty, the final name for the staging table of a table with name "table1" in origin, will be "table1".
      • If CustomTableNamePrefix="databaseA_", then the final name for the staging table of a table with name "table1" in origin, will be "databaseA_table1".
      • If CustomTableNamePrefix="schemaA_", then the final name for the staging table of a table with name "table1" in origin, will be "schemaA_table1".
    2. If using a placeholder value to concatenate a dynamic prefix to the actual table name, then you need to populate the field CustomTableNamePrefix with a placeholder string value. For example:

      • If CustomTableNamePrefix="{SCHEMA_NAME}_", then the final name for the staging table of a table in origin with name "table1" under the schema "schemaA" will be: "schemaA_table1".
      • If CustomTableNamePrefix="{DATABASE_NAME}_", then the final name for the staging table of a table in origin with name "table1" under the database "databaseA" will be: "databaseA_table1".
  3. For existing metadata that just need to be updated, you can also use the new Sidra default naming convention: "databasename_schemaname_tablename". In this case, some manual intervention is required:

    1. You need to update the stored procedures at the Client Application side, that reference these staging tables, as their names will have changed.
    2. You also need to consolidate Databricks tables (new Databricks tables will have been created with this naming convention after the naming convention change is applied; this means otherwise you would have different tables referencing the same origin data).
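The naming rules above can be summarized in a short Python sketch. It is an illustration, not Sidra's implementation: the function name and arguments are hypothetical, and it assumes that the default convention applies whenever EnableCustomTableNamePrefix is false and that the placeholders are replaced by plain string substitution.

```python
def staging_table_name(table, database, schema,
                       enable_custom_prefix=False, custom_prefix=None):
    """Compute a staging table name following the conventions described above."""
    if not enable_custom_prefix:
        # Sidra default naming convention: databasename_schemaname_tablename
        return f"{database}_{schema}_{table}"
    # A null or empty prefix leaves the original table name unchanged.
    prefix = custom_prefix or ""
    # Placeholders are resolved with the origin database and schema names.
    prefix = prefix.replace("{DATABASE_NAME}", database)
    prefix = prefix.replace("{SCHEMA_NAME}", schema)
    return f"{prefix}{table}"


default_name = staging_table_name("table1", "databaseA", "schemaA")
fixed_name = staging_table_name("table1", "databaseA", "schemaA", True, "databaseA_")
schema_name = staging_table_name("table1", "databaseA", "schemaA", True, "{SCHEMA_NAME}_")
```

With these inputs, `default_name` is "databaseA_schemaA_table1", `fixed_name` is "databaseA_table1" and `schema_name` is "schemaA_table1", matching the examples listed above.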

Identity Server token configuration

In order to copy the data from the Entities in the DSU, the Client Application needs to request a token from the Identity Server service. This token is only valid for a restricted time frame. If the validity period of the token needs to be extended, this setting can be configured in the Identity Server database in the Core resource group, as follows:

Apply an UPDATE over the table [dbo].[Clients], extending the value of the field [AccessTokenLifetime] for the respective Client Application. For example, to extend to 5 hours:

UPDATE [dbo].[Clients] SET [AccessTokenLifetime] = 18000 WHERE [ClientName] = 'ClientAppName'

where ClientAppName is the ClientName of the corresponding Client Application.
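The value 18000 in the statement above is the lifetime expressed in seconds (5 × 3600). The following Python sketch shows the conversion and builds the same statement with parameters. The helper names are hypothetical, and the "?" markers assume a driver that uses question-mark placeholders (for example pyodbc); the sketch only builds the statement and does not connect to any database.

```python
def access_token_lifetime_seconds(hours):
    """Convert a lifetime in hours to the number of seconds stored in AccessTokenLifetime."""
    return int(hours * 3600)


def build_update(client_name, hours):
    """Return the parameterized UPDATE statement and its parameter values."""
    sql = (
        "UPDATE [dbo].[Clients] "
        "SET [AccessTokenLifetime] = ? "
        "WHERE [ClientName] = ?"
    )
    return sql, (access_token_lifetime_seconds(hours), client_name)


statement, params = build_update("ClientAppName", 5)
```

For 5 hours, `params` contains 18000 and "ClientAppName", the same values used in the UPDATE example above.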


Last update: 2022-05-09