Sidra Client Applications main concepts

Client Applications are the pieces of Sidra that enable business cases. Client Applications (or Apps) are a set of Azure resources and code, enclosed in a Resource Group, which either access the data from one or multiple Data Storage Units (DSUs) via the secure APIs, or retrieve the data from the Data Lake, applying business transformations if needed.

Any actor that needs to access the data stored in the Data Lake for a specific business need is catalogued as a Client Application. Client Apps consume the content from the DSU(s), so it is important to know when new content has been ingested in order to extract it and incorporate it into the Client Application storage.

The architecture allows Client Applications to be built using any set of tools and components, so they can range from Power BI-enabled analytical workspaces to full-blown Web apps. That said, all Client Applications have to use the same security model, notifications and logging infrastructure, etc., via the Sidra APIs.

In some places the term Consumer apps can still be found. It was the original name for Client Apps and is now deprecated.

Client Application Solution templates

The flexibility of the architecture allows each Client App to be completely different from the rest. Despite that, many Client Applications commonly follow the same structure. Sidra provides some Visual Studio templates to create the basic solution for a Client Application:

  • Client Application template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppTemplate.
  • Client Application with PowerBI template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppPowerBITemplate.

More information about how to use those templates can be found in the section Create project solution. From now on in this document, the term Client App solution refers to a Visual Studio solution generated from one of those templates.

Client Applications Solution structure

If the solution created by the Client Application template is opened in Visual Studio, the content shown in the Solution Explorer should look similar to this:

[Figure: client-solution-structure]

These are the components included in the solution:

  • A deployment project named <<Solution name>>.Deployment, e.g. Sidra.Dev.Client.Deployment. This is the same project that can be seen in the solutions for Sidra Core.
  • An Extract custom activity project named <<Solution name>>.DataFactory.CustomActivity.Extract. This custom activity is used to extract the information of an Entity from the DSU.
  • A Sync webjob project named <<Solution name>>.DataFactory.Sync. This webjob is used to synchronize metadata information between Core and the Client Application.
  • A DatabaseBuilder webjob project named <<Solution name>>.Webjobs.DatabaseBuilder. This is the same project that can be seen in the Sidra Core solutions.
  • A DataFactoryManager webjob project named <<Solution name>>.Webjobs.DataFactoryManager.Client. This is a project very similar to the DataFactoryManager project that can be seen in the Sidra Core solutions.

Deployment of Client Applications

The deployment project used in Client Application solutions is exactly the same as the deployment project in Core solutions. The only difference is how this deployment project is used:

  • The PowerShell script that orchestrates the deployment is called ClientDeploy.ps1, instead of CoreDeploy.ps1, although both are included in the project.
  • The environment data files must contain a different set of values. A sample of the values can be found in the deployment project in Scripts\Documentation\ClientData.psd1.

The deployment project uses the same mechanisms to generate the build and release pipelines in Azure DevOps, so everything that is explained in the Create project solution section about deployment, environment configuration and CI/CD integration can also be applied to Client Application solutions.

Data movement and orchestration in Client Applications

The Client Application solutions use Azure Data Factory V2 (ADF) to perform data movements, as Sidra Core solutions do, but the deployment project creates its own instance of ADF in the Resource Group of the Client Application.

When working with both solutions at the same time (Core and Client Applications), it is important to differentiate the ADF instances. More specifically, there will be:

  • an ADF instance for each Data Storage Unit (DSU) in Core
  • an ADF instance for each Client Application solution

DataFactoryManager for Client Applications

ADF components (datasets, triggers and pipelines) in Client Applications are managed the same way as in Core, by means of the DataFactoryManager webjob. The section Data Factory tables explains how it works in Core.

The DataFactoryManager uses information stored in the metadata database to build and programmatically create the ADF components in Data Factory, which means that Client Applications need a metadata database to store the information about ADF components.

There are some minor differences between DataFactoryManager for Client Applications and for Core:

  • The Core version includes the creation of the landing zones (that is, the Azure Storage containers) for the Data Storage Units.
  • The Pipeline table in Core can store ADF pipelines but also Azure Search pipelines. In Client Applications the Pipeline table only stores ADF pipelines.
  • DataFactoryManager in Client Applications has to filter the pipelines to create only those needed for ADF.

Understanding the synchronization between Sidra Core and Client Applications

Sidra provides templates to create Client Applications that can be configured to automatically retrieve the information from a DSU when it is available.

The process of discovering and extracting this information is called "synchronization" between Client Application and Sidra Core.

Sidra provides a component that orchestrates this synchronization, the Sync webjob.

This job is deployed in every Client Application created from the templates provided by Sidra. With no additional configuration, it keeps the metadata in the Client Application database synchronized with the metadata in the Core database.

The synchronization will take place as long as the Client Application has the right permissions to synchronize with the data in the DSU.

The synchronization webjob is configured to be executed every 2 minutes.

Discovering new content in the Data Storage Unit

Every time content is ingested in the Data Lake inside a Data Storage Unit (DSU), a new Asset is created in the metadata database in Core. The Asset is associated with the rest of the metadata stored for the content: Entity, Provider, Attributes...

The Client Application keeps a "synchronized" copy of the metadata to discover the ingestion of new Assets in the Data Lake. That means that the Client Application contains a metadata database, similar to the one in Core, to store the following metadata:

  • Provider
  • Entity
  • Attribute
  • AttributesFormat
  • AssetStatus
  • Assets

Although the metadata in the Core and Client Application databases is described as "synchronized", there are several differences between the two worth clarifying:

  • Some fields have been removed from the Client Application counterpart because they are not used, for example the fields used for access control like SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Client Application. This is the case of the field IsDisabled, which has been added to Entity and Provider to disable the synchronization for that particular Entity or Provider.
  • The state transitions in Core and in Client Applications are different. Therefore, the AssetStatus table contains different states and workflows in Core and in Client Applications. Once the ingestion of the Asset is finished in Core, the status will be MovedToDataLake in Core, but in the Client Application counterpart the status will continue evolving until ImportedFromDataLake. Please check the Asset metadata section, which explains all those status values and the transitions between them.

Role of Sync and DatabaseBuilder webjob in Client Applications

The Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is limited by the permissions granted to the Client Application.

Based on the metadata received from the API, the webjob updates the Client Application counterpart of each metadata table. For the Provider and Entity tables, any entry that is no longer available in Core is set as disabled in the Client Application using the IsDisabled field described above.

All this information will be used to decide if there is new content in the Data Lake to be imported to the Client Application.
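The update rule for the Provider table can be illustrated with a minimal sketch. It is not the Sync webjob's actual code: the class ClientProvider, the class ProviderSync and the in-memory dictionaries are hypothetical stand-ins for the database rows and the API response, and only the IsDisabled field comes from the description above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical, simplified shape of a Provider row in the Client Application database.
public class ClientProvider
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
    public bool IsDisabled { get; set; }
}

public static class ProviderSync
{
    // coreProviders: Id -> Name pairs that the API returned for this Client Application (assumed shape).
    // clientProviders: rows currently held by the Client Application, keyed by Id.
    public static void Apply(
        IReadOnlyDictionary<int, string> coreProviders,
        IDictionary<int, ClientProvider> clientProviders)
    {
        // Insert or refresh every Provider that Core still exposes.
        foreach (var pair in coreProviders)
        {
            if (!clientProviders.TryGetValue(pair.Key, out var row))
            {
                row = new ClientProvider { Id = pair.Key };
                clientProviders[pair.Key] = row;
            }
            row.Name = pair.Value;
        }

        // A Provider that Core no longer returns is kept, but flagged as disabled.
        foreach (var row in clientProviders.Values)
        {
            if (!coreProviders.ContainsKey(row.Id))
            {
                row.IsDisabled = true;
            }
        }
    }
}
```

The sketch never resets IsDisabled to false, because the flag may also have been set on purpose to stop the synchronization of a Provider. Whether the real webjob re-enables entries that reappear in Core is not covered here.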

DatabaseBuilder webjob

The population of the metadata database will be performed by the DatabaseBuilder webjob. The project included in the Client Application solution is already configured for this purpose:

static void Main(string[] args)
{
    ...

    // Add log configuration as well if required
    JobExecutor.Execute(
        configuration,
        new Options()
        {
            CreateClientDatabase = true,
            CreateLogDatabase = true
        },
        loggingBuilderAction);

    ...
}

This job will create the database schema specific for the Client Applications, with the differences explained in the sections above.

It is also used to load the information about the ADF components (pipelines, datasets, triggers) into the database by means of SQL scripts. The section How to execute SQL scripts using DatabaseBuilder webjob explains the procedure.

Extracting the new content from the Data Storage Unit

The Sidra Client Applications will use ADF for the data movement, in particular they will use pipelines for the extraction of new content from the Data Lake (more specifically, from the DSU).

The actions performed by the extraction pipelines will depend on what is going to be done with the content after the extraction. This logic is Client Application-specific and tied to business rules transformations. Some examples of the actions possibly executed by the extraction pipelines are:

  • The content may be used to populate a DataWarehouse inside the Client Application. In this case, such content will be first stored into the staging tables in the Client Apps after the extraction.

  • The content will be optionally transformed through the execution of data transformations and business rules within the Client Application and then re-ingested as enriched data back into the Data Lake. In this case, after the extraction and transformation the new content will be pushed to the landing zone in Core for the re-ingestion, as any new data Provider.

Consequently, there is no single design for the Client Application extraction pipeline: it depends on the specifics of the business case implemented by the Client Application.

Extraction pipelines configuration

The extraction pipeline execution is launched by the Sync webjob based on the information in the Client Application metadata database. Such information includes metadata synchronized from Core, but also the configuration of the data extraction pipelines.

The configuration of the data extraction pipelines from the DSU is done in the metadata database, in a very similar way to how the ingestion and extraction pipelines are configured in Sidra Core.

Once a specific extraction pipeline is designed, it can be created and configured by setting it up in the database and allowing the DataFactoryManager to create it in ADF.

The basic configuration of the pipeline follows the same steps as in Core, which are described in the section How to configure a new pipeline based on templates.

Particularities on Client Applications extraction pipelines

The extraction pipelines in Client Applications have some particularities, compared to the pipelines in Sidra Core:

Pipeline type

The Pipeline table in Client Applications has some additional fields compared to the Pipeline table in Sidra Core:

  • PipelineType. The types of pipelines in the Client Applications are different from the types in Sidra Core. There are two new types of pipelines in Client Applications: ExecutedBySync and IgnoredFromSync. The Sync webjob will only launch a pipeline if its type is ExecutedBySync.

  • ExecutionParameters. This is an additional field used for including values in the parameters of the Client Application pipeline when the pipeline is executed by the Sync webjob.
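As an illustration of how these two fields are used, the following sketch filters a set of pipeline rows down to those the Sync webjob would consider. The class ClientPipeline and the helper SyncPipelineFilter are hypothetical. Representing PipelineType as a C# enum and ExecutionParameters as an opaque string are also assumptions, since the storage format of both fields is not described here.

```csharp
using System.Collections.Generic;
using System.Linq;

// The two pipeline types named above; how they are stored in the database is an assumption.
public enum PipelineType
{
    ExecutedBySync,
    IgnoredFromSync
}

// Hypothetical, reduced view of a row of the Pipeline table in a Client Application.
public class ClientPipeline
{
    public string Name { get; set; } = "";
    public PipelineType PipelineType { get; set; }

    // Values handed to the pipeline parameters at launch time; treated here as opaque text.
    public string ExecutionParameters { get; set; } = "";
}

public static class SyncPipelineFilter
{
    // Keeps only the pipelines that the Sync webjob is allowed to launch.
    public static List<ClientPipeline> SelectLaunchable(IEnumerable<ClientPipeline> pipelines)
    {
        return pipelines
            .Where(p => p.PipelineType == PipelineType.ExecutedBySync)
            .ToList();
    }
}
```

A pipeline marked as IgnoredFromSync still exists in ADF once DataFactoryManager has created it. It is simply never started by the Sync webjob.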

Mandatory Assets

The same extraction pipeline can be used to extract several Assets from different Entities. As in Sidra Core, the EntityPipeline table is used to associate the same pipeline with several Entities.

Sidra also provides support for defining mandatory Assets, which means that the extraction pipeline will be executed only if all the Assets marked as mandatory are present in Sidra Core. Mandatory Assets are configured using the field IsMandatory, which is included in the EntityPipeline table and must be taken into account when setting up this relation. Please refer to the tutorial How to associate an entity with a pipeline.
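The mandatory check can be pictured with a minimal sketch. The class EntityPipelineLink, the column name IdEntity and the helper MandatoryAssetCheck are hypothetical, as is the idea of receiving the available Entities as a set of identifiers. Only the IsMandatory field is taken from the text above.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced view of an EntityPipeline row for a single pipeline.
public class EntityPipelineLink
{
    public int IdEntity { get; set; }      // assumed column name
    public bool IsMandatory { get; set; }
}

public static class MandatoryAssetCheck
{
    // links: the EntityPipeline rows of one pipeline.
    // entitiesWithAssets: ids of the Entities for which an Asset is currently available (assumed input).
    public static bool CanRun(IEnumerable<EntityPipelineLink> links, ISet<int> entitiesWithAssets)
    {
        // Every mandatory Entity must have an Asset; optional Entities never block the run.
        return links
            .Where(link => link.IsMandatory)
            .All(link => entitiesWithAssets.Contains(link.IdEntity));
    }
}
```

Because All returns true for an empty sequence, a pipeline with no mandatory Entities is never blocked by this check in the sketch.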

Association with Providers

An association between the Pipeline and the Provider must be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated with enabled Providers, i.e. those Providers whose IsDisabled field is false.
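A minimal sketch of this filter follows. The class PipelineProviderLink, its column names IdPipeline and IdProvider, and the helper ProviderGate are hypothetical. The sketch also assumes that one enabled Provider is enough to make a pipeline eligible, which the description above does not specify for pipelines linked to several Providers.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical, reduced view of a PipelineProvider row.
public class PipelineProviderLink
{
    public int IdPipeline { get; set; }   // assumed column name
    public int IdProvider { get; set; }   // assumed column name
}

public static class ProviderGate
{
    // providerIsDisabled: Provider id -> value of its IsDisabled field.
    // Returns the ids of the pipelines linked to at least one enabled Provider.
    public static HashSet<int> PipelinesWithEnabledProviders(
        IEnumerable<PipelineProviderLink> links,
        IReadOnlyDictionary<int, bool> providerIsDisabled)
    {
        var eligible = links
            .Where(link => providerIsDisabled.TryGetValue(link.IdProvider, out var disabled) && !disabled)
            .Select(link => link.IdPipeline);

        return new HashSet<int>(eligible);
    }
}
```

In the sketch, a pipeline with no row in PipelineProvider, or linked only to unknown or disabled Providers, never appears in the result.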

Extraction pipelines configuration

Once the specific extraction pipeline is configured, it can be created by setting up the appropriate information in the metadata database and allowing the DataFactoryManager to create this pipeline in ADF.

The basic configuration of the pipeline in Sidra Core is described in section How to configure a new pipeline based on templates.

However, in Client Applications, some additional configuration is required. In summary:

  • An association between the Pipeline and the Provider must be added using the PipelineProvider table.
  • An additional field IsMandatory is included in the association between Pipeline and Entity and must be taken into account when setting up this relation using the tutorial How to associate an entity with a pipeline.
  • Pipelines in Client Applications have some additional fields, PipelineType and ExecutionParameters, which must be configured.