Sidra Client Applications main concepts

The Client Applications are the pieces of Sidra that enable business cases to be driven. The Client Applications layer can also be described as the serving layer of Sidra Data Platform.

Client App overview

Client Applications (or Apps) are a set of Azure resources and code, enclosed in a Resource Group, which either access the data from one or multiple Data Storage Units via the secure APIs, or retrieve the data from the Data Lake, applying business transformations if needed.

Any actor that needs to access the data stored in the Data Lake for a specific business need is catalogued as a Client Application. Client Apps consume the content from the DSU(s), so it is important to know when new content has been ingested in order to extract it and incorporate it into the Client Application storage.

The architecture allows for Client Applications to be built using any set of tools and components, so they can range from Power BI-enabled analytical workspaces to even full-blown Web apps. Having said that, all Client Applications must use the same security model, notifications and logging infrastructure, etc., via Sidra APIs. This centralization and templatization of governance and transversal components is what makes Sidra Data Platform an accelerator for innovation on serving data use cases.

Note: In some places the term Consumer apps may still be found to refer to Client Applications. This term is now deprecated.

Overview of the Client Application creation and deployment processes

Sidra Client Application pipelines are designed to perform installation and deployment of the Client Applications in a highly automated way, following a Continuous Integration/Continuous Deployment (CI/CD) process.

The general process is as follows:

  • Step 1: We need to download locally the corresponding Client Application template.
  • Step 2: The source for the actual instance of the Client Application needs to be created from this template.
  • Step 3: Once the .NET solution has been obtained with its code projects, we need to push this code to a specific branch (depending on the environment) of an Azure DevOps repository. The repository and branches need to have been created previously if they did not exist yet.
  • Step 4: Finally, the solution will be deployed with CI/CD pipelines. For this, the Client Application generates a build+release definition in YAML format. This YAML definition will be used by Azure DevOps to configure the integration and deployment processes. A set of artifacts will be required for this, which may vary according to the type of Client Application template used. There are mainly two alternatives: data configuration files (.psd1) or variable groups in DevOps. Both artifacts mainly contain the parameters required for creating and deploying the infrastructure of the Client Application in Azure. The latter (variable groups) is the recommended option when creating new templates, as it is compatible with the plugins approach for Client Applications. You can see more details of the plugins approach in the Connectors documentation pages.

Step 1: Client Application Solution templates

The flexibility of the architecture using Client Apps allows every application to be completely different from the rest and tailored to serve business needs. Despite that, it is common for many Client Applications to follow the same structure.

Sidra provides some Visual Studio templates to create the basic solution for a Client Application. Among the Sidra-provided Visual Studio templates are the following:

  • Basic Client Application template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ClientAppTemplate.
  • DataLab Client Application template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.DataLabAppTemplate.

Details about these Client Application templates are included in the documentation pages of this main section.

Obtaining the Client Application template

The first piece needed to create a Client Application is the template code project itself. The template code project may be available in a NuGet feed (for example, for the Sidra-provided DataLab or ClientApp Client Applications). This template code project can also be in a customer Azure DevOps repository, in the Sidra project.

From this template project, an actual Client Application code project (instantiation of the template) will be created with just a dotnet command, as seen below.

Downloading from a NuGet feed

The templates may be located within the PlainConcepts.Sidra.DotNetCore Git repository and released as NuGet packages.

Assuming that the NuGet feed contains the NuGet package with the template, the template can be installed by running the following command in a CMD or PowerShell window:

dotnet new -i [NuGet Package Id] --nuget-source [NuGet Feed Url]

where:

  • NuGet Package Id: should be the NuGet Package Id for a specific template (e.g., PlainConcepts.Sidra.DotNetCore.ClientAppTemplate)
  • NuGet Feed Url: should be the URL of the NuGet feed.
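
For illustration, a hypothetical installation of the basic template could look as follows (the feed URL is an illustrative placeholder, not a real Sidra feed):

```powershell
# Install the basic Client Application template from a NuGet feed (example values only)
dotnet new -i PlainConcepts.Sidra.DotNetCore.ClientAppTemplate `
    --nuget-source "https://pkgs.dev.azure.com/MY_ORGANIZATION/_packaging/MY_FEED/nuget/v3/index.json"
```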

The installation of a new Visual Studio template is done for a specific .NET Core SDK version. The template will be installed for the .NET Core SDK version configured for the path where the installation command is executed, which can be checked with this command:

dotnet --version

Once the templates are installed, a project can be created based on them. To list the available downloaded templates the following command could be executed:

dotnet new sidra -l

Downloading from a code repository

In this case, if the template code project is in an Azure DevOps repository, we will first need to clone that repository locally. Once this is done, we will execute the following command:

dotnet new -i PATH_TO_CLIENT_APP_TEMPLATE/ClientAppAzSearch/Content

where:

  • PATH_TO_CLIENT_APP_TEMPLATE: the local path where the repository that we have just cloned is located.
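
As an illustration, assuming the template lives in a repository named ClientAppTemplates (both the repository URL and the local path below are placeholders):

```powershell
# Clone the repository that contains the Client Application template (example URL and path)
git clone https://dev.azure.com/MY_ORGANIZATION/Sidra/_git/ClientAppTemplates C:\src\ClientAppTemplates

# Install the template from the local path of the cloned repository
dotnet new -i C:\src\ClientAppTemplates\ClientAppAzSearch\Content
```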

Once the templates are installed, a project can be created based on them. To list the available downloaded templates the following command could be executed:

dotnet new sidra -l

Step 2: Creating a source code project based on templates

The next step after obtaining the DotNet template of a Client Application is to proceed to actually create an instance of the Client Application from that template.

Before starting the installation of a Client Application, a new Git repository will need to be created in Azure DevOps.

The short name of the Client Application template is what will be used to create the project from the template.

A .NET Core template can include parameters to populate the created solution. There is a set of default parameters available that can be listed using the command:

dotnet new -h

Additionally, every template can define its own list of additional parameters which can be shown using the following command:

dotnet new TEMPLATE_SHORT_NAME -h

Once we have the template, we can create the source code project locally with a command similar to the following one (this example is for the DataLab Client App template, whose short name is sidra-datalab), respecting the parameters required and available for each template:

dotnet new sidra-datalab --appName NAME --serviceConnectionName SERVICE_CONNECTION_NAME --force

where:

  • NAME is the name that we will give to the Client Application
  • SERVICE_CONNECTION_NAME will be the name of the service connection to the Azure environment where the Client App will be installed (e.g. Sidradev). The service connection should have been created when installing Sidra Core in the respective environment.
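
For example, a concrete invocation could be the following (the application name is illustrative and Sidradev is just an example of a service connection name):

```powershell
# Create a DataLab Client Application solution from the installed template (example values only)
dotnet new sidra-datalab --appName DataLabPoC --serviceConnectionName Sidradev --force
```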

If the command has been successfully executed, the result should be:

The template XXX was created successfully

Step 3: Pushing the code to the repository

After creating the source code locally with the above command, the code will need to be uploaded to the created Git repository in Azure DevOps.
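A typical sequence could look like the following sketch (the repository URL and branch name are placeholders; use the repository and the branch corresponding to the target environment):

```powershell
# Run from the folder that contains the generated solution
git init
git remote add origin https://dev.azure.com/MY_ORGANIZATION/Sidra/_git/DataLabPoC
git checkout -b dev                  # the branch depends on the target environment
git add .
git commit -m "Initial Client Application solution generated from the Sidra template"
git push -u origin dev
```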

From now on in this document, Client Application solutions will refer to the Visual Studio solutions generated from those templates.

Client Applications solution structure

When the solution created by the Client Application template is opened in Visual Studio, the code content should look like this list of components.

Depending on the Client Application template there may be additional or some different components.

This one is for the Basic Client Application (template sidra-app).

Generally, there should be a set of folders, the main source code folder (src), plus other additional files.

Client App solution

Under the src folder we can see the different projects of the .NET solution:

  • A deployment project named Solution name.Deployment, e.g. Sidra.Dev.Client.Deployment. This is the same project that can be seen in the solutions for Sidra Core, containing the deployment scripts for all the infrastructure in the Client Application. This project includes more or fewer PowerShell scripts depending on the complexity of the deployed infrastructure.
  • A Sync webjob project named Solution name.DataFactory.Sync. This webjob is used to transparently synchronize metadata information between Core and the Client Application database. This is the mechanism by which the Client Application is made aware of the Assets ingested into Sidra Core.
  • A DatabaseBuilder webjob project named Solution name.Webjobs.DatabaseBuilder. This is the same project that can be seen in the Sidra Core solutions, and its main purpose is to create the needed tables and seed data in the Client Application database.
  • A DataFactoryManager webjob project named Solution name.Webjobs.DataFactoryManager.Client. This is a project very similar to the DataFactoryManager project that can be seen in the Sidra Core solutions. Its main purpose is to run the processes that synchronize the Data Factory metadata in the database with the real Azure Data Factory infrastructure.
  • A Database project named Solution name.Database. This project contains the Client Application database.
  • A WebApi project named Solution name.WebApi, e.g. Sidra.Dev.Client.WebApi. This project contains the Client Application API logic.

Below is a depiction of such project structure:

Client App solution projects

Apart from the src folder, other important files present are:

  • azure-pipelines.yml: This is the YAML file for build and release orchestration (see the section below). It will be used to create a valid YAML Azure DevOps pipeline. This file is used to compile the solution and generate the required artifacts in the release steps. This YAML may also require additional YAML files contained in the Templates folder.
  • *.props: These are the required .NET Core files that include references to the packages used to build the solution. They ensure that the latest Sidra package versions are used.
  • .sln: The .NET solution file.

Step 4: Deployment of Client Applications

Sidra Client Application pipelines are designed to perform installation and deployment of the Client Applications in a highly automated way, following a Continuous Integration/Continuous Deployment (CI/CD) process.

The Deployment project described above usually includes the following:

  • One or several .ps1 PowerShell scripts. There will generally be one orchestrator script, and there may be several other PowerShell scripts to deploy specific infrastructure components.
  • There may be an environment data .psd1 file with parameters about the infrastructure components and settings. These settings are then internally used to populate an Azure DevOps variable group, which will be required by the release process. Depending on the Sidra version when the Client Application template was created, the deployment may use this file, or it may only require configuring the Azure DevOps variable group. Since Sidra version 1.9, new templates are created without the need for such a .psd1 file (it will be empty), directly using a DevOps variable group populated with those parameters.

In order to create the variable group, the Client Application template may include a helper script located in the root path of the .NET solution. The name of this script is ConfigureDevOps.ps1. This script should have been edited to contain all the variables that the deployment needs. In order to execute this script, use this command:

.\ConfigureDevOps.ps1 -organization https://dev.azure.com/ORGANIZATION -project PROJECT -environments ENV

where:

  • ORGANIZATION: the name of the organization in Azure DevOps.
  • PROJECT: the name of the Azure DevOps project, usually Sidra, but it may be a different name.
  • ENV: the name of the environment where we are deploying, e.g., dev, test, prod.
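
For example (the organization name and environment are illustrative):

```powershell
.\ConfigureDevOps.ps1 -organization https://dev.azure.com/MyCompany -project Sidra -environments dev
```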

Inside this script, we should edit the list of variables in the command for the actual creation of the variable group:

az pipelines variable-group create --name NAME --variables VAR_LIST

This command includes two parameters:

  • NAME is the name of the variable group. We do not need to indicate it, as it will be created automatically from the environment.
  • VAR_LIST is the list of variables, where we will specify the values.
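
As a hedged sketch, the command inside the script could end up looking similar to this (the variable group name and the variable list are placeholders; the actual variables are defined by the Client Application template):

```powershell
# Example only: the variable names and values below are illustrative placeholders
az pipelines variable-group create `
    --organization https://dev.azure.com/MyCompany `
    --project Sidra `
    --name dev.DataLabPoC `
    --variables ResourceGroupName=Sidra.DataLabPoC.Dev Location=westeurope
```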

As introduced above, the continuous integration process requires configuring a build+release pipeline using a YAML file (azure-pipelines.yml).

Therefore, from the repository, we click on Set up build:

Client App set up build

Then, we select the option Existing Azure Pipelines YAML file:

Client App configure DevOps pipeline

We need to select the branch according to the environment where we are making the installation, as well as the path to the YAML file with the template pipeline, as seen below:

Client App select YAML

Finally, we confirm the changes and execute the pipeline to proceed with the deployment.

This way, every time code is pushed to the repository, the build will be launched. If we require any change in the build, we can modify the YAML file and push the changes to the repository.

NOTE for old versions: For versions of the Client Application template still using the old mechanism of the environment data file (.psd1), we will still need to manually create an Azure DevOps variable group, even if it is empty, as it will later be automatically populated by the deployment script. Without doing this, the deployment will fail, since the variable group needs to be created manually and cannot be created as part of a deployment script. In order to do this, we need to manually create a variable group with a dummy variable in it: go to Azure DevOps > Pipelines > Library and click on the button to create a new variable group:

![Create new variable group](../attachments/create-new-variable-group.png)

The name of this variable group will be `ENVIRONMENT.APP_NAME`. For example, in the case of DataLab for a dev environment, it would be `dev.DataLab`.
Once created, we enter the variable group and add a variable named `Dummy` with value `Dummy`:

![Create dummy variable](../attachments/create-dummy-variable.png)
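
Alternatively, this manual step can be scripted with the Azure CLI, assuming the azure-devops extension is installed and you are signed in; the names follow the example above:

```powershell
# Create the variable group with a dummy variable; the deployment script will populate it later
az pipelines variable-group create `
    --organization https://dev.azure.com/MyCompany `
    --project Sidra `
    --name dev.DataLab `
    --variables Dummy=Dummy
```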

The complete continuous workflow for the CI/CD when making changes to the Client Application would then be:

  • The developer will push the source code changes to the Git repository.
  • The build and release pipeline will be automatically triggered by the changes in the repository. The release will use the artifacts generated by the build to deploy the infrastructure and the metadata for the solution.

Data movement and orchestration in Client Applications

The Client Application solutions use Azure Data Factory V2 (ADF) to perform data movements, the same as Sidra Core solutions, but the deployment project creates its own instance of ADF in the Resource Group of the Client Application.

When working with both solutions at the same time (Core and Client Applications) it is important to differentiate the ADF instances. More specifically, there will be:

  • an ADF instance for each Data Storage Unit (DSU) in Core
  • an ADF instance for each Client Application solution

Job DataFactoryManager for Client Applications

ADF components (datasets, triggers and pipelines) in Client Applications are managed in the same way as in Core, by means of the DataFactoryManager webjob.

The section Data Factory tables explains how it works in Core.

The DataFactoryManager job uses information stored in the metadata database to build and programmatically create the ADF components in Data Factory, which means that Client Applications need a metadata database to store the information about ADF components.

There are some minor differences between DataFactoryManager for Client Applications and for Core:

  • The Core version includes the creation of the landing zones (creation of the Azure Storage containers) for the Data Storage Units.
  • The Pipeline table in Core can store ADF pipelines but also Azure Search pipelines. In Client Applications the Pipeline table only stores ADF pipelines.
  • DataFactoryManager in Client Applications has to filter the pipelines to create only those needed for ADF.

Understanding the synchronization between Sidra Core and Client Applications

Sidra provides templates to create Client Applications that can be configured to automatically retrieve the information from a Data Storage Unit (DSU) when it is available.

The process of discovering and extracting this information is called synchronization between a Client Application and Sidra Core.

Sidra provides a component that orchestrates this synchronization, the Sync webjob.

This job is deployed in all Client Applications with the template provided by Sidra. Without any additional configuration, the metadata in the Core database is already synchronized with the metadata in the Client Application database.

The synchronization will take place as long as the Client Application has the right permissions to synchronize with the data in the DSU. This is done by editing the Balea Authorization permissions tables (Users > Client Application Subject > Permissions).

The synchronization webjob is configured to be executed every 2 minutes.

Discovering new content in the Data Storage Unit

Every time content is ingested into the Data Lake inside a Data Storage Unit (DSU), a new Asset is created in the metadata database in Core. The Asset is associated with the rest of the metadata stored for the content: Entity, Provider, Attributes...

The Client Application keeps a synchronized copy of the metadata to discover the ingestion of new Assets in the Data Lake.

That means that the Client Application contains a metadata database similar to the one in Core. In the Client Application database, the metadata tables for Sidra are under the schema Sidra.

These are some of the most important tables used in the Client Application metadata database:

  • DataFactory: this table stores the Azure Data Factory resource group for the Client Application
  • Provider
  • Entity
  • Attribute
  • EntityPipeline: to store the Entity-Pipeline associations.
  • AttributesFormat
  • AssetStatus
  • Assets
  • Trigger
  • TriggerPipeline
  • TriggerTemplate
  • Pipeline
  • PipelineTemplate
  • PipelineSyncBehavior: this is described in this documentation page.

Even though the metadata of the Core and Client Application databases is said to be synchronized, there are several differences between the metadata in Core and in Client Applications that are worth clarifying:

  • Some fields have been removed from the Client Application metadata tables because they are not used, for example the fields used for access control like SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Client Application metadata tables. This is the case of the field IsDisabled, which has been added to Entity and Provider for disabling the synchronization for that particular Entity or Provider.
  • The state transitions in Core and in Client Applications are different. Therefore, AssetStatus table contains different states and workflows than in Sidra Core. For example, once the ingestion of the Asset is finished in Core, the status will be MovedToDataLake in Core, but in the Client Application the status will continue evolving until ImportedFromDataLake.

You can check the Asset metadata section, which explains the above metadata tables, and all those status values and the transitions between them.

Role of Sync and DatabaseBuilder webjobs in Client Applications

The Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is conditioned/limited by the permissions granted to the Client Application.

Based on the metadata received from the API, the webjob updates each of its Client Application metadata tables. For the Provider and Entity tables, any entry that is no longer available in Core is set as disabled in the Client Application using the abovementioned IsDisabled field.

All this information will be used to decide if there is new content in the Data Lake to be imported to the Client Application.

In addition to this, the Sync webjob will also be responsible for executing the defined pipeline, depending on the sync behavior defined for each pipeline (see PipelineSyncBehavior), described here.

Sidra Core has the concept of dummy Assets: Assets of zero length that are created when an incremental load in Core finishes without producing a new increment of data. This concept was introduced in order to force the presence of a new Asset in the Client Application metadata tables. Without these Assets, the data synchronization would not be triggered if the Assets are configured as mandatory Assets (see below for more information on this point). If these Assets are mandatory but not generated, the data movement would not happen, and this could affect the business logic of the Client Application. Since Sidra version 1.10, the generation of dummy Assets in data ingestion pipelines is optional, with the default set to false. Therefore, if the data processing logic of a Client Application needs these Assets to be generated, please ensure that this parameter is set to true when deploying data intake pipelines.

DatabaseBuilder webjob

The population of the metadata database will be performed by the DatabaseBuilder webjob. The project included in the Client Application solution is already configured for this purpose:

static void Main(string[] args)
{
    ...

    // Add log configuration as well if required
    JobExecutor.Execute(
        configuration,
        new Options()
        {
            CreateClientDatabase = true,
            CreateLogDatabase = true
        },
        loggingBuilderAction);

    ...
}

This job will create the database schema specific for the Client Applications, with the differences explained in the sections above.

It will also be used to include in the database the information of the ADF components (pipelines, datasets, triggers) by using SQL scripts. This section explains how to execute SQL scripts using the DatabaseBuilder webjob.

Extracting the new content from the Data Storage Unit

The Sidra Client Applications use ADF for data movement; in particular, they use pipelines for the extraction of new content from the Data Lake (more specifically, from the DSU).

The actions performed by the extraction pipelines will depend on what is going to be done with the content after the extraction. This logic is Client Application-specific and tied to business rules transformations. Some examples of the actions possibly executed by the extraction pipelines are:

  • The content may be used to populate a DataWarehouse inside the Client Application. In this case, such content will be first stored into the staging tables in the Client Application after the extraction.

  • The content will be optionally transformed through the execution of data transformations and business rules within the Client Application.

  • Optionally, this transformed data can be re-ingested as rich data back into the Data Lake. In this case, after the extraction and transformation, the new content will be pushed to the landing zone in Core for re-ingestion, configured in the same way as any new data Provider for Sidra.

In consequence, there is no single design for the Client Application extraction pipelines: it will depend on the specifics of the business case implemented by the Client Application.

Extraction pipelines configuration

The extraction pipeline execution is launched by the Sync webjob based on the information in the Client Application metadata database. Such information includes metadata synchronized from Core, but also the configuration of the data extraction pipelines.

The configuration of the data extraction pipelines from the DSU is done in the metadata database tables, in a very similar way to how the ingestion and extraction pipelines are configured in Sidra Core.

Once a specific extraction pipeline is designed, it can be created and configured by setting it up in the database and allowing the DataFactoryManager to create the actual pipeline infrastructure in ADF.

The basic configuration of the pipeline follows the same steps as the data intake pipelines configuration for Core. Data intake pipelines configuration is described in this section: How to configure a new pipeline based on templates.

Also, please check the following documents:

Particularities on Client Applications extraction pipelines

The extraction pipelines in Client Applications have some particularities, compared to the pipelines in Sidra Core:

Pipeline type

The Pipeline table in Client Applications has some additional fields compared to the Pipeline table in Sidra Core:

  • ExecutionParameters: This is an additional field used for including values for the parameters of the Client Application pipeline when the pipeline is executed by the Sync webjob.

Mandatory Assets

The same extraction pipeline can be used to extract several Assets from different Entities. As in Sidra Core, the EntityPipeline table is used to associate the same pipeline with several Entities.

Sidra also provides support for defining mandatory Assets, which means that the extraction pipeline will be executed only if all the Assets marked as Mandatory are present in Sidra Core. The mandatory Assets can be configured using the field IsMandatory, which is included in the EntityPipeline table and must be taken into account when setting up this relation. Please refer to the tutorial How to associate an entity with a pipeline.
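
As a hedged sketch of that configuration (the IdEntity and IdPipeline column names and the connection details are assumptions based on the naming conventions of the other metadata tables; verify them against your Client Application database):

```powershell
# Mark an Entity-Pipeline association as mandatory in the Client Application metadata database
# (assumed schema/column names; the ids are examples)
Invoke-Sqlcmd -ServerInstance "myclientapp-sql.database.windows.net" -Database "ClientAppDb" `
    -Username "sqladmin" -Password "<password>" -Query @"
UPDATE [Sidra].[EntityPipeline]
SET [IsMandatory] = 1
WHERE [IdEntity] = 10 AND [IdPipeline] = 5;
"@
```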

Association with Providers

An association between a Pipeline and a Provider can be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated with enabled Providers, i.e. those Providers with false in the IsDisabled field.
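
Following the same hedged pattern (table and column names are assumed from the description above and should be verified):

```powershell
# Associate a Client Application pipeline with a Provider so that the Sync webjob executes it
Invoke-Sqlcmd -ServerInstance "myclientapp-sql.database.windows.net" -Database "ClientAppDb" `
    -Username "sqladmin" -Password "<password>" -Query @"
INSERT INTO [Sidra].[PipelineProvider] ([IdPipeline], [IdProvider])
VALUES (5, 3);
"@
```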

Extraction pipelines configuration

Once the specific extraction pipeline is configured, it can be created by setting up the appropriate information in the metadata database and allowing the DataFactoryManager job to create this pipeline in ADF.

The basic configuration of the pipeline in Sidra Core is described in section How to configure a new pipeline based on templates.

However, in Client Applications, some additional configuration is required. In summary:

  • An additional field IsMandatory is included in the association between Pipeline and Entity and must be taken into account when setting up this relation using the tutorial How to associate an entity with a pipeline.
  • Pipelines in Client Applications have an additional field, ExecutionParameters, which must be configured.

ExtractPipelineExecution table

The ExtractPipelineExecution table is an execution tracking table that is only present in a Client Application.

| Column | Description |
|--------|-------------|
| Id | Id of the record |
| IdPipeline | Id of the Client Application pipeline |
| PipelineRunId | GUID with the RunId of the pipeline execution in ADF |
| PipelineExecutionDate | Timestamp of the pipeline execution |
| IdEntity | Id of the Entity |
| IdAsset | Id of the Asset |
| AssetDate | Business date of the Asset being ingested |
| AssetOrder | Order number used internally by the stored procedures when loading the data |
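
For instance, a quick way to review the latest executions recorded in this table could be the following sketch (the schema name is assumed to be Sidra, as for the other metadata tables, and the connection details are placeholders):

```powershell
# List the ten most recent extraction pipeline executions in the Client Application database
Invoke-Sqlcmd -ServerInstance "myclientapp-sql.database.windows.net" -Database "ClientAppDb" `
    -Username "sqladmin" -Password "<password>" -Query @"
SELECT TOP 10 [IdPipeline], [PipelineRunId], [PipelineExecutionDate], [IdEntity], [IdAsset], [AssetDate]
FROM [Sidra].[ExtractPipelineExecution]
ORDER BY [PipelineExecutionDate] DESC;
"@
```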