Sidra Client apps

The Client applications are the pieces of Sidra that make it possible to drive business cases. Client applications (or Apps) are sets of Azure resources and code, enclosed in a Resource Group, which either access the data from one or multiple Data Storage Units via the secure APIs, or retrieve the data from the Data Lake, applying business transformations if needed.

The architecture allows Client apps to be built using any set of tools and components, so they can range from Power BI-enabled analytical workspaces to web apps. Having said that, they all have to use the same security model, notification and logging infrastructure, and so on, via the Sidra APIs.

In some places the term Consumer apps can still be found; it is deprecated, but it was the original term for Client apps.

Solution templates

The flexibility of the architecture using Client apps allows each of them to be completely different from the rest. Despite that, it is common for many Client apps to follow the same structure. Sidra provides some Visual Studio templates to create the basic solution for that kind of App:

  • Client app template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppTemplate.
  • Client app with PowerBI template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppPowerBITemplate.

More information about how to use those templates can be found in the section Create project solution. From now on in this document, Client app solution will refer to the Visual Studio solutions generated from those templates.

Solution structure

If the solution created by the Client app template is opened in Visual Studio, the content shown in the Solution Explorer will be:

(Image: client-solution-structure, the Client app solution as shown in the Solution Explorer)

It contains:

  • A deployment project named <<Solution name>>.Deployment, e.g. Sidra.Dev.Client.Deployment. This is the same project that can be seen in the Core solutions.
  • An Extract custom activity project named <<Solution name>>.DataFactory.CustomActivity.Extract. This custom activity is used to extract the information of an entity from the Data Lake.
  • A Sync webjob project named <<Solution name>>.DataFactory.Sync. This webjob is used to synchronize metadata information between Core and the Client app.
  • A DatabaseBuilder webjob project named <<Solution name>>.Webjobs.DatabaseBuilder. This is the same project that can be seen in the Core solutions.
  • A DataFactoryManager webjob project named <<Solution name>>.Webjobs.DataFactoryManager.Client. This project is very similar to the DataFactoryManager that can be seen in the Core solutions.

Deployment

The deployment project used in Client app solutions is exactly the same as the one in Core solutions. The only difference is how it is used:

  • The PowerShell script that orchestrates the deployment is ClientDeploy.ps1 instead of CoreDeploy.ps1, although both are included in the project.
  • The environment data files must contain a different set of values. A sample of the values can be found in the deployment project in Scripts\Documentation\ClientData.psd1.

It uses the same mechanisms to generate the build and release pipelines in Azure DevOps, so everything that is explained in the Create project solution section about deployment, environment configuration and CI/CD integration can also be applied to Client app solutions.

Data movement and orchestration

The Client app solutions use Azure Data Factory V2 (ADF) to perform data movements, the same as Core solutions, but the deployment project creates its own instance of ADF in the Resource Group of the Client app.

When working with both kinds of solutions at the same time, Core and Client apps, it is important to differentiate between the ADF instances. There will be:

  • an ADF instance for each Data Storage Unit (DSU) in Core
  • an ADF instance for each Client app solution

DataFactoryManager for Client apps

ADF components (datasets, triggers and pipelines) in Client apps are managed in the same way as in Core, using the DataFactoryManager webjob. The section Data Factory tables explains how it works in Core.

The DataFactoryManager uses information stored in the metadata database to build and programmatically create the ADF components in Data Factory, which means that Client apps need a metadata database to store the information about ADF components.

There are some minor differences between DataFactoryManager for Client apps and for Core:

  • The Core version includes the creation of the landing zones (the Azure Storage containers) for the Data Storage Units.
  • The Pipeline table in Core can store not only ADF pipelines but also Azure Search pipelines, whereas in Client apps the Pipeline table only stores ADF pipelines. DataFactoryManager in Core therefore has to filter the pipelines and create only those for ADF.

Excluding those minor differences, the behaviour of DataFactoryManager is exactly the same.
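For illustration, that filtering difference could be sketched like this; PipelineInfo and the IsDataFactoryPipeline flag are hypothetical names, not the actual Sidra schema:

using System.Collections.Generic;
using System.Linq;

// Illustrative model only, not the actual Sidra schema.
public record PipelineInfo(string Name, bool IsDataFactoryPipeline);

public static class DataFactoryManagerSketch
{
    // Core has to filter out the non-ADF pipelines (e.g. Azure Search ones);
    // in a Client app the Pipeline table only contains ADF pipelines.
    public static IEnumerable<PipelineInfo> PipelinesToCreate(
        IEnumerable<PipelineInfo> pipelines, bool isCore) =>
        isCore ? pipelines.Where(p => p.IsDataFactoryPipeline) : pipelines;
}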

Synchronization with Core

Client apps consume the content from the Data Lake, so it is important to know when new content has been ingested in order to extract it and incorporate it into the Client app storage. The process of discovering and extracting is called "synchronization", and Sidra provides a component that orchestrates it: the Sync webjob. The webjob is configured as continuous and is executed every 2 minutes.
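As a rough sketch, and assuming hypothetical names (SynchronizeAsync stands in for the real logic described below), the skeleton of such a continuous webjob could look like this:

using System;
using System.Threading.Tasks;

public static class SyncWebjobSketch
{
    public static async Task Main()
    {
        // Continuous webjob: loop forever, synchronizing every 2 minutes.
        while (true)
        {
            await SynchronizeAsync();                  // discover and extract new assets
            await Task.Delay(TimeSpan.FromMinutes(2)); // the configured interval
        }
    }

    // Hypothetical placeholder for the synchronization logic described below.
    private static Task SynchronizeAsync() => Task.CompletedTask;
}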

Discovering new content in the Data Lake

Every time content is ingested into the Data Lake, a new asset is created in the metadata database in Core. The asset is associated with the rest of the metadata stored for the content: entity, provider, attributes...

The Client app keeps a "synchronized" copy of the metadata to discover the ingestion of new assets in the Data Lake. That means that the Client app contains a metadata database, similar to the one in Core, to store the following metadata:

  • Provider
  • Entity
  • Attribute
  • AttributesFormat
  • AssetStatus
  • Assets

The word "synchronized" is quoted because it is not an exact copy of the tables from Core. There are some differences:

  • Some fields have been removed from the Client app counterpart because they are not used, for example the fields used for access control, like SecurityPath and ParentSecurityPath.
  • Some fields have been added to the Client app, like the IsDisabled field added to Entity and Provider, which is used to disable the synchronization of that particular entity or provider.
  • The status field of the assets will not keep the same value. Once the ingestion of the asset is finished in Core, the status will be MovedToDataLake, but in the Client app counterpart the status will continue evolving until it reaches ImportedFromDataLake. The section Asset metadata explains all those status values and the flows between them.

Under the hood, the Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is conditioned/limited by the permissions granted to the Client app. Based on the metadata received from the API, the webjob updates the Client counterpart of each metadata table. For the Provider and Entity tables, any entry that is no longer available in Core is set as disabled in the Client using the aforementioned IsDisabled field.

All that information will be used to decide whether there is new content in the Data Lake to be imported into the Client app.
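A minimal sketch of that update step for the Provider table, assuming hypothetical in-memory collections in place of the Sidra API and the Client metadata database:

using System.Collections.Generic;
using System.Linq;

// Illustrative row; only the IsDisabled behaviour mirrors the document.
public record ProviderRow(int Id, string Name)
{
    public bool IsDisabled { get; set; }
}

public static class ProviderSyncSketch
{
    public static void SyncProviders(IReadOnlyCollection<ProviderRow> fromApi, IList<ProviderRow> local)
    {
        // Add every provider returned by the API (the API already limits the
        // result to what the Client app is allowed to see).
        foreach (var provider in fromApi.Where(p => local.All(l => l.Id != p.Id)).ToList())
        {
            local.Add(provider);
        }

        // Entries no longer available in Core are disabled, not deleted.
        foreach (var orphan in local.Where(l => fromApi.All(p => p.Id != l.Id)))
        {
            orphan.IsDisabled = true;
        }
    }
}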

Extracting the new content from Data Lake

As mentioned previously, the Client app uses ADF for the data movement; in particular, it uses pipelines for the extraction of new content from the Data Lake.

The actions performed by the extraction pipelines will depend on what is going to be done with the content after the extraction. Some examples are:

  • The content will be used to populate a data warehouse, so it will be stored in the staging tables after the extraction.

  • The content will be transformed and re-ingested in the Data Lake, so after the extraction and transformation it will be pushed to the landing zone in Core for the re-ingestion.

So there is no single design for the extraction pipeline; it depends on the specifics of the business case of the Client app.

Extraction pipelines configuration

The extraction pipeline execution is launched by the Sync webjob based on the information in the Client metadata database. That information includes metadata synchronized from Core but also the configuration of the extraction pipelines.

The configuration of the extraction pipelines is done in the metadata database, in a way similar to how the pipelines are configured in Core. Once a specific extraction pipeline is designed, it can be created by setting it up in the database and letting the DataFactoryManager create it in ADF. The basic configuration of the pipeline follows the same steps as the one for Core, which is described in the section How to configure a new pipeline based on templates. But the extraction pipelines have some particularities:

Pipeline type

The Pipeline table in Client apps has some additional fields compared to its counterpart in Core:

  • PipelineType. The types of pipelines in the Client apps are different from the ones in Core. There are two types of pipelines: ExecutedBySync and IgnoredFromSync. The Sync webjob will only launch a pipeline if it is of the ExecutedBySync type.

  • ExecutionParameters. It is used to provide values for the parameters of the pipeline when it is executed by the Sync webjob, as shown in the sketch below.
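As an illustration, those two fields could be modelled as follows; the enum values come from this section, while the surrounding type names and the JSON example are assumptions:

// The two pipeline types described above.
public enum PipelineType
{
    ExecutedBySync,  // launched automatically by the Sync webjob
    IgnoredFromSync  // created in ADF but never launched by the Sync webjob
}

// Illustrative row model for the extra fields.
public record ClientPipelineRow(
    string Name,
    PipelineType PipelineType,
    string ExecutionParameters); // e.g. "{ \"destination\": \"staging.Orders\" }" (assumption)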

Mandatory assets

The same extraction pipeline can be used to extract several assets from different entities. That can be configured by associating the same pipeline with several entities using the EntityPipeline table.

Sidra also provides support for defining mandatory assets, which means that the extraction pipeline will be executed only if all the assets marked as mandatory are present in Core. Mandatory assets can be configured using the IsMandatory field included in the EntityPipeline table, and they must be taken into account when setting up this relation following the tutorial How to associate an entity with a pipeline.
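For illustration, the mandatory check could be sketched like this; only the EntityPipeline table and its IsMandatory field come from the document, the rest is hypothetical:

using System;
using System.Collections.Generic;
using System.Linq;

// Illustrative EntityPipeline association row.
public record EntityPipelineRow(int IdEntity, int IdPipeline, bool IsMandatory);

public static class MandatoryAssetsSketch
{
    // The pipeline may run only when every mandatory entity has its assets
    // present in Core; entityHasAssetsReady stands in for that lookup.
    public static bool MandatoryAssetsPresent(
        IEnumerable<EntityPipelineRow> relations,
        Func<int, bool> entityHasAssetsReady) =>
        relations.Where(r => r.IsMandatory).All(r => entityHasAssetsReady(r.IdEntity));
}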

Association with providers

An association between the Pipeline and the Provider must be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated with enabled providers, i.e. those providers with false in the IsDisabled field.
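A matching sketch of that filter, with illustrative row types (only the table and field names come from the document):

using System.Collections.Generic;
using System.Linq;

// Illustrative rows mirroring PipelineProvider and the IsDisabled flag.
public record PipelineProviderRow(int IdPipeline, int IdProvider);
public record ProviderStateRow(int Id, bool IsDisabled);

public static class ProviderAssociationSketch
{
    // Keep only the pipelines associated with enabled providers.
    public static IEnumerable<int> ExecutablePipelineIds(
        IEnumerable<PipelineProviderRow> associations,
        IEnumerable<ProviderStateRow> providers)
    {
        var enabled = providers.Where(p => !p.IsDisabled).Select(p => p.Id).ToHashSet();
        return associations.Where(a => enabled.Contains(a.IdProvider))
                           .Select(a => a.IdPipeline)
                           .Distinct();
    }
}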

Extraction pipelines execution

The extraction pipelines will be launched by the Sync webjob once it has checked the following conditions:

  1. The PipelineType is ExecutedBySync. The metadata database stores all the pipelines that will be created in the Client app ADF; some of them will be extraction pipelines, but others can be created for other tasks. So pipelines are classified into two categories, ExecutedBySync and IgnoredFromSync, and only the pipelines of the first type will be executed.
  2. The pipeline is not associated with a provider that is disabled.
  3. The pipeline is not currently being executed.
  4. There are assets ready to be extracted, i.e. the asset status is MovedToDataLake or ImportingFromDataLake.
  5. If the relation between the pipeline and the entity is marked as IsMandatory and the entity is valid (the dates of the assets to be extracted are between the StartValidDate and EndValidDate of the entity), then there are at least as many assets ready to be extracted as the entity's FilesPerDrop.
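Putting the conditions together, the decision could be sketched as the following predicate, where every Boolean parameter stands in for a query against the Client metadata database:

public static class SyncDecisionSketch
{
    // Hedged sketch of the Sync launch decision described above.
    public static bool ShouldLaunch(
        bool isExecutedBySyncType,       // 1. PipelineType is ExecutedBySync
        bool providerIsEnabled,          // 2. no associated provider is disabled
        bool pipelineIsIdle,             // 3. the pipeline is not already running
        bool hasAssetsReady,             // 4. assets in MovedToDataLake / ImportingFromDataLake
        bool mandatoryEntitiesSatisfied) // 5. mandatory entities reach FilesPerDrop
        => isExecutedBySyncType
           && providerIsEnabled
           && pipelineIsIdle
           && hasAssetsReady
           && mandatoryEntitiesSatisfied;
}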

Extract custom activity

The extraction pipelines can use the Extract custom activity to extract the information from the Data Lake and copy it into an Azure Storage account as a CSV file.

DatabaseBuilder

The population of the metadata database will be performed by the DatabaseBuilder webjob. The project included in the Client app solution is already configured for this purpose:

static void Main(string[] args)
{
    ...

    // Add log configuration as well if required
    JobExecutor.Execute(
        configuration,
        new Options()
        {
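            // CreateClientDatabase builds the Client app metadata schema described above;
            // CreateLogDatabase builds the database used by the logging infrastructure.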
            CreateClientDatabase = true,
            CreateLogDatabase = true
        },
        loggingBuilderAction);

    ...
}

And it will create the database schema specific for the Client apps, with the differences shown in the sections above.

It will also be used to include in the database the information about the ADF components (pipelines, datasets, triggers) by using SQL scripts. The section How to execute SQL scripts using DatabaseBuilder webjob explains how to do it.