Sidra Client apps¶
Client applications are the pieces of Sidra that drive business cases. A Client application (or App) is a set of Azure resources and code, enclosed in a Resource Group, which either accesses the data from one or multiple Data Storage Units via the secure APIs, or retrieves the data from the Data Lake, applying business transformations if needed.
The architecture allows Client apps to be built using any set of tools and components, so they can range from Power BI enabled analytical workspaces to Web apps. That said, they all have to use the same security model, notifications and logging infrastructure, among other services, via the Sidra APIs.
In some places the term Consumer apps can still be found. It was the original name for Client apps and is now deprecated.
The flexibility of the architecture allows each Client app to be completely different from the rest. Despite that, many Client apps commonly follow the same structure. Sidra provides some Visual Studio templates to create the basic solution for that kind of App:
- Client app template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppTemplate.
- Client app with PowerBI template, available in the NuGet package PlainConcepts.Sidra.DotNetCore.ConsumerAppPowerBITemplate.
More information about how to use those templates can be found in the section Create project solution. From now on in this document, the term "Client app solution" refers to a Visual Studio solution generated from one of those templates.
If the solution created by the Client app template is opened in Visual Studio, the content shown in the Solution Explorer will be:
- A deployment project named <<Solution name>>.Deployment (e.g. Sidra.Dev.Client.Deployment). This is the same project that can be seen in the solutions for Core.
- An Extract custom activity project named <<Solution name>>.DataFactory.CustomActivity.Extract. This custom activity is used to extract the information of an entity from the Data Lake.
- A Sync webjob project named <<Solution name>>.DataFactory.Sync. This webjob is used to synchronize metadata information between Core and the Client app.
- A DatabaseBuilder webjob project named <<Solution name>>.Webjobs.DatabaseBuilder. This is the same project that can be seen in the Core solutions.
- A DataFactoryManager webjob project named <<Solution name>>.Webjobs.DataFactoryManager.Client. This project is very similar to the DataFactoryManager that can be seen in the Core solutions.
The deployment project used in Client app solutions is exactly the same as the one in Core solutions. The only difference is how it is used:
- The PowerShell script that orchestrates the deployment is
CoreDeploy.ps1 although both are included in the project.
- The environment data files must contain a different set of values. A sample of the values can be found in the deployment project in
The deployment project uses the same mechanisms to generate the build and release pipelines in Azure DevOps, so everything explained in the Create project solution section about deployment, environment configuration and CI/CD integration also applies to Client app solutions.
Data movement and orchestration¶
Client app solutions use Azure Data Factory V2 (ADF) to perform data movements, the same as Core solutions, but the deployment project creates its own instance of ADF in the Resource Group of the Client app.
When working with both kinds of solution at the same time (Core and Client apps), it is important to differentiate the ADF instances. There will be:
- an ADF instance for each Data Storage Unit (DSU) in Core
- an ADF instance for each Client app solution
DataFactoryManager for Client apps¶
ADF components (datasets, triggers and pipelines) in Client apps are managed the same way as in Core, using the DataFactoryManager webjob. The section Data Factory tables explains how it works in Core.
The DataFactoryManager uses information stored in the metadata database to build and programmatically create the ADF components in Data Factory, which means that Client apps need a metadata database to store the information about ADF components.
There are some minor differences between DataFactoryManager for Client apps and for Core:
- The Core version also creates the landing zones (the Azure Storage containers) for the Data Storage Units.
- The Pipeline table in Core can store ADF pipelines but also Azure Search pipelines. In Client apps the Pipeline table only stores ADF pipelines. DataFactoryManager in Core has to filter the pipelines to create only those for ADF.
Excluding those minor differences, the behaviour of DataFactoryManager is exactly the same.
Synchronization with Core¶
Client apps consume the content from the Data Lake, so it is important to know when new content has been ingested in order to extract it and incorporate it into the Client app storage. The process of discovering and extracting is called "synchronization", and Sidra provides a component that orchestrates it: the Sync webjob. The webjob is configured to be continuous and is executed every 2 minutes.
Discovering new content in the Data Lake¶
Every time content is ingested in the Data Lake, a new asset is created in the metadata database in Core. The asset is associated with the rest of the metadata stored for the content: entity, provider, attributes, and so on.
The Client app keeps a "synchronized" copy of the metadata to discover the ingestion of new assets in the Data Lake. That means that the Client app contains a metadata database, similar to the one in Core, to store the following metadata:
The copy is described as "synchronized", in quotes, because it is not an exact copy of the tables from Core. There are some differences:
- Some fields have been removed from the Client app counterpart because they are not used, for example the fields used for access control like
- Some fields have been added to the Client app, like the field
Provider that is used for disabling the synchronization for that particular entity or provider.
- The status field of the assets will not keep the same value. Once the ingestion of the asset is finished in Core, the status will be MovedToDataLake, but in the Client app counterpart the status will continue evolving until ImportedFromDataLake. The section Asset metadata explains all those status values and the flows between them.
Under the hood, the Sync webjob uses the Sidra API to retrieve the latest information from the metadata tables. The metadata returned by the API is limited by the permissions granted to the Client app. Based on the metadata received from the API, the webjob updates the Client counterpart of each metadata table. For the Provider and Entity tables, any entry that is no longer available in Core is set as disabled in Client using the aforementioned
All that information will be used to decide if there is new content in the Data Lake to be imported to the Client app.
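The per-table update described above can be pictured with a minimal Python sketch. It is not Sidra's implementation: the row shape, the `disabled` key and the `sync_table` helper are assumptions made for illustration, and the rows that the Sidra API would return are passed in as plain dictionaries. The sketch also assumes that a flag already set on the Client side is preserved when the row is refreshed.

```python
from typing import Dict

Row = Dict[str, object]


def sync_table(core_rows: Dict[int, Row], client_rows: Dict[int, Row]) -> Dict[int, Row]:
    """Return the updated Client copy of one metadata table.

    core_rows: rows visible to the Client app through the API, keyed by id.
    client_rows: rows currently stored in the Client metadata database.
    The "disabled" key is a hypothetical name for the Client-only flag.
    """
    updated: Dict[int, Row] = {}

    # Insert or refresh every row that Core still exposes to this Client app,
    # keeping whatever flag the Client side already had (assumption).
    for row_id, core_row in core_rows.items():
        previous = client_rows.get(row_id, {})
        updated[row_id] = {**core_row, "disabled": previous.get("disabled", False)}

    # Keep rows that are no longer available in Core, but mark them as disabled.
    for row_id, client_row in client_rows.items():
        if row_id not in core_rows:
            updated[row_id] = {**client_row, "disabled": True}

    return updated


core = {1: {"name": "Sales"}, 2: {"name": "Customers"}}
client = {
    2: {"name": "Customers (old)", "disabled": False},
    3: {"name": "Legacy", "disabled": False},
}

result = sync_table(core, client)
for row_id in sorted(result):
    print(row_id, result[row_id])
```

In this example row 1 is new, row 2 is refreshed from Core, and row 3 is kept but flagged as disabled because Core no longer returns it.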
Extracting the new content from Data Lake¶
As mentioned previously, the Client app uses ADF for data movement. In particular, it uses pipelines to extract new content from the Data Lake.
The actions performed by the extraction pipelines will depend on what is going to be done with the content after the extraction. Some examples are:
- The content will be used to populate a Data Warehouse, so it will be stored in the staging tables after the extraction.
- The content will be transformed and re-ingested in the Data Lake, so after the extraction and transformation it will be pushed to the landing zone in Core for re-ingestion.
So there is no single design for the extraction pipeline: it depends on the specifics of the business case of the Client app.
Extraction pipelines configuration¶
The extraction pipeline execution is launched by the Sync webjob based on the information in the Client metadata database. That information includes metadata synchronized from Core but also the configuration of the extraction pipelines.
The extraction pipelines are configured in the metadata database in a way similar to how the pipelines are configured in Core. Once a specific extraction pipeline is designed, it can be created by setting it up in the database and allowing the DataFactoryManager to create it in ADF. The basic configuration of the pipeline follows the same steps as in Core, which are described in the section How to configure a new pipeline based on templates. However, the extraction pipelines have some particularities:
The Pipeline table in Client apps has some additional fields compared with its counterpart in Core:
PipelineType. The types of pipelines in the Client apps are different from the ones in Core. There are two types of pipelines:
IgnoredFromSync. The Sync webjob will only launch a pipeline if it is from
ExecutionParameters. It is used to include values in the parameters of the pipeline when it is executed by the Sync webjob.
The same extraction pipeline can be used to extract several assets from different entities. That can be configured by associating the same pipeline to several entities using the
Sidra also provides support for defining mandatory assets, which means that the extraction pipeline will be executed only if all the assets marked as mandatory are present in Core. Mandatory assets can be configured using the field IsMandatory, which is included in the EntityPipeline table and must be taken into account when setting up this relation using the tutorial How to associate an entity with a pipeline.
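As an illustration of the mandatory rule, the following Python sketch checks whether every mandatory entity associated with a pipeline has at least one asset available. Only `IsMandatory` and the EntityPipeline table come from the documentation; the `entity_id` column, the `mandatory_assets_present` helper and the "at least one asset" simplification are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class EntityPipeline:
    """One entity-to-pipeline association."""
    entity_id: int        # hypothetical column name
    is_mandatory: bool    # mirrors the IsMandatory field


def mandatory_assets_present(links: Iterable[EntityPipeline],
                             entities_with_assets: Set[int]) -> bool:
    """True when every mandatory entity has at least one asset available."""
    return all(link.entity_id in entities_with_assets
               for link in links if link.is_mandatory)


links = [
    EntityPipeline(entity_id=10, is_mandatory=True),
    EntityPipeline(entity_id=11, is_mandatory=True),
    EntityPipeline(entity_id=12, is_mandatory=False),
]

missing_one = mandatory_assets_present(links, {10, 12})   # entity 11 has no asset
all_present = mandatory_assets_present(links, {10, 11})   # optional entity 12 may be absent
print(missing_one, all_present)  # prints: False True
```

Entities that are not marked as mandatory never block the pipeline in this sketch; they are simply extracted when assets exist.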
Association with providers¶
An association between the Pipeline and the Provider must be added using the PipelineProvider table. This association is used by the Sync webjob to execute only those pipelines that are associated with enabled providers, i.e. those providers with false in the
Extraction pipelines execution¶
The extraction pipelines will be launched by the Sync webjob once it has checked the following conditions:
ExecutedBySync. The metadata database stores all the pipelines that will be created in the Client app ADF. Some of them will be extraction pipelines, but others can be created for other tasks. So, pipelines are classified into two categories:
IgnoredFromSync, only the pipelines of the first type will be executed.
- The pipeline is not associated to a provider that is disabled.
- The pipeline is not being executed.
- There are assets ready to be exported i.e. the asset status is
- If the relation between the pipeline and the entity
IsMandatory and the entity is valid (the date of the asset to be exported is between the
EndValidDate of the entity), then there are at least as many assets ready to be exported as the entity
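The checks in the list above can be read as a single predicate that the Sync webjob evaluates per pipeline. The Python sketch below is a simplified model, not Sidra code: the class and field names are hypothetical, the count of assets ready to be exported is supplied by the caller, and the mandatory-asset condition is left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Provider:
    name: str
    disabled: bool = False        # hypothetical name for the Client-only flag


@dataclass
class Pipeline:
    name: str
    pipeline_type: str            # "ExecutedBySync" or "IgnoredFromSync"
    providers: List[Provider] = field(default_factory=list)
    is_running: bool = False
    ready_assets: int = 0         # assets waiting to be exported (computed elsewhere)


def should_launch(p: Pipeline) -> bool:
    """Apply the launch conditions in order; all of them must hold."""
    return (
        p.pipeline_type == "ExecutedBySync"                  # only this type is launched
        and not any(prov.disabled for prov in p.providers)   # no disabled provider
        and not p.is_running                                 # not already executing
        and p.ready_assets > 0                               # something to export
    )


ok = Pipeline("extract_sales", "ExecutedBySync", [Provider("erp")], ready_assets=3)
busy = Pipeline("extract_crm", "ExecutedBySync", [Provider("crm")],
                is_running=True, ready_assets=1)
ignored = Pipeline("housekeeping", "IgnoredFromSync", ready_assets=5)
blocked = Pipeline("extract_hr", "ExecutedBySync",
                   [Provider("hr", disabled=True)], ready_assets=2)

for p in (ok, busy, ignored, blocked):
    print(p.name, should_launch(p))
```

Only `extract_sales` passes every check here; each of the other three pipelines fails exactly one condition.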
Extract custom activity¶
The extraction pipelines can use the Extract custom activity to extract the information from the Data Lake and copy it into Azure Storage as a CSV file.
The population of the metadata database will be performed by the DatabaseBuilder webjob. The project included in the Client app solution is already configured for this purpose, and it will create the database schema specific to Client apps, with the differences shown in the sections above.
It will also be used to add the information about the ADF components (pipelines, datasets, triggers) to the database by using SQL scripts. The section How to execute SQL scripts using DatabaseBuilder webjob explains how to do it.