Data Mesh Client Application¶
Sidra provides a specific Client Application template that manages and hosts the relational modelling of data according to business rules, and that also exposes this data through an included Azure Search resource. In addition, the database resources for this Client Application are provided as part of a SQL Elastic Pool, which makes it possible to scale up while saving costs.
This Client Application template is named Data Mesh Client Application.
For more information on SQL Elastic Pool in Azure you can access the Azure Elastic Pool documentation.
For more information on Azure Search you can access the Azure Search Service documentation.
The Data Mesh Client Application template is based on the Basic SQL template, but adds Azure Search and Elastic Pool services. By default, the Elastic Pool contains two databases: one for the Sidra metadata and another for the client data.
The components for this Client Application template are:
- SQL Elastic Pool, which can be scaled up/down automatically using the appropriate ADF pipeline template, and which hosts two databases:
  - The Sidra database, hosting a reduced copy of the metadata tables in Sidra Core, in order to track Assets metadata as well as Data Factory metadata and configuration.
  - The Data Mesh database, hosting the staging tables and the relational model. These transformed data models are exposed via APIs and Azure Search.
- Azure Search, which is used to define deep searches over the client database data.
This document refers to key concepts of a Client Application in Sidra, which can be reviewed here.
This Client Application template accelerates the creation of a Client Application by abstracting all the main synchronization elements with Sidra Core. As long as the Client Application created with this template is configured with the required permissions to access the DSU data, it transparently and automatically retrieves data from the DSU into the staging tables.
This Client Application is integrated with Sidra: it shares the common security model with Sidra Core and uses Identity Server for authentication. A copy of the relevant ingested Assets metadata is always kept synchronized with Sidra Core. The metadata synchronization is performed by an automated Sync job, explained here.
The actual data flow orchestration is performed via a specific instance of Azure Data Factory installed in the Client Application.
High-level installation details¶
As with any other type of Client Application in Sidra, the process of installing this Client Application consists of the following main steps:
- A dotnet template is installed, which launches a build and release pipeline in Azure DevOps defined for this Client Application.
- As part of the build and release pipeline for this Client Application, the needed infrastructure is installed. This includes the execution of the Deploy.ps1 deployment script, and also the deployment of the different WebJobs.
This Client Application is configured once per environment by using a key-value list of elements, called Azure DevOps variables or a variable group. There is one variable group in DevOps per environment.
In order to build and deploy the Client Application, it is enough to execute the pipeline Sidra.App.DataMesh in the desired Git branch, depending on the environment (e.g., dev, test or prod).
The Azure DevOps pipeline executes two actions:
- Application build
- Deployment of the Azure resources
Build+Release is performed with multi-stage pipelines, so by default no manual intervention is required once the template is installed. For more information on these topics you can access this Documentation, and this tutorial.
The Data Mesh Client Application resources are contained in a single resource group, separate from the Sidra Core and DSU resource groups. The ARM template for this Client Application includes the following services:
- Storage account for raw data: used for storing the copy of the data that is extracted from the DSU and to which the Client Application has access.
- Data Factory: used for data orchestration pipelines to bring the data from the DSU.
- Elastic Pool: used for sharing resources between the Client Application's databases:
  - Sidra Database: used for keeping a synchronized copy of the Assets metadata between Sidra Core and this Client Application.
  - Client Database: used for hosting the relational models and transformation stored procedures.
- Key Vault: used for storing and accessing secrets in a secure way.
- Azure Search: used for defining deep searches over the client database data.
This template also includes the option to deploy logins/users automatically. More specifically, the role datameshaccess and the login/user FWPUSER have been added.
The passwords are retrieved from the Client Application Key Vault, so they must be stored there for the logins to be deployed successfully.
The secrets related to the passwords for SQL logins have the following naming structure:
If the secret is not created manually, the deployment in Azure DevOps will use an initial invalid value. The SQL login is not created while this initial value is present; it is created only when the latest version of the secret differs from the initial value. The secret must therefore be added to the Key Vault of every existing environment where this Client Application is to be deployed.
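The rule described above can be sketched as a small check. This is only an illustration, not the logic of the actual deployment script: the placeholder string `INITIAL_PLACEHOLDER` and the function name `should_create_login` are hypothetical, and treating a missing secret in the same way as the placeholder is an assumption.

```python
from typing import Optional

# Hypothetical stand-in for the initial invalid value written by the deployment.
INITIAL_PLACEHOLDER = "<initial-invalid-value>"


def should_create_login(latest_secret_value: Optional[str],
                        initial_value: str = INITIAL_PLACEHOLDER) -> bool:
    """Return True only when the latest secret version differs from the initial value.

    A missing secret (None) is treated like the placeholder (an assumption).
    """
    if latest_secret_value is None:
        return False
    return latest_secret_value != initial_value


if __name__ == "__main__":
    # The placeholder is still in place, so the login would be skipped.
    print(should_create_login(INITIAL_PLACEHOLDER))
    # A real password has been stored, so the login would be created.
    print(should_create_login("a-real-password"))
```

With a check of this kind, re-running the deployment after the secret has been added to the Key Vault is enough for the login to be created.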
Besides the deployed Azure infrastructure, several WebJobs are also deployed for the Basic SQL and Databricks Client Application, responsible for the background tasks of data and metadata synchronization:
In order to copy the data from the Entities in the DSU, the Client Application needs to request a token from the Identity Server service. This token is only valid for a restricted time frame. If the validity period of the token needs to be extended, this setting can be configured in the Identity Server database in the Core resource group. To increase the validity period of the tokens, do the following:
Apply an UPDATE over the table [dbo].[Clients], extending the value of the field [AccessTokenLifetime] for DataMesh. For example, to extend to 5 hours:
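As a rough sketch of such an update, the following Python helper builds a parameterized statement. It assumes that [AccessTokenLifetime] is expressed in seconds (the usual IdentityServer convention), that the client row can be located through a [ClientName] column, and that the database driver uses `?` placeholders (as pyodbc does). None of these details are confirmed by this document, so check them against the actual schema before running anything.

```python
from typing import Tuple

SECONDS_PER_HOUR = 3600


def build_token_lifetime_update(client_name: str, hours: float) -> Tuple[str, tuple]:
    """Build a parameterized UPDATE that sets the access token lifetime.

    Assumptions: the lifetime is stored in seconds, and [ClientName]
    identifies the client row (the real column may differ).
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    sql = (
        "UPDATE [dbo].[Clients] "
        "SET [AccessTokenLifetime] = ? "
        "WHERE [ClientName] = ?"
    )
    params = (int(hours * SECONDS_PER_HOUR), client_name)
    return sql, params


if __name__ == "__main__":
    sql, params = build_token_lifetime_update("DataMesh", 5)
    print(sql)
    print(params)
```

The statement and its parameters are kept separate so that they can be handed to a database cursor without concatenating values into the SQL text.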
Client Application pipelines¶
The Client Application pipelines section includes information on the pipelines available for this Client Application. See Default pipeline template for extraction for details on this pipeline, its parameters, etc.