Basic SQL Data Product¶
In order to enable the transformation and modelling of relational data according to business rules, Sidra provides a template for a vanilla version of Data Product, called Basic SQL Data Product.
The main component of this Data Product template is a client SQL Database, that will host two sections of data:
- A reduced version copy of the metadata tables in Sidra Core, in order to track Assets metadata as well as Data Factory Metadata and configuration.
- Any data model and stored procedures required for data transformation according to business rules and final analytics data use case.
This document refers to key concepts of a Data Product in Sidra, which can be reviewed here.
Purpose¶
This Data Product template allows to accelerate the creation of a Data Product, by abstracting from all the main synchronization elements with Sidra Core. As long as the Data Product which is created with this template is configured to have the required permissions to access the DSU data, this application transparently and automatically retrieves data from the DSU into the staging tables.
This Data Product integrated with Sidra shares the common security model with Sidra Core and uses Identity Server for authentication. A copy of the relevant ingested Assets metadata is kept always synchronized with Sidra Core. The metadata synchronization is performed by an automated Sync job, explained here.
The actual data flow orchestration is performed via a specific instance of Azure Data Factory installed in the Data Product.
High-level installation details¶
As with any other type of Data Product in Sidra, the process of installing this Data Product consists of the following main steps:
- A dotnet template is installed, which launches a build and release pipeline in Azure DevOps defined for this Data Product.
- As part of the build and release pipeline for this Data Product, the needed infrastructure is installed. This includes the execution of the ClientDeploy.ps1 and Databricks deployment scripts, and also the different WebJobs deployment.
Build+Release is performed with multi-stage pipelines, so no manual intervention is required once the template is installed by default.
For more information on these topics you can access this Documentation.
Step-by-step
Create a Data Product from scratch
For more information, check the specific tutorial for creating a Data Product .
Architecture¶
The Basic SQL Data Product resources are contained into a single resource group, separated from the Sidra Core and DSU resource groups. The services included in the ARM template for this Data Product contain the following pieces:
- Storage account for raw data: used for storing the copy of the data that is extracted from the DSU, and for which the Data Product has access.
- Data Factory: used for data orchestration pipelines to bring the data from the DSU.
- Client Database: used for keeping a synchronized copy of the assets metadata between Sidra Core and Data Labs and hosting the relational models and transformation stored procedures.
- Key Vault: used for storing and accessing secrets in a secure way.
Besides the Azure infrastructure deployed, several Webjobs are also deployed for the Basic SQL and Databricks Data Product, responsible for the background tasks of data and metadata synchronization:
- Sync
- DatabaseBuilder
- DatafactoryManager
Data Product pipelines¶
Info
- Section Data Product pipelines includes information on the available Data Product pipelines to be used for this Data Product.
- See Default pipeline template for extraction for details on this pipeline, parameters, etc.