Sidra Resource Groups infrastructure¶

All the infrastructure needed in Sidra Data Platform is deployed using ARM templates and grouped in Resource Groups.

Concepts

Azure Resource Manager, also known as ARM, is the deployment and management service for Azure. It provides a consistent management layer that enables to create, update, and delete resources in Azure subscription.
In Azure, a Resource Group is a container in Azure Resource Manager that holds related resources for an application.

For every Resource Group in Sidra, there is a set of applications needed by Sidra Service and the Resource Groups with its related roles.

ARM Templates are a way to declare the Azure resources used by the platform, with all its properties in a JSON file.

For more information about Sidra deployment through ARM template, check the documentation guide.

Resource Groups in Sidra¶

Sidra Service contains at least three different kinds of resource groups in Azure:

Sidra Service Resource Group.
DSU Resource Group (minimum 1 DSU Resource Group, it is possible to add additional DSUs).
Databricks DSU and Management App managed resources groups.

When the Data Quality Service is installed, a separate resource group is generated. Similarly, as part of the Data Products setup, resource groups are also created.

The deployment scripts in Sidra also ensure that the defined resource naming convention, as well as the resource tags naming convention, is satisfied when creating these resources.

Besides the Sidra Service Resource Groups, the resources associated to each Sidra Data Product resources will be logically included in a different Resource Group.

1. Sidra Service Resource Group ARM template diagrams¶

Sidra Service Resource Group includes the group of Azure resources required for key transversal as well as common services across Sidra, namely:

App Service plan and App Services for Sidra Service API, Sidra Web, Authentication and Authorization services.
Common services: App Insights, SignalR, LogAnalytics.
KeyVault for secret management.
Persistence Layer through SQL Elastic Pool and SQL databases for the normal functioning of Sidra: Sidra Services, Log, Authentication.

Below is a logical high-level diagram depicting the key Azure resources inside the Sidra Service Resource group:

The xxxx prefix on the services name will be replaced by the project name prefix (4 characters) chosen at Sidra installation time.

The above-mentioned databases are inside an Elastic Pool. During initial loads it is recommended to scale up these resources at least to a Standard tier with 400 eDTUs, although the size may be bigger depending on the initial installation size and data intake needs.

The connection strings to connect to these databases are inside the KeyVault, with the following secret default names:

ConnectionStrings -- CoreContext
ConnectionStrings -- DwContext
ConnectionStrings -- IdentityServer
ConnectionStrings -- LogContext

SignalR is used for managing notifications in real time. Alerts are are managed through the Application Insights deployed in Sidra Service resource group as well. The notifications generated by SignalR can be seen in real-time in Sidra web.

Alerts and monitoring are are managed through the Application Insights deployed in Sidra Service resource group as well. The metics could be analyzed in more detail through personalized dashboards in Application Insights.

Details about the databases in Sidra Service¶

The main database, called Sidra, is composed of different schemas with different purposes. These are described in the Metadata section.

The DW database is a relational database with a dimensional model that stores operational and process data about the data intake (durations, stored volumes, etc.) of all Sidra Providers. This database is used by the Power BI operational reports as well as by the API and Sidra Web.

The Log database contains tables and views with logs of the platform.

The Authentication database is an IdentityServer4 database, responsible to keep all the info related to authorizations required by the different agents that consume the services of the platform.

Details about the AppServices in Sidra Service¶

The API App service is a broad service that provides management capabilities, related to the data ingestion, as well as general administration and configuration of the platform.

The Web App service is used for the web admin portal. From this portal, users are able to visualize and manage data related to data intake, Data Products, Logs, Authorizations, etc.

Additionally, Sidra uses other services like Authentication and Authorization Server for managing aspects relating to user authorization and access management.

2. DSU Resource Group ARM template diagram¶

Sidra DSU Resource Group includes the group of Azure resources required for the storage, orchestration, computing, cognitive and machine learning services of a deployable Data Storage Unit (DSU).

Data Storage Units (DSU) provide logical and physical isolation of data, to help with data compliance and regulations. Each DSU isolates not just the data storage, but also the compute, orchestration, intake and ML models, so they can be collocated in specific geographical regions.

Even though a Sidra implementation can have multiple DSUs, they all form part of the same Data Lake, sharing the resources in Sidra Service for modules and functionalities like the Data Catalog, Metadata Management, Security model, etc.

A Sidra DSU Resource group contains fundamentally the following types of resources:

Orchestration (mainly through Data Factory)
Storage accounts (for landing, raw as well as optimized storage)
Compute services (Databricks)
Model deployment, knowledge store and model management services (e.g. AI Services, AI Search Service, Container Registry Service, Azure ML Service)
KeyVault for secret management for DSU operations. This KeyVault is different than the one in Sidra Service in order to fulfill compliance requirements that may oblige to physically separate the data and keys.

Data Factory is the main Microsoft service used for the data integration and orchestration. ADF pipelines perform the communication between the sources of data and the data lake. The data intake processes are executed periodically and automatically through configured triggers, so that the Data Lake is always updated and in synch with the source of data.

The computing operations on the data are executed through Databricks. Standard data transformations and optimizations are executed through this platform.

Below is a logical high-level diagram depicting the key Azure resources inside the DSU Resource group:

The xxxx prefix on the services name will be replaced by the project name prefix (4 characters) chosen at Sidra installation time.

Data Product Resource Group ARM template diagrams¶

In the case of Sidra Data Products, there is high variability according to the limitless possibilities that can constitute a Data Product in Sidra. See section on Sidra Data Products for more details about what constitutes a Data Product in Sidra and underlying resources.

As a general rule, most Data Products will contain a set of common types of resources, such as:

Storage and persistence services
Compute services
Orchestration services
Common services (App Services for API, Web, KeyVault, etc)

Each Data Product template will therefore be depicted by a different ARM template diagram.