Sidra Data Platform Resource Groups infrastructure

All the infrastructure needed in Sidra Data Platform is deployed using ARM templates. Azure Resource Manager, also known as ARM, is the deployment and management service for Azure. It provides a consistent management layer that enables to create, update, and delete resources in Azure subscription.

All the underlying cloud resources are grouped in resource groups. In Azure, a Resource Group is a container in Azure Resource Manager that holds related resources for an appplication. For every Resource Group in Sidra, there is a set of applications needed by Core and the Resource Groups with its related roles.

ARM Templates are a way to declare the Azure resources used by the platform, with all its properties in a JSON file. The orchestration script contains the parametrization and deployment order of all the resources to be deployed.

Resource Groups in Sidra

Sidra Core contains at least two different resource groups in Azure, which end up being codified in the form of ARM templates after the initialization and orchestration and parametrization process.

  • Core Resource Group
  • DSU Resource Group (minimum 1 DSU Resource Group, but there can be as many as DSUs in the Sidra Core installation)

The deployment scripts in Sidra also ensure that the defined resource naming convention, as well as the resource tags naming convention, is satisfied when creating these resources. The specifications of such resources depend on a set of pre-defined characteristics as per the Sidra installation size (see Sidra Core Deployment Installation sizes), as well as the specific data installation variables.

Besides the Sidra Core Resource Groups, the resources associated to each Sidra Client Application resources will be logically included in a different Resource Group.

Sidra Core Resource Group ARM template diagrams

Sidra Core Resource Group includes the group of Azure resources required for key transversal as well as common services across Sidra, namely:

  • App Service plan and App Services for Sidra Core API, Sidra Web, authentication and authorization services.
  • Commmon services: App Insights, SignalR, LogAnalytics
  • KeyVault for secret management
  • Persistence Layer through SQL Elastic Pool and SQL databases for the normal functioning of Core Sidra services: Core, Log, Operational DW, Identity Server.

Below is a logical high-level diagram depicting the key Azure resources inside the Core Resource group:

core-resource-group

Note: the xxxx prefix on the services name will be replaced by the project name prefix (4 characters) chosen at Sidra installation time.

The above-mentioned databases are inside an Elastic Pool. During initial loads it is recommended to scale up these resources at least to a Standard tier with 400 eDTUs, although the size may be bigger depending on the initial installation size and data intake needs.

The connection strings to connect to these databases are inside the KeyVault, with the following secret default names:

  • ConnectionStrings -- CoreContext
  • ConnectionStrings -- DwContext
  • ConnectionStrings -- IdentityServer
  • ConnectionStrings -- LogContext

SignalR is used for managing notifications in real time. Alerts are are managed through the Application Insights deployed in Core resource group as well. The notifications generated by SignarlR can be seen in real-time in Sidra web.

Alerts and monitoring are are managed through the Application Insights deployed in Core resource group as well. The metics could be analyzed in more detail through personalized dashboards in Application Insights.

Details about the databases in Sidra Core

The main database, called Core, is composed of different schemas with different purposes. These are described in the Metadata section.

The DW database is a relational database with a dimensional model that stores operational and process data about the data intake (durations, stored volumes, etc.) of all Sidra Providers. This database is used by the Power BI operational reports as well as by the API and Sidra Web.

The Log database contains tables and views with logs of the platform.

The IdentityServer database is an IdentityServer4 database, responsible to keep all the info related to authorizations required by the different agents that connsume the services of the platform.

Details about the AppServices in Sidra Core

The API App service is a broad service that provides management capabilities, related to the data ingestion, as well as general administration and configuration of the platform.

The Web App service is used for the web admin portal. From this protal, users are able to visualize and manage data related to data intake, Client Applications, Logs, Autorizations, etc.

Additionally, Sidra uses other sevices like Identity Server and Balea Server for managing aspects relating to user authorization and access management.

Sidra DSU Resource Group ARM template diagram

Sidra DSU Resource Group includes the group of Azure resources required for the storage, orchestration, computing, cognitive and machine learning services of a deployable Data Storage Unit (DSU).

Data Storage Units (DSU) provide logical and physical isolation of data, to help with data compliance and regulations. Each DSU isolates not just the data storage, but also the compute, orchestration, intake and ML models, so they can be colocated in specific geographical regions.

Even though a Sidra implementation can have multiple DSUs, they all form part of the same Data Lake, sharing the resources in Sidra Core for modules and functionalities like the Data Catalog, Metadata Management, Security model, etc.

A Sidra DSU Resource group contains fundamentally the following types of resources:

  • Orchestration (mainly through Data Factory)
  • Storage accounts (for landing, raw as well as optimized storage)
  • Compute services (Databricks)
  • Model deployment, knowledge store and model management services (e.g. Azure Cognitive Services, Azure Search Service, Container Registry Service, Azure ML Service)
  • KeyVault for secret management for DSU operations. This KeyVault is different than the one in Core in order to fulfill compliance requirements that may obligue to physically separate the data and keys.

Data Factory is the main Microsoft service used for the data integration and orchestration. ADF pipelines perform the communication between the sources of data and the data lake. The data intake processes are executed periodically and automatically through configured triggers, so that the Data Lake is always updated and in synch with the source of data.

The computing operations on the data are executed through Databricks. Standard data transformations and optimizations are executed through this platform.

Below is a logical high-level diagram depicting the key Azure resources inside the DSU Resource group:

dsu-resource-group

Note: the xxxx prefix on the services name will be replaced by the project name prefix (4 characters) chosen at Sidra installation time.

Sidra Client Application Resource Group ARM template diagrams

In the case of Sidra Client Applications, there is high variability according to the limitless possibilities that can constitute a Client Application in Sidra. See section on Sidra Client Applications for more details about what constitutes a Client Application in Sidra and underlying resources.

As a general rule, most Client Applications will contain a set of common types of resources, such as:

  • Storage and persistence services
  • Compute services
  • Orchestration services
  • Common services (App Services for API, Web, KeyVault, etc)

Each Client Application template will therefore be depicted by a different ARM template diagram.