Sidra Data Platform (version 2019.R4: Concerned Crispin)¶
released on October 4, 2019
Welcome to the October 2019 release of the Sidra Data Platform. This page documents all of the new features, enhancements and visible changes included in the new version 2019.R4: Concerned Crispin.
October ’19 Release Overview¶
In this release we continue to improve the supportability of the platform. Data security and privacy remain a constant concern: the previous release introduced Identity Server as the identity provider for Sidra Data Platform, and this release continues that integration by improving its deployment. We have also improved the Sidra Data Platform notification system, adding SignalR as part of the Core.
Before delving into the details, here are some of the key highlights included in Sidra 2019.R4:
- Identity Server added to the Core template - Identity Server is now included in the Core template, so it can easily be deployed in any Sidra installation
- New notification system - A notification system based on SignalR has been added to Sidra as part of the Core API
- Added support for default installation sizes - You can now choose between different installation sizes when deploying Sidra
- Automatic Maintenance of SQL Core Databases - Default pipelines now run index maintenance operations on the SQL Server core databases
- Metadata security overhaul - Enhancements to security at the metadata level
Details of what's new in version 2019.R4¶
Identity Server added to the Core template¶
Identity Server is now included in the Core template, so it can easily be deployed in any Sidra installation. The Identity Server instance is deployed on a new WebApp which has been added to the template. The Deployment project, also included in the template, now configures authorization through Identity Server on all Sidra services, such as the Web API, Web Frontend and WebJobs. The old Active Directory applications are now used only to access Azure resources, for example from Azure DevOps during the launch of a release, or from DataFactoryManager when pipelines are created or modified in Azure Data Factory.
New notification system¶
Effective communication is a vital tool that lies at the heart of business operations. By deploying the notification system in Sidra Data Platform, we ensure the right information goes to the right recipients at the right time, increasing productivity in the long run, improving supportability and reducing risks.
Sidra comes with a new and improved notification system based on SignalR as part of the Core API. SignalR can be configured either to use Azure SignalR or to run on the WebApp instances that host the Core API. To select Azure SignalR as the notification system, use the setting named "UseAzureSignalR" in the Core API appsettings file. The Azure SignalR instance and the support for notifications are installed automatically as part of the Sidra deployment, and we will continue to add different types of notifications in the future.
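As a minimal sketch of how such a flag could be interpreted, the snippet below reads an appsettings-style JSON document and reports which hosting mode it selects. Only the setting name "UseAzureSignalR" comes from this release; its position at the top level of the file, the default applied when it is missing, and the mode labels are assumptions made for illustration.

```python
import json

# Hypothetical appsettings content. Only the "UseAzureSignalR" name comes from
# the release notes; placing it at the top level of the file is an assumption.
APPSETTINGS_JSON = """
{
  "UseAzureSignalR": true
}
"""


def signalr_mode(appsettings_text: str) -> str:
    """Return an illustrative label for the SignalR hosting mode the flag selects."""
    settings = json.loads(appsettings_text)
    # Assumption: when the flag is missing, SignalR stays resident on the WebApp.
    use_azure = bool(settings.get("UseAzureSignalR", False))
    return "azure-signalr" if use_azure else "webapp-resident"


print(signalr_mode(APPSETTINGS_JSON))
```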
Added support for default installation sizes¶
Each installation of Sidra Data Platform deploys its infrastructure in Azure using a parameterized orchestration script. For a healthy Software Development Life Cycle, we recommend three environments: Development (a space where developers can try out new functionality), Test (an area where the QA team can operate without affecting either the development or production environments), and Production. As not all environments need to have the same tier, the orchestration script uses a different set of configuration parameters for each one, stored in an environment configuration file. The parameters in the environment configuration file can be modified to tune the deployment.
The installation size selects the scale of the deployment: S, M, L or XL.
| Size | Description | 
|---|---|
| S | Configuration for less demanding workloads. | 
| M | Configuration for workloads with typical performance requirements, for example testing specific scenarios. |
| L | Same as size M, but with higher performance. |
| XL | Configuration for intensive production workloads. | 
For complete details of all the installation sizes and resource tiers, check the Sidra documentation (Sidra Core - DevOps - Infrastructure - Installation sizes).
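The relationship between environments and installation sizes can be sketched as follows. This is an illustration only: the `EnvironmentConfig` class, the `size_for` helper and the pairing of each environment with a size are hypothetical, and the actual format of the environment configuration file is described in the Sidra documentation, not here.

```python
from dataclasses import dataclass

VALID_SIZES = ("S", "M", "L", "XL")  # the four sizes listed in the table above


@dataclass(frozen=True)
class EnvironmentConfig:
    """Hypothetical in-memory view of one environment configuration file."""
    name: str
    installation_size: str

    def __post_init__(self):
        # Reject anything that is not one of the documented sizes.
        if self.installation_size not in VALID_SIZES:
            raise ValueError(f"Unknown installation size: {self.installation_size!r}")


# Assumed pairing of environments and sizes, chosen only for illustration.
ENVIRONMENTS = [
    EnvironmentConfig("Development", "S"),
    EnvironmentConfig("Test", "M"),
    EnvironmentConfig("Production", "XL"),
]


def size_for(environment: str) -> str:
    """Look up the installation size configured for an environment."""
    for env in ENVIRONMENTS:
        if env.name == environment:
            return env.installation_size
    raise KeyError(environment)


print(size_for("Production"))
```

Validating the size when the configuration is loaded means a typo fails before any infrastructure is deployed.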
Sync based on ClientId¶
Continuing our work on improving security in Sidra Data Platform, each Consumer App will have a different ClientId to identify itself.
Starting with this release, the sync process should no longer ask the Core API for a specific ProviderId or EntityId. Instead, it should provide only its ClientId, and the API should retrieve the necessary metadata, for example the Providers or Entities on which that ClientId has permissions. If Sync no longer has permissions over a Provider or Entity but data for it still exists in the client database, that Provider or Entity should be marked as disabled and ignored for the rest of the process. When asking for new Assets, Sync will provide the same data, but the API should still validate the permissions over that data.
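The disabling rule described above can be sketched as follows. This is not the Sidra implementation: `LocalEntity`, the in-memory `PERMISSIONS` store standing in for the Core API, and both function names are hypothetical, and the sketch covers Entities only.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class LocalEntity:
    """Hypothetical record of an Entity already present in the client database."""
    entity_id: int
    disabled: bool = False


# Hypothetical permission store standing in for the Core API metadata.
PERMISSIONS: Dict[str, Set[int]] = {"consumer-app-1": {10, 11}}


def permitted_entity_ids(client_id: str) -> Set[int]:
    """Stand-in for the API call: the caller sends only its ClientId."""
    return set(PERMISSIONS.get(client_id, set()))


def entities_to_sync(client_id: str, local_entities: List[LocalEntity]) -> List[LocalEntity]:
    """Disable local Entities that are no longer permitted and return the rest."""
    permitted = permitted_entity_ids(client_id)
    for entity in local_entities:
        if entity.entity_id not in permitted:
            # Data exists locally but permission was lost: mark and skip.
            entity.disabled = True
    return [entity for entity in local_entities if not entity.disabled]


local = [LocalEntity(10), LocalEntity(12)]
active = entities_to_sync("consumer-app-1", local)
print([entity.entity_id for entity in active])
```

The sketch deliberately never re-enables an Entity, because the release notes do not say what happens when a permission is granted again.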
Metadata security overhaul¶
Access to metadata through the API is secured with policies that control each method and check whether the user, service or actor executing the action has the required permission. Each action also requires permissions over the specific Entities, Providers or DSUs involved. For example, a user should be able to create an Entity only in a Provider on which they have permissions, and only if they hold the claim to execute that type of action.
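The two-part check in that example (a claim for the type of action plus permission over the target Provider) might look like the sketch below. The `Actor` type, the claim name `entity:create` and the function are hypothetical and are not taken from the Sidra API.

```python
from dataclasses import dataclass
from typing import FrozenSet

CREATE_ENTITY_CLAIM = "entity:create"  # hypothetical claim name


@dataclass(frozen=True)
class Actor:
    """Hypothetical user, service or other actor calling the API."""
    name: str
    claims: FrozenSet[str]
    provider_ids: FrozenSet[int]  # Providers the actor has permissions on


def can_create_entity(actor: Actor, provider_id: int) -> bool:
    """Allow the action only when both conditions hold."""
    has_claim = CREATE_ENTITY_CLAIM in actor.claims
    has_provider_permission = provider_id in actor.provider_ids
    return has_claim and has_provider_permission


analyst = Actor("analyst", frozenset({CREATE_ENTITY_CLAIM}), frozenset({1}))
reader = Actor("reader", frozenset(), frozenset({1}))

print(can_create_entity(analyst, 1))  # claim and Provider permission both present
print(can_create_entity(analyst, 2))  # claim present, no permission on Provider 2
print(can_create_entity(reader, 1))   # Provider permission present, claim missing
```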
Unify Core resources¶
In Sidra Data Platform we want a clear, transparent and easy-to-maintain environment, so we have reorganized Sidra's resources. All Application Insights resources used in Core have been unified, and Identity Server is now deployed in the same Resource Group as Core, sharing the following resources:
- App Service
- SQL Server
- Elastic Pool
- Resource Group
- Key Vault
- AppInsights
Automatic Maintenance of SQL Core Databases¶
A database maintenance best practice is to update statistics and reorganize indexes frequently in order to keep the system healthy. To cover this, Sidra now includes pipelines that are deployed by default and that periodically run index and statistics maintenance operations on the SQL Server core databases. This feature was implemented using Ola Hallengren's solution, adding its scripts to the seed and creating the pipeline.
For more information regarding Ola Hallengren's solution, access the official documentation.
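For orientation, the sketch below builds the kind of T-SQL statement that a maintenance step could send to a database where Ola Hallengren's scripts are installed. `IndexOptimize` and the parameters shown belong to that solution, but the values Sidra's pipelines actually pass are not stated in these notes, so the thresholds and options here are assumptions, as are the `index_optimize_statement` helper and the database name `Core`.

```python
def index_optimize_statement(database: str) -> str:
    """Build an IndexOptimize call for one database (illustrative values only)."""
    quoted = database.replace("'", "''")  # escape single quotes for a T-SQL literal
    return (
        "EXECUTE dbo.IndexOptimize\n"
        f"    @Databases = '{quoted}',\n"
        # Assumed thresholds: skip indexes under 5% fragmentation, reorganize
        # between 5% and 30%, rebuild above 30%.
        "    @FragmentationLow = NULL,\n"
        "    @FragmentationMedium = 'INDEX_REORGANIZE',\n"
        "    @FragmentationHigh = 'INDEX_REBUILD_ONLINE,INDEX_REBUILD_OFFLINE',\n"
        "    @FragmentationLevel1 = 5,\n"
        "    @FragmentationLevel2 = 30,\n"
        # Also refresh statistics, but only those modified since the last update.
        "    @UpdateStatistics = 'ALL',\n"
        "    @OnlyModifiedStatistics = 'Y';"
    )


print(index_optimize_statement("Core"))  # "Core" is a hypothetical database name
```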
Support delta information in CreateTable custom activity¶
The CreateTable custom activity comes with new parameters so that it can create tables that support delta information. When the GeneratedDelta flag is set to 1 at Entity level, two new objects are created: a table that stores the previous and new values, as a means to identify these changes later, and a view that returns the current version of the data for the selected entity.
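The relationship between the delta table and the current-version view can be illustrated in memory as below. The row layout (`key`, `version`, `previous_value`, `new_value`) and the convention that a missing new value marks a deletion are assumptions made for this sketch; the actual schema created by CreateTable is not described in these notes.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DeltaRow:
    """Hypothetical delta-table row holding the previous and new value of a record."""
    key: int
    version: int
    previous_value: Optional[str]
    new_value: Optional[str]  # assumption: None means the record was deleted


def current_view(delta_rows: List[DeltaRow]) -> Dict[int, str]:
    """Emulate the view: keep the latest change per key and drop deleted records."""
    latest: Dict[int, DeltaRow] = {}
    for row in delta_rows:
        if row.key not in latest or row.version > latest[row.key].version:
            latest[row.key] = row
    return {key: row.new_value for key, row in latest.items() if row.new_value is not None}


rows = [
    DeltaRow(key=1, version=1, previous_value=None, new_value="a"),
    DeltaRow(key=1, version=2, previous_value="a", new_value="b"),
    DeltaRow(key=2, version=1, previous_value=None, new_value="x"),
    DeltaRow(key=2, version=2, previous_value="x", new_value=None),
]
print(current_view(rows))
```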
Support for having different attribute names in Data Lake than in the Source entity¶
This feature comes with two changes that give more flexibility when configuring Sidra:
- SourceName in Attribute: An attribute can now have a different name in the source than the one it will have in the data lake table.
- Lookup support: Using a lookup operation, the data can be read from another table instead of from the file ingested in the data lake.
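The SourceName mapping can be sketched as a simple rename step. Only the `SourceName` field is named in this release; the `Name` field, the fallback to `Name` when `SourceName` is empty, and the sample attributes are assumptions for illustration.

```python
from typing import Any, Dict, List

# Hypothetical attribute metadata: "Name" is the data lake column name and
# "SourceName" is the name of the same attribute in the source system.
ATTRIBUTES: List[Dict[str, Any]] = [
    {"Name": "customer_id", "SourceName": "CustID"},
    {"Name": "full_name", "SourceName": "Customer Name"},
    {"Name": "country", "SourceName": None},  # assumed fallback: same name on both sides
]


def to_data_lake_row(source_row: Dict[str, Any], attributes: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Rename source columns to their data lake names using the attribute metadata."""
    row = {}
    for attribute in attributes:
        source_name = attribute.get("SourceName") or attribute["Name"]
        row[attribute["Name"]] = source_row.get(source_name)
    return row


source = {"CustID": 7, "Customer Name": "Ada", "country": "UK"}
print(to_data_lake_row(source, ATTRIBUTES))
```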
Custom activities migrated to .NET Core solution¶
The custom activity for Google Analytics, which retrieves information from Google Analytics and dumps it to a CSV file, and the one for Bing Ads, which generates a report and copies it to a landing zone, have been migrated to the .NET Core solution.
The InsertFields custom activity has also been migrated to .NET Core.
Deploy Data Products with shared batch account¶
The deployment of a Data Product now accepts the settings of an existing Batch Account as a parameter. When these settings are provided, the deployment uses that Batch Account instead of creating a new one, which reduces costs.
Issues fixed in 2019.R4¶
- Fix failure when deploying a Machine Learning Workspace if there's already a resource created. #76220
- Fix null reference exception in WebAPI when flow is ClientCredential. #78388
- Entity TableName size has been increased and now can be up to 128 characters long, which is the maximum length allowed by Databricks. #74826
- Improve SQL type inference with a new method that receives an array of objects containing the fields needed to hold the result of a query and performs the inference without requiring a connection string. #74775
Feedback¶
We would love to hear from you! For issues, contact us at [email protected]. You can make a product suggestion, report an issue, ask questions, find answers, and propose new features.