Sidra Data Platform (version 2025.01: Zany Zestar)¶

Released on January 14, 2025

It's been a long time coming...

This Sidra release arrives after a slightly longer development cycle than initially planned, but we believe you’ll find the wait worthwhile. The highlight of this version is a complete overhaul of Sidra’s authentication service, now powered by the open-source Keycloak IAM solution. Because authentication underpins every component of the platform, we performed extensive testing to ensure a seamless in-place upgrade for all of our customers. With this solid foundation in place, we’re looking forward to resuming our monthly release cadence and steadily delivering new features.

Accompanying this major improvement, we’re also introducing the initial version of a new service: the API Builder. We’re excited to see how our customers and partners leverage it to streamline their Data Mesh initiatives and operational data products.

Finally, as part of our commitment to keeping the platform current, this release includes numerous smaller but significant updates and fixes that enhance overall performance and usability for our entire customer base.

Sidra 2025.01 Release Highlights¶

New Authentication Service
API Builder
Fixed Azure.Identity Vulnerability
Network Improvements
Databricks Runtime Upgrade
Improvements to the Data Connector Toolkit
Parameterization of Azure Search tier
Improved Index Maintenance Pipelines
Removed some Unused Resources

What's new in Sidra 2025.01¶

New Authentication Service¶

The Authentication Service is a key component of the Sidra ecosystem. In addition to streamlining the user login experience, it allows administrators to enforce consistent security policies across all Sidra components. It can be configured for multi-factor authentication (MFA), custom identity providers, and other security measures, depending on organizational needs. This centralized approach simplifies identity management, reduces overhead, and enhances the overall security posture of the Sidra platform.

Up until this release, the Authentication Service was built on IdentityServer4 as the underlying Identity and Access Management platform. When Sidra’s development began several years ago, IdentityServer4 was fully free and open-source under the Apache 2.0 license. However, with the transition to Duende IdentityServer (version 5 onward), Duende Software adopted a commercial licensing model, marking the end of the fully open-source approach provided in IdentityServer4.

To avoid additional licensing costs, Sidra continued using the open-source IdentityServer4, but we knew this would need to change once the product reached end-of-support and potential vulnerabilities went unaddressed. After researching the available alternatives, we chose Keycloak. Originally developed by Red Hat and released under the Apache License 2.0, Keycloak offers broader functionality, including a web-based management interface. Importantly, although this migration was significant, our team implemented it in a way that requires no extra post-update steps, ensuring a seamless upgrade for all Sidra installations.
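For readers unfamiliar with Keycloak, the sketch below shows how a client typically builds an OAuth 2.0 client-credentials token request against Keycloak's standard OpenID Connect token endpoint. It is a minimal illustration, not a description of how Sidra wires this internally: the base URL, the realm name `sidra`, and the client credentials are hypothetical placeholders, and the path assumes a Keycloak version that serves realms without the legacy `/auth` prefix.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(base_url: str, realm: str, client_id: str, client_secret: str) -> Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    # Keycloak's per-realm OpenID Connect token endpoint.
    token_url = f"{base_url.rstrip('/')}/realms/{realm}/protocol/openid-connect/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
        }
    ).encode("ascii")
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical values: replace them with those of your own installation.
request = build_token_request("https://auth.example.com", "sidra", "my-client", "my-secret")
print(request.full_url)
```

Sending the request (for example with `urllib.request.urlopen`) would return a JSON document containing an `access_token`, which the client then presents as a bearer token.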

Find out more about the Authentication Service.

API Builder¶

One of the main challenges our users and partners encounter when implementing a Data Mesh strategy is the effort involved in building APIs for all potential entities in each domain. This process can be both time-consuming and prone to errors. Additionally, when multiple teams work on different domains, maintaining consistency in naming conventions and development practices can become difficult.

To address this, we’ve introduced a new service in Sidra that automates routine engineering tasks, allowing developers to focus on higher-value activities like data transformations. The Sidra API Builder automatically generates either a REST or GraphQL API for a set of user-defined entities. It then packages and deploys this API as a Docker image to the chosen Data Product, and can be refreshed on demand whenever new entities become available.

All of this is seamlessly integrated with Sidra’s security model and includes the same logging, observability, and metrics support found in other Sidra services. Note that the API Builder is a licensed add-on. If you’d like to try it out but don’t see it available in your Supervisor for installation, please contact us.
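To make the consistency benefit concrete, here is a purely illustrative sketch of the general idea of deriving an API surface from entity definitions. It does not reflect the API Builder's actual implementation, configuration, or output: the `Entity` class, its fields, and the route shape are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """Hypothetical entity definition: a domain, an entity name, and its key field."""
    domain: str
    name: str
    key: str


def to_kebab(name: str) -> str:
    # Insert a hyphen at every lower-to-upper case boundary, then lowercase.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "-", name).lower()


def rest_routes(entity: Entity) -> list[str]:
    """Derive a collection route and a single-item route for one entity."""
    base = f"/{to_kebab(entity.domain)}/{to_kebab(entity.name)}"
    return [f"GET {base}", f"GET {base}/{{{entity.key}}}"]


entities = [
    Entity("Sales", "SalesOrder", "orderId"),
    Entity("Sales", "Customer", "customerId"),
]
for entity in entities:
    for route in rest_routes(entity):
        print(route)
```

Because every route comes from the same function, naming stays uniform across domains regardless of which team defined the entity.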

Find out more about the API Builder Service.

Fixed Azure.Identity Vulnerability¶

In June 2024, a security vulnerability in the Microsoft Authentication Library (MSAL) and the Azure Identity libraries was disclosed as CVE-2024-35255. The vulnerability posed a moderate risk of privilege escalation, so addressing it was a priority. Sidra’s Authentication Service did not use these libraries directly, but it relied on IdentityServer4, which in turn depended on the affected components. Because upgrading to a non-vulnerable version of IdentityServer was not an option, we accelerated the migration to Keycloak, as described earlier in these release notes.

Network Improvements¶

A number of network-related enhancements have been implemented throughout the codebase to improve performance, resiliency, and security. New retry policies have been introduced in areas that previously lacked them, such as the DSU deployment, reducing the impact of transient network issues. Some network calls have also been grouped together to improve response times, for example in the MAR calculation. Additionally, we’ve added private endpoints for the Databricks Managed Resource Group and configured a dedicated DNS zone for blob storage, further strengthening the platform’s security and isolation posture.
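As a rough illustration of what a retry policy for transient network failures looks like, the sketch below retries an operation with exponential backoff and jitter. It is a generic pattern, not Sidra's actual implementation: the function name, default values, and the set of retried exceptions are assumptions chosen for the example.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`, retrying on transient errors with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the last error to the caller.
            # The delay doubles on each attempt; jitter spreads out concurrent retries.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 2))
    raise RuntimeError("unreachable")


# Usage: an operation that fails twice with a transient error, then succeeds.
attempts = {"count": 0}


def flaky() -> str:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "deployed"


result = call_with_retry(flaky, sleep=lambda _: None)  # Skip real sleeping in the demo.
print(result, attempts["count"])  # prints "deployed 3"
```

Errors outside `retry_on` propagate immediately, which keeps genuine failures such as invalid configuration from being masked by retries.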

Databricks Runtime Upgrade¶

We have upgraded the Databricks Runtime of all DSUs and Data Products to the latest Long-Term Support version: 15.4 LTS. This runtime includes Apache Spark 3.5.0, bringing enhanced performance, stability, and improved error handling for Base64 decoding in both Spark and Photon.

The runtime also introduces behavioral changes, such as disallowing the undocumented ! syntax (previously accepted as a substitute for NOT) outside boolean expressions, and updating the default schema binding mode for views. Sidra has been thoroughly tested and updated to handle these breaking changes seamlessly. These enhancements collectively contribute to a more robust and efficient data processing environment within Sidra.
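If you maintain custom SQL in notebooks or Data Products, a quick scan can help locate statements affected by the ! change. The sketch below is a heuristic, not a SQL parser: it flags the forms we understand to be disallowed (`IF ! EXISTS`, `! NULL`, `! IN`, `! BETWEEN`), leaves the boolean prefix operator (as in `!is_mgr`) and `!=` alone, and can produce false positives inside string literals or comments. The pattern list is our reading of the Databricks release notes and should be checked against them.

```python
import re

# Legacy uses of "!" as a stand-in for NOT outside boolean expressions.
LEGACY_NOT = re.compile(
    r"\bIF\s*!\s*EXISTS\b|!\s*(?:NULL|IN|BETWEEN)\b",
    re.IGNORECASE,
)


def find_legacy_not(sql: str) -> list[str]:
    """Return every fragment of `sql` that uses the legacy "!" syntax."""
    return [match.group(0) for match in LEGACY_NOT.finditer(sql)]


print(find_legacy_not("CREATE TABLE IF ! EXISTS t (id INT ! NULL)"))
# prints ['IF ! EXISTS', '! NULL']
print(find_legacy_not("SELECT * FROM t WHERE !is_mgr AND a != b"))
# prints []
```

Each flagged fragment can then be rewritten with the keyword form, for example `IF NOT EXISTS` or `NOT NULL`.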

Improvements to the Data Connector Toolkit¶

A new version of the Connector Toolkit is now available, featuring several minor fixes and enhancements. Notably, the toolkit will no longer create a DataSource if no template is associated with it. This change enables scenarios where no actual Data Source exists, eliminating the need to create an unnecessary linked service.

Parameterization of Azure Search tier¶

As part of our ongoing efforts to provide users with greater control over performance and cost configurations, this release introduces the ability to customize the performance tier for Azure Search resources in both the Core and DSUs. This enhancement allows users to better align their resource allocation with their specific workload and budget requirements.

Improved Index Maintenance Pipelines¶

Sidra includes an automated index maintenance job for all of the system's SQL databases and the customer Data Products' SQL databases. In previous versions, this job was executed from the DSU's Data Factory, which triggered a stored procedure to handle index and statistics updates. However, this approach could result in multiple concurrent executions. To address this, the process has been rearchitected to leverage a centralized SQL Elastic Job in the Core, ensuring more efficient and reliable index maintenance.

Removed some Unused Resources¶

Sidra 1.x relied on a local Azure Container Registry (ACR) in each DSU for storing AI models, an approach that no longer aligns with the Sidra 2.x architecture. These resources were nevertheless still being deployed. Starting with this release, new Sidra installations will no longer provision an ACR or the associated VNET configurations (private DNS zone and private endpoints).

Issues Resolved in Sidra 2025.01¶

These are the issues resolved during this release in Sidra:

Ensured that all parameters in the DIP's Advanced Configuration section are assigned default values. #7833
Fixed an issue that could lead to data not being ingested if a failure happened between the load of two assets of the same entity. #7949
Fixed an issue in the Data Intake Process where the OrchestratorRunId parameter was not included if the pipeline executed more than one batch. #7947
Improved the retry logic for checking the status of a deployment of Sidra. #8009
Improved the execution time and reliability of the deployment of Sidra by reusing the AAD token in the elastic_job_agent_core deployment. #7924
Fixed an issue with updating a Data Product when the associated SQL user already existed. #8024
Fixed an issue where updating a Data Product triggered the validation process intended for a first-time installation. #8025
Fixed an issue linking a private DNS zone group with the VNET when Supervisor deployment uses the default VNET. #8027
Fixed an issue that caused installation failures when a private DNS zone already existed and was linked to the VNET. #8069
Fixed an issue in the API Connector Toolkit where a plugin failed to update its configuration and certain auto-populated fields were not displayed in the connector's update form. #8132
Resolved an issue where Sidra could attempt to archive files that were already archived, resulting in excessive exception logging in the Log database. #8143
Fixed several errors related to Basic Authentication mode in the API Connector Toolkit. #8185
Fixed an issue where a DIP created using the API Toolkit could fail to create the Databricks tables if the 'Expand Collection' option was disabled. #8363
Fixed an issue that could potentially cause the Key Vaults of the Data Products to lack proper assignment of their AAD manager. #8427

Coming soon...¶

For the next release, Sidra will introduce a new connector perfectly suited for this snowy season! Additionally, new cost-reduction features will be rolled out to help optimize resource usage and further enhance the platform’s efficiency.

We Want Your Feedback!¶

Your ideas make Sidra better. For suggestions, issues, or questions, please reach out to us at [email protected].