What is Sidra Data Platform¶
Sidra Data Platform, also known as Sidra, is an end-to-end big data and data lake platform built on Azure PaaS technologies. Sidra provides a solution for a wide range of enterprise data lake scenarios, reducing time to market while providing scalability and ease of maintenance.
Built on Azure PaaS¶
Sidra is built on Azure PaaS. It is an enterprise data lake solution focused on deploying a working system quickly and easily, facilitating scalability throughout the lifecycle of the platform and simplifying maintenance. This is possible thanks to the high degree of automation across life-cycle processes such as first-time installation, updates, data integration pipelines and Data Products deployment.
Customizable¶
Sidra is an automated, customizable and flexible platform that can process large amounts of data regardless of its source. Among other features, it offers transparent storage of data in multiple regions, an integrated Data Catalog service, data lineage control, a consolidated view of logs and audit, and a comprehensive set of associated services and extensibility APIs. Thanks to Sidra accelerators (API and templates), configuring and building new data sources, as well as installing and updating Data Products, is straightforward.
Data Governance¶
Sidra provides the common foundation, shared services and data governance on which organizations build their specific use cases, from analytical applications based on SQL Server and Power BI to exploratory analysis scenarios.
Competitive advantages¶
- Full deployment of the platform in a matter of days
- Automation of data source configuration gets you from zero to data lake in hours
- Modular and adaptable to each scenario, thanks to the powerful concept of Sidra Data Products
- The Data Catalog and governance capabilities help address the challenges posed by data protection regulations

Sidra continuously evolves with technology and best practices. This lets organizations focus on innovation and reduces the time to market for building value on data.
Key Features¶

Knowledge Store
Multimodal storage supporting all types of data sources: from databases and APIs to documents and media files.

ML Model Serving Platform
Enable your Data Science teams to build, test and deploy secure models, while keeping track of both code and training data for audit and explainability purposes.

Security and Identity
Identity management via Identity Server, giving users from different authentication providers (Azure Active Directory, Google Accounts…) secure access to the platform.

Data Intake ML Models
Pre-packaged models that tackle the most common challenges during the data load process, such as corruption or anomalies in the data set, as well as automatic detection of sensitive PII data.

Integration and Extensibility
APIs for the integration of third-party tools in areas such as Data Catalog or Data Retrieval, as well as a Python SDK for Data Scientists.

Data Load Automation
Automation of ETL/ELT processes through the automatic generation of pipelines for data extraction, movement and processing.

Data Governance
Complete Data Catalog with web UI and API access, as well as data lineage audit and traceability.

Batch and Real-time
Support for both batch and real-time data loads, enabling operational data lake scenarios.
How does it work¶
Sidra orchestration is based on data pipelines: sets of data processing elements connected in series. This approach eliminates many manual steps from the data integration process and enables a smooth, automated end-to-end flow of data.
The data orchestration in Sidra begins by defining which data is collected, where and how.
The goal is to collect data points from many different sources and process the results in near real time.
The steps for this data ingestion in Sidra are:
- Structured or unstructured data arrives in the landing zone (a specific storage container in Sidra), or is brought in by Sidra connector plugins from different types of data sources (Azure SQL, SQL Server, MySQL, Oracle, SharePoint libraries...).
- Next, the incoming data is registered based on its defining metadata (file registration).
- Then, the data is stored in raw storage and validated.
- As a final step of the file ingestion process in the Sidra Service data pipeline, the data is stored in optimized storage, where anyone who needs to interact with the data can access it.
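The four ingestion steps can be pictured as a chain of small functions. The following is a minimal sketch only: `Asset`, `register`, `store_raw_and_validate`, `store_optimized` and `ingest` are hypothetical names chosen for illustration and are not part of the Sidra API, and the column check stands in for whatever validation a real pipeline applies.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """One incoming file or extract (hypothetical model, not Sidra's schema)."""
    name: str
    provider: str
    entity: str
    rows: list
    stage: str = "landing"  # Step 1: data starts in the landing zone
    history: list = field(default_factory=list)


def advance(asset: Asset, stage: str) -> Asset:
    # Remember the stage being left so the flow stays traceable.
    asset.history.append(asset.stage)
    asset.stage = stage
    return asset


def register(asset: Asset, catalog: dict) -> Asset:
    # Step 2: register the asset under its defining metadata.
    catalog.setdefault((asset.provider, asset.entity), []).append(asset.name)
    return advance(asset, "registered")


def store_raw_and_validate(asset: Asset, expected_columns: set) -> Asset:
    # Step 3: move to raw storage and check that every row has the expected columns.
    for row in asset.rows:
        missing = expected_columns - set(row)
        if missing:
            raise ValueError(f"{asset.name}: missing columns {sorted(missing)}")
    return advance(asset, "raw")


def store_optimized(asset: Asset) -> Asset:
    # Step 4: publish to the optimized storage that consumers query.
    return advance(asset, "optimized")


def ingest(asset: Asset, catalog: dict, expected_columns: set) -> Asset:
    register(asset, catalog)
    store_raw_and_validate(asset, expected_columns)
    return store_optimized(asset)


catalog = {}
asset = Asset("orders_2024.csv", "erp", "orders", [{"id": 1, "total": 9.5}])
ingest(asset, catalog, {"id", "total"})
print(asset.stage, asset.history)
```

In this sketch an asset that fails validation raises an error and never reaches the optimized stage, which mirrors the idea that only registered and validated data is exposed to consumers.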
Main concepts¶
The main concepts of Sidra are:
- Sidra Data Platform, through Supervisor, encompasses a set of shared services: management UI, App Services, API, audit, metadata system, and secret, identity and authorization management.
- DSUs are the groups of resources that process data for a Sidra installation, fully integrated with Sidra Service. Each DSU includes resources and services for data orchestration, data storage and data processing, as well as services such as Azure Cognitive Services, Model Serving and Azure Search for the knowledge store.
- Sidra services and data in the DSU are made available to the Data Products. Data Products are separate groups of Azure resources that implement the business use cases on the data (e.g. exploratory data analysis, Power BI reports, data quality transformations, etc.). The number of possible Data Products is large, as is the range of use cases they support. Data Products synchronize transparently with the metadata and data in the DSU, and with Sidra Service shared services for common security, authorization and authentication schemas.
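The relationship between DSUs and Data Products can be sketched as a small data model. This is an illustrative assumption, not Sidra's actual metadata schema: the `DSU` and `DataProduct` classes, their fields, the region names and the `sync` method are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DSU:
    """A group of resources that ingests and stores data (illustrative only)."""
    name: str
    region: str
    entities: set = field(default_factory=set)


@dataclass
class DataProduct:
    """A separate group of resources serving one business use case."""
    name: str
    dsu_names: list
    known_entities: dict = field(default_factory=dict)

    def sync(self, dsus: dict) -> None:
        # Copy the entity list of each linked DSU into the product's local view.
        for dsu_name in self.dsu_names:
            self.known_entities[dsu_name] = sorted(dsus[dsu_name].entities)


dsus = {
    "dsu-eu": DSU("dsu-eu", "westeurope", {"orders", "customers"}),
    "dsu-us": DSU("dsu-us", "eastus", {"orders"}),
}
reporting = DataProduct("sales-reporting", ["dsu-eu", "dsu-us"])
reporting.sync(dsus)
print(reporting.known_entities)
```

The point of the sketch is the direction of the dependency: a Data Product holds its own view of the metadata and refreshes it from one or several DSUs, rather than the DSU knowing about each use case.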
Sidra shared services¶
Sidra shared services offer the following properties:
Automated generation of data pipelines for fast and scalable data ingestion
The number of data pipelines can grow according to the platform's needs, and the number of data sources to be ingested makes no difference from a time-to-market point of view. For instance, ingesting 10 or 1000 tables from SQL Server takes the same deployment effort when using Sidra's template-based generation system.
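To see why template-based generation keeps deployment effort flat, consider this minimal sketch. It is not Sidra's generation system: `generate_pipelines` and the shape of the resulting definitions are assumptions made for illustration. One template is written once and then applied to every table.

```python
def generate_pipelines(provider: str, tables: list) -> list:
    """Build one pipeline definition per (schema, table) pair from a single template."""
    return [
        {
            "name": f"ingest_{provider}_{schema}_{table}",
            "source": f"{schema}.{table}",
            "sink": f"raw/{provider}/{table}",
        }
        for schema, table in tables
    ]


# The same call covers 10 tables or 1000 tables; only the input list changes.
small = generate_pipelines("sqlserver", [("dbo", f"table_{i}") for i in range(10)])
large = generate_pipelines("sqlserver", [("dbo", f"table_{i}") for i in range(1000)])
print(len(small), len(large))
print(small[0]["name"])
```

The human effort goes into the template and the list of tables, not into each individual pipeline.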
Comprehensive audit of all the system operations
Sidra Data Platform performs advanced lineage tracking of all entities and transformations. This enables an independent examination of the software product and its processes.
Low latency
Time is always an important factor: when users interact with and query the data, feedback should be immediate. Sidra Data Platform offers low latency for exploratory purposes and for building data products that need to update in near real time.
Web-based management UI
Sidra Data Platform comes with a modern, web-based management UI that aims to meet the needs of both developers and administrators. Sidra Web provides a visual widget for ingesting new data, a dashboard for tracking operational status, and central management of the different logs.
Data Catalog
The Data Catalog provides a view of all Entities loaded across the different storage regions, using a set of services focused on the management and discoverability of the data.
Monitoring
Monitoring is a key factor in reducing costs related to network performance, productivity and infrastructure. Operational activities, such as Data Intake Processes in Sidra Data Platform, can be monitored using operational Power BI dashboards.
Anomaly detection models for the data movement activities
A smoothly running data workflow needs a robust and stable infrastructure. Anomaly detection models for the data movement activities are an important tool in Sidra: they help identify unusual events that can affect the process, so that outliers are detected before the data is processed.
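The source does not say which models Sidra uses for this. As a stand-in, the sketch below applies a simple z-score test to the row counts of previous loads, flagging a new load whose size deviates strongly from recent history. The function name `is_anomalous_load` and the threshold of 3 standard deviations are assumptions for illustration.

```python
from statistics import mean, stdev


def is_anomalous_load(history: list, latest: float, threshold: float = 3.0) -> bool:
    """Flag `latest` when it lies more than `threshold` standard deviations from the mean."""
    mu = mean(history)
    sigma = stdev(history)  # sample standard deviation; needs at least two points
    if sigma == 0:
        # All past loads were identical, so any difference is unusual.
        return latest != mu
    return abs(latest - mu) / sigma > threshold


row_counts = [1000, 1020, 980, 1010, 990]  # rows moved by previous loads
print(is_anomalous_load(row_counts, 1005))  # a load in the usual range
print(is_anomalous_load(row_counts, 120))   # a load that is suspiciously small
```

Running such a check before processing lets a pipeline hold back a suspicious load instead of propagating it downstream.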
For more information about Sidra Service, you can check its documentation page.
Data Product¶
Any actor that needs to access the data stored in the DSU for a specific business need is catalogued as a Data Product.
Each Data Product uses its own set of tools. Each Data Product can retrieve data from one or several DSUs and apply business transformations to this data if needed.
Sidra takes security seriously, so the access level for each Data Product can be granularly controlled. For example, if a Data Product needs to set up a sandbox for ML experiments carried out by a third party, it can be configured with access restricted to specific DSUs, Providers or even Entities. This way, the Data Product could be designed to access only the data of one Provider that contains only non-sensitive data.
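Granular access of this kind can be sketched as a list of grants scoped by DSU, Provider and Entity. The `Scope` class, the `"*"` wildcard and `is_allowed` are hypothetical and only illustrate the idea; they do not describe how Sidra stores or evaluates permissions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    """One grant: a DSU, optionally narrowed to a Provider and an Entity."""
    dsu: str
    provider: str = "*"  # "*" means every Provider in the DSU
    entity: str = "*"    # "*" means every Entity of the Provider


def is_allowed(grants: list, dsu: str, provider: str, entity: str) -> bool:
    # Access is granted when at least one scope covers the requested data.
    return any(
        g.dsu == dsu
        and g.provider in ("*", provider)
        and g.entity in ("*", entity)
        for g in grants
    )


# A third-party ML sandbox limited to one Provider holding non-sensitive data.
sandbox_grants = [Scope("dsu-eu", "public-datasets")]
print(is_allowed(sandbox_grants, "dsu-eu", "public-datasets", "weather"))
print(is_allowed(sandbox_grants, "dsu-eu", "crm", "customers"))
```

Because access is denied unless a scope matches, a Data Product with no grants sees nothing, which is the safer default for a sandbox.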
The Data Products can be deployed in multiple instances and in different geographical zones.
For more information about Data Products, you can check its documentation page.