Sidra Data Platform glossary: dictionary of terminology
The Sidra Data Platform glossary is a short dictionary of the terminology used throughout this documentation, in Sidra's UI, and in technical training or knowledge transfer sessions.
See Client Application.
Asset
Each of the data elements that get ingested into the platform. They can range from an intermediate extract from a database in CSV format to a PDF file that is part of a larger collection.
Azure Data Lake
Azure Data Lake is a flexible, HDFS-compatible storage mechanism. In the default configuration of the Data Storage Units, Sidra uses Azure Data Lake Storage Gen2 to store data.
Client Application
Client applications are the pieces of Sidra that drive business cases. A client application (or App) is a set of Azure resources and code, enclosed in a Resource Group, which either accesses the data from one or multiple Data Storage Units via the secure APIs, or retrieves the data from the Data Lake, applying business transformations if needed.
The architecture allows client apps to be built with any set of tools and components, so they can range from Power BI-enabled analytical workspaces to web apps. However, they all have to use the same security model, notifications and logging infrastructure, and so on, through the Sidra APIs.
Core
The main component of Sidra Data Platform. The Core encompasses a set of shared services, including the Data Factory Manager, the Data Catalog, the Identity Server, audit and lineage tracking, and the logging and notifications infrastructure, among others.
This term is deprecated. It was the original name for Client Application.
Data Lake
A data lake is a centralized repository that can store both structured and unstructured data at any scale. In the case of Sidra, the Data Lake is the collection of all the data distributed across the different Data Storage Units.
Data Storage Unit
Data Storage Units (DSUs) provide logical and physical isolation of data, to help with data compliance and regulations. Each DSU isolates not just the data storage, but also the compute, orchestration, intake and ML models, so that a DSU can be located in a specific geographical region.
Even though a Sidra implementation can have multiple DSUs, they all form part of the same Data Lake, sharing the Data Catalog, Security model, etc.
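The relationship between DSUs and the Data Lake can be pictured with a minimal Python sketch. The class names, fields and region strings below are illustrative assumptions, not part of Sidra's API; the sketch only shows that each DSU is tied to one region while the Data Lake is the union of everything stored across DSUs.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DataStorageUnit:
    """Hypothetical model of a DSU: isolated storage tied to one region."""
    name: str
    region: str
    assets: List[str] = field(default_factory=list)


@dataclass
class DataLake:
    """Hypothetical model: the Data Lake is the union of all DSUs."""
    dsus: List[DataStorageUnit] = field(default_factory=list)

    def all_assets(self) -> List[str]:
        # The lake exposes every asset, regardless of which DSU holds it.
        return [asset for dsu in self.dsus for asset in dsu.assets]

    def regions(self) -> Set[str]:
        # Each DSU contributes the region where it is deployed.
        return {dsu.region for dsu in self.dsus}


# Illustrative data only: two DSUs in different regions, one Data Lake.
lake = DataLake([
    DataStorageUnit("dsu-eu", "westeurope", ["sales.csv"]),
    DataStorageUnit("dsu-us", "eastus", ["contract.pdf"]),
])
```

Calling `lake.all_assets()` lists the data held in both DSUs, while `lake.regions()` shows that the data itself stays in separate regions.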
Data Warehouse
A Data Warehouse is a system that pulls together data from many different sources within an organization and builds a model on top of it (usually in dimensional form) for reporting and analysis. Customer-specific client applications may contain their own Data Warehouses, but in the context of Sidra the term Data Warehouse implicitly refers to the Internal Data Warehouse.
Entity
An Entity in Sidra Data Platform is a collection of related data points, observations or documents. It is similar in concept to a table in relational databases, or a collection in document databases.
Given the multi-modal nature of Sidra, one Entity can hold relational data while another Entity stores document data.
Internal Data Warehouse
A SQL Server database in the Core resource group, designed as a Kimball dimensional model, which keeps track of all the data loads and movements through the lifecycle of each data asset. It keeps track of metrics such as time elapsed, total size moved, number of validation errors, etc.
Provider
A logical collection of Entities, resulting in a concept similar to a database or a schema.
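As a rough illustration of how these concepts nest, the following Python sketch models a logical collection of Entities, each holding ingested data elements. The class names `Provider`, `Entity` and `Asset`, the `kind` field and all sample values are assumptions made for this example and do not reflect Sidra's actual metadata schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """Hypothetical ingested data element, e.g. a CSV extract or a PDF."""
    file_name: str


@dataclass
class Entity:
    """Hypothetical Entity: similar to a table or a document collection."""
    name: str
    kind: str  # assumed label: "relational" or "document"
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Provider:
    """Hypothetical logical collection of Entities, similar to a schema."""
    name: str
    entities: List[Entity] = field(default_factory=list)

    def entity_names(self) -> List[str]:
        # Comparable to listing the tables of a database schema.
        return [entity.name for entity in self.entities]


# Illustrative data only: one collection mixing relational and document data.
crm = Provider("crm", [
    Entity("customers", "relational", [Asset("customers_2024.csv")]),
    Entity("contracts", "document", [Asset("contract_001.pdf")]),
])
```

Here `crm.entity_names()` plays the role of listing the tables in a schema, and the two Entities show relational and document data living side by side.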
Region
A physical location, served by Azure datacenters, where resources can be deployed. Every major component in Sidra (the Core, each DSU and all Client Applications) can be located in a different region, based on configuration.
Resource Group
A container in Azure Resource Manager that holds related resources for an application.