Sidra Data Platform glossary¶
The Sidra Data Platform glossary serves as a gathering place for terminology used throughout this documentation, Sidra's UI, and technical training or knowledge transfer sessions.
Data Intake Process¶
A Data Intake Process in Sidra is an abstraction that groups the set of configurations for ingesting data from a given data source (e.g., a SQL Server database or a SharePoint library) together with all the related data extraction infrastructure generated for that intake (e.g., metadata, triggers, and data extraction pipelines).
A Sidra connector is an internal architectural concept in Sidra: an assembly of code that is installed and executed to connect to a source system.
A Provider is a logical collection of Entities, resulting in a concept similar to a database or a schema.
An Entity is a collection of related data points, observations or documents. The concept is akin to a Table in relational databases or a collection in document databases.
Given the multi-modal nature of Sidra, one Entity can hold relational data while another Entity stores document data.
An Asset is a single data element ingested into the platform. Assets can range from database extracts to CSV or PDF files.
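The containment relationship described above (a logical collection of Entities, each Entity holding ingested data elements) can be sketched with a few plain records. This is only an illustrative model, not Sidra's metadata schema or API: the class names `Provider` and `Asset` are used here as labels for the collection of Entities and the ingested data elements, and every field name and sample value is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Asset:
    """One ingested data element, e.g. a database extract or a CSV/PDF file."""
    name: str


@dataclass
class Entity:
    """A collection of related data, akin to a table or a document collection."""
    name: str
    kind: str  # "relational" or "document" (illustrative values only)
    assets: List[Asset] = field(default_factory=list)


@dataclass
class Provider:
    """A logical collection of Entities, similar to a database or a schema."""
    name: str
    entities: List[Entity] = field(default_factory=list)

    def asset_count(self) -> int:
        # Total number of ingested elements across all Entities in this collection.
        return sum(len(entity.assets) for entity in self.entities)


# Hypothetical sample data: one relational Entity and one document Entity.
orders = Entity("Orders", "relational", [Asset("orders_2024_01.csv")])
contracts = Entity(
    "Contracts", "document", [Asset("contract_001.pdf"), Asset("contract_002.pdf")]
)
sales = Provider("Sales", [orders, contracts])
print(sales.asset_count())
```

The two sample Entities mirror the multi-modal case: one holds relational data and the other holds documents, yet both sit under the same logical collection.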
Data Storage Unit¶
Data Storage Units (DSUs) provide logical and physical isolation of data to help with data compliance and regulations. Each DSU isolates not just data storage, but also compute and orchestration, so it can be located in a specific geographical region.
Supervisor is a self-service deployment tool interface that precedes Sidra Web Manager, which includes all the services available in Sidra and initializes Sidra Web after login.
Sidra Service offers the infrastructure for the main functionalities within Sidra Data Platform, such as Operations and API Management, Data Catalog, Anomaly Detection, and Sidra Web.
Authentication Service in Sidra implements the Identity Server to facilitate authentication processes.
Data Quality Service¶
Data Products are the pieces Sidra uses to address business needs. Enclosed within a Resource group, they can access one or multiple Data Storage Units through secure APIs. Data Products apply business logic and transform data following business requirements.
Sidra’s flexible architecture enables Data Product creation with any set of tools and components, ranging from Power BI to Web apps. They use the same security model, logging infrastructure, etc., via Sidra APIs.
The PII detection feature in the Sidra platform evaluates data being ingested for Personally Identifiable Information (PII), helping to ensure sensitive data is properly managed and governed.
An Azure region is a physical location containing Azure data centers where resources can be deployed. Every major component in Sidra (each DSU and every Data Product) can be placed in a different region based on configuration.
A Resource group is a container in Azure Resource Manager that holds related resources for an application.
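To make the placement concepts above more concrete, the following minimal sketch models Data Storage Units and Data Products as plain records tagged with a region, with each Data Product also tagged with its resource group. Everything here is hypothetical (class names, field names, the `regions_reached` helper, and the sample region and resource group strings); it does not reflect Sidra's actual configuration format or APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataStorageUnit:
    name: str
    region: str  # region where this DSU's storage, compute and orchestration live


@dataclass
class DataProduct:
    name: str
    resource_group: str  # resource group enclosing the Data Product
    region: str          # region where the Data Product itself is deployed
    dsu_names: List[str]  # DSUs this Data Product reads from


def regions_reached(product: DataProduct, dsus: Dict[str, DataStorageUnit]) -> List[str]:
    """Return the sorted, de-duplicated regions of the DSUs a Data Product accesses."""
    return sorted({dsus[name].region for name in product.dsu_names})


# Hypothetical deployment: two DSUs in different regions, one Data Product in a third.
dsus = {
    "dsu-eu": DataStorageUnit("dsu-eu", "westeurope"),
    "dsu-us": DataStorageUnit("dsu-us", "eastus"),
}
report = DataProduct("sales-report", "rg-sales-report", "northeurope", ["dsu-eu", "dsu-us"])
print(regions_reached(report, dsus))  # ['eastus', 'westeurope']
```

A listing like this is one way to reason about which regions a Data Product's data originates from when reviewing compliance constraints.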
Azure Data Lake¶
Azure Data Lake is a flexible storage mechanism compatible with HDFS (Hadoop Distributed File System). Sidra uses Azure Data Lake Storage Gen2 to store data in the Data Storage Units. In general terms, a data lake is a centralized repository for storing structured and unstructured data at any scale. In Sidra, the Data Lake is the collection of data distributed across the different Data Storage Units.
Sidra provides a range of APIs designed to facilitate action management across its entire suite of services.