Data Intake Process in Sidra¶

A Data Intake Process in Sidra is an abstraction concept that relates a set of configurations for ingesting data from a given data source (e.g., SQL Server database, Sharepoint library, etc.), as well as all the related data extraction infrastructure generated for the data intake (e.g., metadata, trigger, data extraction pipelines). The purpose of the Data Intake Process then is making possible a end-to-end process fully configurable with the Sidra's UI for the data intake, unlike the Data Intake via landing or with indexing which requires previous configurations.

A Sidra connector, on the other hand, is an internal architectural concept in Sidra to refer to an assembly of code that is installed and executed to connect to a source system. Connectors implement a series of interface methods for managing the installation and configuration of data extraction and ingestion elements and pipelines, as well as the creation of the associated metadata in Sidra (e.g., Provider, Entities, Attributes).

A connector for Data Intake Process also packages the configuration parameters so that only a subset of the needed parameters is retrieved from the user. These parameters will be the inputs to be filled in during the wizard steps.

For Data Intake Processes to be created from the Web UI, it is required that the underlying data extraction and ingestion pipeline templates as well as code are packaged as a connector in Sidra.

Sidra Data Lake Approach to Data Intake Processes¶

Sidra Data Platform is an end-to-end data platform whose key approach to integrate with source systems and bring the data to the platform domain is a data-lake approach.

Bringing data to the platform domain is required to access data that is usually in silos in operational systems, most of the times on-prem, and which needs to be made available for analytics consumption.

The data lake is just one of the first steps in the overall architecture to allow the fast setup of data products and applications based on analytical data.

The data lake is used to standardize the Data Intake Process of different data sources and their mapping to Sidra Metadata system. Thanks to this standardization it is possible to carry out data governance use cases, like security, granular access control and define and enforce data integration standards.

You can see more information about the Sidra metadata model related to data ingestion in this page.

The Azure Data Lake Storage Gen2 (ADLS Gen2) is the service where all the data for every data Provider is added to the system for making it available for downstream consumption.

In opposition to the traditional Data Warehouses, data lakes store the information in the most pure and raw format possible (the concept of immutable data lake), whether it is structured or unstructured data. This allows to ease the data ingestion logic: shifting the paradigm from ETL (extract-transform-load) to ELTs (extract-load-transform), and to focus on the usage of this data by each Data Product.

The next key piece in Sidra for the end-to-end platform is the concept of Sidra Data Products.

Data Products in Sidra are the pieces of Sidra Data Platform that enable business cases (development platform). These data business cases encompass the specific business transformation logic (application specific data transformations and validations) and the serving of a data interface to serve final use cases (e.g., reporting) or to further expose to other consuming external applications in the enterprise.

You can find more information about Sidra Data Products key concepts here.