About Sidra plugins and Data Intake Processes
This article explains the basic concepts behind the Data Intake Process and how it relates to connector plugins.
As mentioned in previous pages, a Data Intake Process in Sidra is an abstraction that groups the set of configurations for intaking data from a given data source (e.g., SQL Server database, SharePoint library, etc.) together with all the data extraction infrastructure generated for that intake (e.g., metadata, triggers, data extraction pipelines). Its purpose is to make the data intake an end-to-end process that is fully configurable from the Sidra UI, unlike Data Intake via landing or with indexing, which require previous configuration.
Sidra incorporates several types of plugins. Plugins that create Data Intake Processes have the plugin type connector.
When configuring and executing a new data intake for a new or existing data source system, several underlying steps are involved:
- On one hand, the necessary metadata and data governance structures are created in Sidra. This includes the creation of the data source that is specific to the engine and type of the source system (e.g., SQL Server, Azure SQL, DB2 database, etc.). This also includes the generation of the metadata.
- On the other hand, the actual data integration infrastructure elements (e.g., ADF pipelines and triggers) are created, configured and deployed.
The Sidra API provides methods for creating the metadata structures needed to configure a new Data Intake Process: creating the Provider, creating the Data Source, and creating and deploying the pipelines. In the latest versions of Sidra, however, Sidra Web offers a new mechanism for configuring data intake for a set of source systems. Users choose from a gallery of available connector plugins to configure a Data Intake Process. The plugin is transparently installed in Sidra Core, and a wizard is displayed to the user. Each wizard presents a set of configuration parameters, which the user fills in and then submits to confirm the creation of the respective Data Intake Process.
As soon as the configuration parameters (such as connection string parameters or the trigger selection) have been provided in the wizard steps, all the orchestration needed to create the underlying Data Intake Process takes place. Thanks to the Sidra Data Intake Process wizards, this infrastructure can be configured in less than five minutes. Once the Data Intake Process for the data source is up and running, all the underlying metadata (Sidra metadata) and infrastructure (Azure Data Factory objects) will be in place.
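As a rough illustration of the orchestration described above, the following Python sketch models the order of operations in memory. It does not call the Sidra API or Azure Data Factory. Every name in it (`WizardInput`, `DataIntakeProcess`, `create_data_intake_process`, the naming scheme for the pipeline) is hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field


@dataclass
class WizardInput:
    """Hypothetical parameters a user enters in a connector plugin wizard."""
    provider_name: str
    connection_string: str
    trigger: str  # assumption: a simple label such as "daily"


@dataclass
class DataIntakeProcess:
    """Hypothetical in-memory stand-in for what one data intake creates."""
    name: str
    metadata: dict = field(default_factory=dict)        # Sidra metadata side
    infrastructure: dict = field(default_factory=dict)  # Azure Data Factory side


def create_data_intake_process(params: WizardInput) -> DataIntakeProcess:
    """Model the two groups of steps: metadata first, then infrastructure."""
    dip = DataIntakeProcess(name=f"{params.provider_name}-intake")

    # Step 1: metadata and data governance structures (Provider, Data Source).
    dip.metadata["provider"] = params.provider_name
    dip.metadata["data_source"] = {"connection_string": params.connection_string}

    # Step 2: data integration infrastructure (pipeline and trigger).
    dip.infrastructure["pipeline"] = f"extract-{params.provider_name}"
    dip.infrastructure["trigger"] = params.trigger

    return dip


if __name__ == "__main__":
    wizard = WizardInput("sales", "Server=example;Database=sales", "daily")
    dip = create_data_intake_process(wizard)
    print(dip.metadata["provider"], dip.infrastructure["pipeline"])
```

The point of the sketch is only the grouping: a single submission of wizard parameters results in both the metadata and the infrastructure being recorded under one Data Intake Process.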
Configure a new data intake
More details and concepts about what happens when configuring a new data intake via connector plugins are described in this page.
This documentation section contains separate pages for the specific connector plugins released in Sidra.
Sidra Web Management UI includes a list of all the Data Intake Processes configured in the respective installation environment. To see this list in Sidra Web, open the Data Intake section. It displays the Data Intake Processes configured in the system, along with a button to Add a new Data Intake Process.
Data Intake Process migrations and limitations
The Data Intake Process lifecycle is a large feature that is being developed and released incrementally across different Sidra versions.
In Sidra version 2022.R1 (1.11.x), the Data Intake Process has some limitations that will be addressed in future versions of Sidra:
For all configured pipelines created before the release of the Data Intake Process (version 1.11.x):
- An automated migration process is applied to every installation environment as part of the Sidra update process.
- This migration process creates the underlying Data Intake Process objects in the Sidra Core metadata database, even if the configured data intake pipelines were not created via a connector plugin, or were created by a connector plugin that does NOT support the Data Intake Process concept.
- After this migration, users can expect the Data Intake Process list to show the data intakes configured in the environment.
- This does not affect in any way the normal functioning of the underlying data extraction pipelines, which will continue working with no changes. Users are not expected to make any changes to their current working pipelines.
- The migration that creates these Data Intake Processes is therefore only available for the list of supported connector plugins in Sidra.
Note that existing pipelines without Sidra plugin support, such as customer-specific pipelines, are not migrated.
This means that, even if some data intake pipelines are configured, there will NOT be an associated Data Intake Process visible in the Data Intake Process list in the Sidra Web UI.
This does not affect the normal functioning of the underlying data intake (pipeline execution and loading of data in Sidra). The created pipelines will NOT be interrupted and will continue working normally.
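The migration rule described above can be summarised in a small Python sketch. This is an illustrative model only, not Sidra's migration code. The `Pipeline` class, the `connector_type` field and the contents of `SUPPORTED_CONNECTOR_TYPES` are assumptions made for the example; the real list of supported connector plugins is defined by Sidra.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: illustrative identifiers, not the real list of supported plugins.
SUPPORTED_CONNECTOR_TYPES = {"sqlserver", "azuresql"}


@dataclass(frozen=True)  # frozen: the migration only reads pipelines, never changes them
class Pipeline:
    name: str
    # Connector plugin type that covers this pipeline, or None for pipelines
    # without plugin support (for example, customer-specific pipelines).
    connector_type: Optional[str]


def pipelines_to_migrate(pipelines: List[Pipeline]) -> List[str]:
    """Return the names of pipelines that would get a Data Intake Process."""
    return [p.name for p in pipelines if p.connector_type in SUPPORTED_CONNECTOR_TYPES]


existing = [
    Pipeline("crm-extract", "sqlserver"),
    Pipeline("customer-custom-load", None),
]
print(pipelines_to_migrate(existing))  # ['crm-extract']
```

In this model both pipelines keep running exactly as before; the only difference is that `customer-custom-load` has no entry in the Data Intake Process list.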
Currently, a Data Intake Process can only be created in association with a new Provider. It is not yet possible to create a new Data Intake Process associated with an existing Provider.
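A client-side check for this limitation could look like the following Python sketch. The function name `ensure_new_provider` and the idea of passing in the set of existing Provider names are assumptions for illustration; how Sidra itself enforces the restriction is not described here.

```python
from typing import Set


def ensure_new_provider(provider_name: str, existing_providers: Set[str]) -> None:
    """Raise ValueError if the Provider already exists (hypothetical check)."""
    if provider_name in existing_providers:
        raise ValueError(
            f"Provider '{provider_name}' already exists: a Data Intake Process "
            "can currently only be created with a new Provider."
        )


ensure_new_provider("sales", {"finance", "hr"})  # passes silently: "sales" is new
```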
Currently, only users with the Admin role are allowed to access this section.
New Data Intake Processes created with existing Sidra plugins after Release 2022.R1 (1.11.x) are registered automatically in the Sidra Core metadata database at plugin execution time.