Connector plugin concept

This page describes some concepts related to Data Intake Processes in Sidra and how to configure them with the help of Sidra connector plugins.

A Data Intake Process in Sidra is an abstraction that groups the set of configurations for intaking data from a given data source (e.g., a SQL Server database, a SharePoint library, etc.), together with all the related data extraction infrastructure generated for that intake (e.g., metadata, triggers, data extraction pipelines). The purpose of the Data Intake Process is to make the end-to-end data intake fully configurable from Sidra's UI, unlike Data Intake via landing or with indexing, which require previous configuration.

A Sidra plugin, on the other hand, is an internal architectural concept in Sidra that refers to a code assembly that is installed and executed to connect to a source system. When a plugin encapsulates the code to configure a Data Intake Process, it is also referred to as a connector plugin, or plugin of type connector.

Plugin approach for Data Intake Processes

For Data Intake Processes to be created from the Web UI, the underlying data extraction and ingestion pipeline templates, as well as the associated code, must be packaged as a plugin in Sidra.

Plugins are an internal architecture concept in Sidra. They behave as code assemblies that implement a series of interface methods for managing the installation and configuration of data extraction and ingestion elements and pipelines, as well as the creation of the associated metadata in Sidra (e.g., Provider, Entities, Attributes). These assemblies allow the plugin code to be downloaded, installed, and executed from the Web UI.

Sidra incorporates several types of plugins. Plugins that create Data Intake Processes are of type connector.

A plugin of type connector (also called a plugin for Data Intake Processes) also packages the configuration parameters so that only a subset of the needed parameters is requested from the user.

These parameters are the inputs to be filled in during the wizard steps.
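As an illustration, this parameter handling can be sketched as follows. The parameter names and the `resolve_parameters` helper are hypothetical, not part of Sidra's actual API:

```python
# Hypothetical sketch: a connector plugin bundles defaults for most
# parameters, so the wizard only needs to ask the user for a subset.
PLUGIN_DEFAULTS = {
    "batchSize": 1000,            # assumed default, for illustration only
    "consolidationMode": "Merge", # assumed default, for illustration only
    "connectionString": None,     # must come from the user
    "providerName": None,         # must come from the user
}

def resolve_parameters(user_input: dict) -> dict:
    """Merge the wizard input over the plugin defaults; fail on missing values."""
    params = {**PLUGIN_DEFAULTS, **user_input}
    missing = [k for k, v in params.items() if v is None]
    if missing:
        raise ValueError(f"Missing required parameters: {missing}")
    return params

# Only the user-facing subset is collected by the wizard:
resolved = resolve_parameters({
    "connectionString": "Server=src;Database=db;",
    "providerName": "SalesDb",
})
```

The remaining parameters keep their packaged defaults, which is how the wizard can stay short while the plugin still receives a complete configuration.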

Common concepts used in Data Intake Process configuration

Type Translations and mappings

One important aspect when integrating data from source systems into Sidra is the type translation, or transformation, of types that are incompatible between the source and the destination.

Sidra incorporates a table in the Core metadata database called TypeTranslations.

TypeTranslations table

  • When a plugin version is installed, the different type mapping and transformation rules for that source system will be populated in this table.

    You can see the details of this table model in the metadata section.

  • This table contains a series of mapping and transformation rules from sources to sink systems. These sink systems are the internal Sidra destinations along the whole ingestion process.

  • The different rules will be loaded and used to fill certain Attribute metadata fields (HiveType, SQLType), which determine how the fields are processed along the ingestion process.

  • The data extraction pipeline will then also use the type translation rules to convert to Parquet format.

  • An example of a data extraction type transformation rule is for the source type VARBINARY. In this case, the transformation includes the following expression:

    cast('' as xml).value('xs:base64Binary(sql:column("<field>"))', 'varchar(max)')
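To make the idea concrete, the lookup of a type translation rule can be sketched as below. The rule structure and the `translation_for` helper are illustrative simplifications, not Sidra's actual TypeTranslations schema; only the VARBINARY expression is taken from the example above:

```python
# Simplified sketch of type translation rules from a source system to a sink.
# (source_type, sink_system, sink_type, extraction_expression)
TYPE_TRANSLATIONS = [
    ("VARCHAR",   "Hive", "STRING", "[{field}]"),
    ("INT",       "Hive", "INT",    "[{field}]"),
    # VARBINARY is extracted as base64 text so it can be written to Parquet:
    ("VARBINARY", "Hive", "STRING",
     "cast('' as xml).value('xs:base64Binary(sql:column(\"{field}\"))', 'varchar(max)')"),
]

def translation_for(source_type: str, sink: str):
    """Look up the sink type and extraction expression for a source type."""
    for src, snk, sink_type, expr in TYPE_TRANSLATIONS:
        if src == source_type and snk == sink:
            return sink_type, expr
    # No rule found: pass the type and column through unchanged.
    return source_type, "[{field}]"

sink_type, expr = translation_for("VARBINARY", "Hive")
select_expression = expr.format(field="Photo")
```

The resulting expression is what the data extraction pipeline would place in the SELECT clause for that column, so the incompatible source type reaches the sink in a representation it can store.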

Load restrictions applied in metadata extraction

Section Configure new data source describes the general conceptual steps for configuring a new data source in Sidra.

One of the required steps is to configure and create the metadata structures about the data source (Provider, Entities and Attributes).

In the case of databases, the Data Intake Process wizard usually incorporates the deployment and execution of a metadata extraction pipeline.

The metadata extraction pipeline reads the schema of the source database and, from that schema, creates the needed Entities and Attributes metadata in the Sidra Core metadata tables. The metadata extraction pipeline also takes as a parameter a list of objects to include or exclude. This set of objects is stored in the Sidra metadata PipelineLoadRestriction tables.

PipelineLoadRestrictions are sets of objects to include or exclude from the origin data source when performing the metadata extraction process.

The list of load restriction objects can be applied with an inclusion policy (only include the objects in the LoadRestrictionObject tables) or with an exclusion policy (load all objects except those in the LoadRestrictionObject tables).
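The two policies can be sketched with a simple filter. The function and policy names here are hypothetical, chosen for illustration rather than taken from Sidra's API:

```python
# Illustrative sketch of inclusion/exclusion load restrictions applied
# during metadata extraction.
def apply_load_restrictions(source_objects, restriction_objects, policy):
    """Filter the source objects according to the load restriction policy."""
    restricted = set(restriction_objects)
    if policy == "inclusion":
        # Only extract metadata for the listed objects.
        return [o for o in source_objects if o in restricted]
    if policy == "exclusion":
        # Extract metadata for everything except the listed objects.
        return [o for o in source_objects if o not in restricted]
    raise ValueError(f"Unknown policy: {policy}")

tables = ["dbo.Customers", "dbo.Orders", "dbo.AuditLog"]
included = apply_load_restrictions(tables, ["dbo.AuditLog"], "exclusion")
```

Here `included` would contain `dbo.Customers` and `dbo.Orders`, so only those tables get Entities and Attributes created for them.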

How to check pipelines in Data Factory

There is usually an option in the Data Intake Process wizard to force the automatic execution of the data extraction pipeline right after the creation of the pipelines. In this case, a run of that pipeline will appear in ADF. Pipeline runs can be seen in the Monitor section of ADF; a filter allows searching for the pipeline and listing its runs.

  • If the metadata extraction pipeline needs to be re-executed, go to the pipeline definition (Author) and launch the trigger: click Add trigger > Trigger now. A window will appear to pass as a parameter the ItemID of the Provider. This ItemID of the Provider is obtained from the pipeline template.

  • If the data extraction pipeline needs to be manually executed or re-executed, go to the pipeline definition (Author) and launch the trigger: click Add trigger > Trigger now. A window will appear to pass as a parameter the executionDate (just a date).
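Besides the ADF UI, a pipeline run can also be triggered programmatically through the Data Factory REST API (`createRun` operation). The sketch below only builds the request; the subscription, resource group, factory, and pipeline names are placeholders, and authenticating and sending the POST request are left out:

```python
import json

# Sketch: build the Azure Data Factory REST call that triggers a pipeline
# run, equivalent to "Add trigger > Trigger now" in the ADF UI.
def build_create_run_request(subscription, resource_group, factory,
                             pipeline, parameters):
    """Return the createRun URL and the JSON body carrying the parameters."""
    url = (
        "https://management.azure.com"
        f"/subscriptions/{subscription}"
        f"/resourceGroups/{resource_group}"
        "/providers/Microsoft.DataFactory"
        f"/factories/{factory}"
        f"/pipelines/{pipeline}/createRun"
        "?api-version=2018-06-01"
    )
    # Pipeline parameters travel in the POST body as JSON.
    return url, json.dumps(parameters)

url, body = build_create_run_request(
    "00000000-0000-0000-0000-000000000000",  # placeholder subscription id
    "my-resource-group", "my-data-factory",
    "ExtractData-Pipeline",                  # placeholder pipeline name
    {"executionDate": "2022-09-29"},
)
```

The parameters in the body play the same role as the values entered in the Trigger now window (for example, the executionDate of a data extraction pipeline).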

Last update: 2022-09-29