Skip to content

Schema Evolution for database connectors in Sidra

Database connectors in Sidra support the automatic evolution of the source tables schema whenever new columns are added in said source tables. As a result of this, if the setting to include all tables and objects is set, new Attributes will be created in Sidra Service metadata for any new columns added in the source tables.

To activate this option, you would need to activate an optional setting in the Metadata Extractor options, which, by default, is set to No :

Do you want the metadata to be refreshed automatically with each data extraction?

If set to Yes, the metadata extraction pipeline that is created as part of this Data Intake Process creation will always be run before each execution of the data extraction pipeline. As a result, any new columns in the source database will be detected and incorporated as new Attributes in the respective Sidra Entity.

To achieve this, every time a Data Intake extraction is configured (via the associated Trigger), first, an orchestrator ADF pipeline will be run. This orchestrator ADF pipeline will do the following:

  1. Check if the setting Refresh Schema Automatically is set to Yes, or True.

    • If set to True, when the Trigger is hit, the deployed Metadata Extraction pipeline for that connector and connector version will be executed. This includes the queries to check for the schema of the source system, the creation of Entities and Attributes and the execution of the DSU ingestion script.

    • If set to False, when the Trigger is hit, just the deployed data extraction pipeline for that connector and connector version will be executed.

    • Notifications for the success or failure of either the metadata extraction pipeline or the data extraction pipeline will be sent, so you can see the overall configuration success status. Please see below a depiction of this orchestrator pipeline in ADF:

    Schema Evolution Refresh

  2. Once in each execution of the metadata extraction pipeline, for each Entity, Sidra API endpoint for metadata inference will be called. This will check for any new Attributes added at the source.

Dive deeper

The explanation above is made assuming that the source table is included to be loaded as per the definition of the Object Restriction list. The table will be included if:

  • The setting is include all tables.
  • The setting is include some tables and the table is in the list.
  • The setting is exclude some tables and the table is not in the list.

Apart from creating these new Attributes for added columns in the Sidra Service metadata, also new columns will be created in the respective Databricks tables for that Entity. For this, the Create Tables and DSU Ingestions script will execute again adding the new information.

Once in each execution of the data extraction pipeline, Assets will be created which include the new columns in Databricks.

Versions

From Release 2022.R3 onwards, this feature is supported for Oracle, MySQL, DB2 iSeries and MariaDB connectors.

Removal of columns from source systems

If columns are removed from the source system, there will not be an automated removal of Attributes in Sidra or from the Databricks tables. This scenario is only supported in a manual way. Please contact Support for this manual intervention.