Supported data synchronization mechanisms¶
The connector plugins for Sidra support different modes of data synchronization, which also depend on the mechanisms configured on the source system or Sidra. These are the following:
-
Full load data synchronization. Generally performed for first time loads. This is also the default mode if no alternative incremental load mechanism is defined. By default, the first load will be a complete load.
-
Incremental load data synchronization. This data synchronization mechanism captures updates for any new or modified data from the source database. Only data corresponding to the updates since the last synchronization are retrieved.
For incremental load to work, there must be a defined mechanism to capture updates in the source system. For incremental load data synchronization, three possible types of mechanisms are supported depending on the database technology:
1. Incremental Load with built-in Change Tracking (CT). This is achieved by directly activating Change Tracking in the source database. Only for Azure SQL and SQL Server connector plugins in Sidra.
2. Incremental Load with built-in System-period temporal tables. Only for DB2 and DB2 iSeries connector plugins.
3. Incremental Load non-Change Tracking (non-CT) enabled by Sidra. This is achieved by specific configurations in the Sidra Metadata tables. For more information about use cases, you can check the
EntityDeltaLoad
table related.
1. Incremental Load with built-in Change Tracking (CT)¶
This incremental load only applies to SQL Server and Azure SQL databases connector plugins in Sidra.
For more information on how to activate Change Tracking in an SQL database, please check the Microsoft Documentation.
Change Tracking records when a row in a table has changed. It does not capture the data that was changed or how many times it changed. Change Tracking requires primary keys defined in the source database, in order to identify which rows have changed.
If CT is enabled on a table in the source system, Sidra Service will use CT as the incremental update mechanism. This is the most efficient mechanism for detecting new, deleted or updated entries in the source database.
Change Tracking creates change records that the data extractor pipeline in Sidra Service accesses on a per-table (Entity) basis during incremental updates.
When Change Tracking is enabled, the data extractor pipeline will load data depending on the last value of LastDeltaValue
. LastDeltaValue
is a value stored in the EntityDeltaLoad table in Sidra Service metadata.
If there is no LastDeltaValue
, the load will be marked as Initial Load
, and if there are new values from the latest LastDeltaValue
, the load will be marked as Incremental Load
.
Additionally, for SQL Server, Sidra Service incorporates an option for signalling when to reload all the changes (CHANGE_TRACKING_MIN_VALID_VERSION
) in SQL to control when Change Tracking can be used to load the changes, and when we need to overwrite the entire table. This is useful for advanced controlling scenarios of expiration of data retention period.
If a table does not have any changes at scheduled data extraction time, an empty Asset (0 bytes) in Sidra Service database is generated in order to specify that the scheduled load was successful but generated an empty set. This is needed to ensure consistency between Sidra Service and Data Products.
The current version of Sidra connector supports advanced schema evolution.
Deleted Rows¶
Sidra supports delete operations in Change Tracking incremental load. The delete support has been incorporated to Transfer Query/DSU ingestion scripts. When a change happens, in SQL Server and Azure SQL databases:
- The column
SYS_CHANGE_OPERATION
acts as a marker for the change (values U -Update-, I -Insert-, and D -Delete-). - Sidra adds a new metadata field
SidraIsDeleted
to the table in the Data Lake (different scenarios can be checked here). - When the column
SYS_CHANGE_OPERATION
on the source has the value "D", columnSidraIsDeleted
on the Data Lake is set totrue
.
Change Tracking expiration¶
Each table using Change Tracking has a minimum valid version. This number represents the version of the database at the moment when the first trackable change in this table was done.
This valid version determines an expiration period of Change Tracking, so that any change older than the minimum valid version cannot be recovered using Change Tracking mechanism.
The minimum version depends on the retention period (by default retention period is 2 days). That means that changes in the source table older than the retention period might not be retrievable.
You can check the minimum valid version of a table using the following T-SQL command:
When a table is truncated, as those changes are not logged by Change Tracking, the minimum version is updated, so changes prior to the truncate cannot be retrieved.
In any of the circumstances, when the minimum valid version is bigger than the last version imported to Sidra, or when the table is truncated, the system needs to reload the entire table and replace the old data, before starting to use Change Tracking again for incremental load.
When the data extraction pipeline detects this situation, the Entity associated to the tables is marked with the flag Need Reloading
in the EntityDeltaLoad table. That is, the NeedReload
field in DataIngestion.EntityDeltaLoad table is set to 1.
By default, this reloading does not happen until the field EnableReload
in the same table is set to 1 by the user. This is done to prevent heavy loads during working hours, which could degrade the source database performance.
There is a way to make this reloading happen automatically, without need for user intervention, and no matter the value of EnableReload
.
Just add the property autoReload
to the AdditionalProperties
json field in DataIngestion.Entity
table:
{
"sourceTableName":"[dbo].[mytable]",
"sourceServer":"myserver.database.windows.net",
"sourceDatabase":"mydatabase",
"sourceSchema":"dbo",
"sourceTable":"mytable",
"consolidationMode": "Merge",
"autoReload": "true"
}
Note that this must be done on a per Entity basis.
2. Incremental Load with built-in System-period temporal tables¶
This sync mechanism only applies to DB2 and DB2 iSeries databases connector plugins in Sidra.
For more information on how to activate System-period temporal tables in a DB2 and DB2 iSeries databases, please check the IBM Documentation:
Deleted Rows¶
Sidra supports delete operations for system-period temporal tables. The delete support has been incorporated to Transfer Query/DSU ingestion scripts. When a change happens, in DB2 and DB2 iSeries databases:
- System-period temporal tables store the information of the changes.
- The column
SYS_START
will be updated with the date of the last operation performed. -
The
OP_CODE
column act as a marker for the change with the following values:- U: Update.
- I: Insert.
- D: Delete.
-
Sidra adds a new metadata field
SidraIsDeleted
to the table in the Data Lake (different scenarios can be checked here). - When the column
OP_CODE
on the source has the value "D", columnSidraIsDeleted
on the Data Lake is set totrue
.
3. Incremental Load non-Change Tracking (non-CT) enabled by Sidra¶
This is an option to use an incremental query based in parametrization which applies for all the Connector Plugins in Sidra. This parametrization is provided by the IdDeltaAttribute
, which is the Attribute that is used for the incremental extraction. This Attribute acts as a unique key to identify new inserts or updates in the source database. Sidra generates a query that filters by searching the differences according to this column. This mechanism is not as efficient as the built-in Change Tracking mechanism though.
This Attribute is described in the EntityDeltaLoad table in Sidra Service as well as several examples of use.
Classification of Sidra's connector plugins per sync mechanisms¶
Sync mechanism | Azure SQL | SQL Server | DB2 | DB2 iSeries | Oracle | MySQL |
---|---|---|---|---|---|---|
Full Data Sync | Yes | Yes | Yes | Yes | Yes | Yes |
Incremental Load with Built-in Change Tracking (CT) | Yes | Yes | - | - | - | - |
Incremental Load with Built-in System-period temporal tables | - | - | Yes | Yes | - | - |
Incremental Load non-Change Tracking (non-CT) by Sidra | Yes | Yes | Yes | Yes | Yes | Yes |