Supported data synchronization mechanisms¶
The connector plugins for Sidra support different modes of data synchronization, which also depend on the mechanisms configured on the source system or Sidra. These are the following:
Full load data synchronization. Generally performed for first time loads. This is also the default mode if no Change Tracking is enabled in the source system, nor alternative incremental load mechanism is defined. By default, the first load will be a complete load.
Incremental load data synchronization. This data synchronization mechanism captures updates for any new or modified data from the source database. Only data corresponding to the updates since the last synchronization is retrieved.
For incremental load to work, there must be a defined mechanism to capture updates in the source system. For incremental load data synchronization, two possible types of mechanisms are supported:
1. Incremental Load with built-in Change Tracking (CT). This is achieved by directly activating Change Tracking in the source database. Only for Azure SQL and SQL Server connector plugins in Sidra.
2. Incremental Load non-Change Tracking (non-CT) enabled by Sidra. This is achieved by specific configurations in the Sidra Metadata tables and includes a Non-Change Tracking custom mechanism enabled by Sidra option. For more information about use cases, you can check the
1. Incremental Load with built-in Change Tracking (CT)¶
This incremental load only applies to SQL Server and Azure SQL databases connector plugins in Sidra. For more information on how to activate Change Tracking in an SQL database, please check the Microsoft Documentation.
Change Tracking records when a row in a table has changed. It does not capture the data that was changed or how many times it changed. Change Tracking requires primary keys defined in the source database, in order to identify which rows have changed.
If CT is enabled on a table in the source system, Sidra Core will use CT as the incremental update mechanism. This is the most efficient mechanism for detecting new, deleted or updated entries in the source database.
Change Tracking creates change records that the data extractor pipeline in Sidra Core accesses on a per-table (Entity) basis during incremental updates.
When Change Tracking is enabled, the data extractor pipeline will load data depending on the last value of
LastDeltaValue is a value stored in the EntityDeltaLoad table in Sidra Core metadata.
If there is no
LastDeltaValue, the load will be marked as
Initial Load, and if there are new values from the latest
LastDeltaValue, the load will be marked as
Additionally, Sidra Core incorporates an option for signalling when to reload all the changes (
CHANGE_TRACKING_MIN_VALID_VERSION) in SQL to control when Change Tracking can be used to load the changes, and when we need to overwrite the entire table. This is useful for advanced controlling scenarios of expiration of data retention period.
If a table does not have any changes at scheduled data extraction time, an empty Asset (0 bytes) in Sidra Core database is generated in order to specify that the scheduled load was successful but generated an empty set. This is needed to ensure consistency between Sidra Core and Client Applications.
The current version of Sidra connector supports advanced schema evolution.
Sidra supports delete operations in Change Tracking incremental load. The delete support has been incorporated to Transfer Query/DSU ingestion scripts.
When a change happens, in SQL Server and Azure SQL databases, the column
SYS_CHANGE_OPERATION acts as a marker for the change (values U -Update-, I -Insert-, and D -Delete-). In the case that the value is
D, Sidra will change the column adding a metadata field called
Change Tracking expiration¶
Each table using Change Tracking has a minimum valid version. This number represents the version of the database at the moment when the first trackable change in this table was done.
This valid version determines an expiration period of Change Tracking, so that any change older than the minimum valid version cannot be recovered using Change Tracking mechanism.
The minimum version depends on the retention period (by default retention period is 2 days). That means that changes in the source table older than the retention period might not be retrievable.
You can check the minimum valid version of a table using the following T-SQL command:
When a table is truncated, as those changes are not logged by Change Tracking, the minimum version is updated, so changes prior to the truncate cannot be retrieved.
In any of the circumstances, when the minimum valid version is bigger than the last version imported to Sidra, or when the table is truncated, the system needs to reload the entire table and replace the old data, before starting to use Change Tracking again for incremental load.
When the data extraction pipeline detects this situation, the Entity associated to the tables is marked with the flag
Need Reloading in the EntityDeltaLoad table. That is, the
NeedReload field in DataIngestion.EntityDeltaLoad table is set to 1.
By default, this reloading does not happen until the field
EnableReload in the same table is set to 1 by the user. This is done to prevent heavy loads during working hours, which could degrade the source database performance.
There is a way to make this reloading happen automatically, without need for user intervention, and no matter the value of
Just add the property
autoReload to the
AdditionalProperties json field in
Note that this must be done on a per Entity basis.
2. Incremental Load non-Change Tracking (non-CT) enabled by Sidra¶
This is an option to use an incremental query based in parametrization which applies for all the plugin connectors in Sidra. This parametrization is provided by the
IdDeltaAttribute, which is the Attribute that is used for the incremental extraction.
This Attribute acts as a unique key to identify new inserts or updates in the source database. Sidra generates a query that filters by searching the differences according to this column. This mechanism is not as efficient as the built-in Change Tracking mechanism though.
This Attribute is described in the EntityDeltaLoad table in Sidra Core as well as several examples of use.
Incremental Load non-Change Tracking custom enabled by Sidra¶
IdDeltaAttribute is not present in the Entity, Sidra Core uses two additional columns, also in the EntityDeltaLoad table in Sidra Core.
CustomGetMaxValueExpression: This value contains the custom expression used to get the maximum value for incremental load.
CustomExtractDataExpression: This value contains the custom expression used to query the source table to get the incremental data. This allows to code more complex expressions than just using an
These columns act as a unique key to identify new inserts or updates in the source database. Sidra generates a query that filters by searching the differences according to these columns. This mechanism is not as efficient as the built-in Change Tracking mechanism neither.
Classification of Sidra's connector plugins per sync mechanisms¶
|Sync mechanism||Azure SQL||SQL Server||SharePoint||DB2||Oracle||MySQL|
|Full Data Sync|
|Incremental Load with Built-in Change Tracking (CT)||Coming soon||Coming soon||Coming soon||Coming soon|
|Incremental Load non-Change Tracking (non-CT) by Sidra|