Types and transformations¶
Some of the original DB2 data types are not supported by the Azure Data Factory conversion to Parquet (the storage format in the Sidra DSU).
For those unsupported data types, Sidra connector plugins incorporate type translation mechanisms that are applied at both the metadata extraction and the data extraction phases.
If a certain data type is not supported, that data type is automatically changed to the closest supported type as defined in a Type Translations table.
You can find more information about the general type translation process for plugins in this page.
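Conceptually, the translation is a lookup: if a source type appears in the Type Translations table, the mapped type is used; otherwise the type passes through unchanged. The following sketch illustrates the idea; the mappings shown are hypothetical examples, not the actual contents of Sidra's Type Translations table.

```python
# Illustrative type-translation lookup. The mappings below are made-up
# examples; the real values are defined in Sidra's Type Translations table.
TYPE_TRANSLATIONS = {
    "DECFLOAT": "DOUBLE",
    "GRAPHIC": "STRING",
    "VARGRAPHIC": "STRING",
}


def translate_type(db2_type: str) -> str:
    """Return the closest supported type, or the original type if it is
    already supported (i.e., not present in the translations table)."""
    return TYPE_TRANSLATIONS.get(db2_type.upper(), db2_type)
```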
Data Extraction pipeline¶
Once all the information described in the Configuration steps section has been provided, Sidra Core creates and deploys the actual data extraction pipeline. The data extraction pipeline is where the actual movement and transformation of data happens:
- On one hand, the `copy data` ADF activities are executed, which actually move the data between the source (DB2 database) and the destination (Azure Data Lake Gen2).
- On the other hand, the DSU ingestion script is executed to perform data optimization, data validation, etc., and to load the data in its optimized format into the data lake.
The execution time of this pipeline varies depending on the volume of data and the environment.
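The two phases above can be sketched as a simple orchestration: copy activities run per table, then the ingestion script runs over the copied data. This is an illustrative sketch, not Sidra's actual implementation; the function and parameter names are assumptions.

```python
def run_extraction_pipeline(tables, copy_activity, ingestion_script):
    """Hypothetical orchestration sketch of the extraction pipeline:
    first the ADF copy activities move raw data per table, then the
    DSU ingestion script validates, optimizes and loads it."""
    for table in tables:
        copy_activity(table)   # ADF Copy Data: DB2 source -> Azure Data Lake Gen2
    ingestion_script()         # DSU ingestion: validation, optimization, load
```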
Initial full data synchronization¶
Once Sidra is connected to the source database, the Sidra DB2 Database connector plugin first copies all rows from every schema and table that has not been explicitly excluded.
For each table (Entity), rows are copied by performing a SELECT statement. The Copy Activity in Azure Data Factory parallelizes reads and writes depending on the source and destination.
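For the initial full load, the per-table SELECT is simply an unfiltered read of the whole table. A minimal sketch of such a query builder (the helper name is hypothetical; the SQL shape is the standard full-load form):

```python
def full_load_query(schema: str, table: str) -> str:
    """Build the full-load SELECT issued for one table (Entity).
    Illustrative helper; identifiers are quoted in DB2 style."""
    return f'SELECT * FROM "{schema}"."{table}"'
```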
Loading incremental data mechanisms¶
Once an initial synchronization is complete, Sidra performs incremental synchronizations on the new and modified data in the source system.
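A common way to express such an incremental load is to filter on a monotonically increasing column (a timestamp or sequence) using the high-water mark from the previous run. The sketch below is an assumption about the general technique, not Sidra's specific mechanism; the function and column names are hypothetical.

```python
def incremental_query(schema: str, table: str, delta_column: str, last_value: str) -> str:
    """Build a SELECT that fetches only rows changed since the last sync,
    assuming a monotonically increasing delta column (e.g., a timestamp).
    Hypothetical sketch of the incremental-load pattern."""
    return (
        f'SELECT * FROM "{schema}"."{table}" '
        f"WHERE \"{delta_column}\" > '{last_value}'"
    )
```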