About the Azure SQL database connector plugin¶
The Azure SQL database connector plugin for Sidra enables seamless integration with the most widely used SQL Server database as a Service on Azure; an intelligent, scalable, relational database service built in for the cloud. Azure SQL Database uses the latest SQL Server capabilities, and you can learn more about it on Microsoft Documentation.
Sidra connector plugin for Azure SQL database extracts data from any table and view in the source database and loads it into the specified Data Storage Unit at regular intervals. It relies on the Sidra Metadata model for mapping source data structures to Sidra as destination, and uses Azure Data Factory as underlying data integration mechanism within Sidra.
When configuring and executing this plugin to create a new Data Intake Process, several underlying steps are involved to achieve the following:
- The necessary metadata and data governance structures are created and populated in Sidra.
- The actual data integration infrastructure (ADF Pipeline) is created, configured and deployed.
The wizard is completed in less than five-minutes. Once the settings are configured and the deployment process is started, the actual duration of the data ingestion may vary from few minutes to few hours, depending on the data volumes.
After starting the Data Intake Process creation, users will receive a message that the process has started and will continue in the background. Users will be able to navigate through Sidra Web as usual while this process happens.
Once the whole deployment process is finished, users will receive a notification in Sidra Web Notifications widget. If this process went successfully, the new data structures (new Entities) will appear in the Data Catalog automatically, and the Data Intake Process will incorporate this new data source.
SQL data types and transformations¶
Some of the original SQL Server data types are not supported by Azure Data Factory conversions to Parquet (the format of storage in Sidra DSU).
For those data types that are not supported, Sidra connectors as plugins incorporate some type translation mechanisms in place, used both at metadata extraction and at data extraction phases.
If a certain data type is not supported, that data type is automatically changed to the closest supported type as defined in a Type Translations table.
You can find more information about the general process for type translations for plugins is in this page.
How to exclude source data¶
Sidra data connector plugin for Azure SQL Database allows to control different settings on which tables will be part of the data to be synchronized between source and destination:
-
List of tables to include
: this field allows to specify whether only original source tables, views or both will be part of the data synchronization. Possible values aretable
andview
. These values correspond to the table_type values in INFORMATION_SCHEMA.TABLES SQL Server System View. If not specified, all table types will be included. -
List of tables to exclude
: this field allows to specify certain schemas and tables in the source database that will not be part of the data synchronization. You can check the list of schemas and tables in the INFORMATION_SCHEMA.TABLES SQL Server System View. If no values are specified in this field, no tables will be excluded. To fill in this field, add a list of [].[ ] elements, separated by comma. Examples:
[dbo].[VariantTable], [Sales_LT].[SalesOrderHeader]
.
Data Extraction pipeline¶
Once with all the information provided in the above steps, Sidra Core will create and deploy the actual data extractor pipeline. The data extraction pipeline is where the actual movement and transformation of data happens:
- On one hand, the
copy data
ADF activities are executed, which actually move the data between the source (Azure SQL Database) and the destination (Azure Data Lake Gen2). - On the other hand, the autogenerated transfer query (legacy) / DSU ingestion is executed in order to perform data optimization, data validation, etc and loading the data in its optimized format in the data lake.
The time to execute this pipeline is variable depending on the volume of data and the environment.
Initial full data synchronization¶
Once Sidra is connected to the source database, Sidra Azure SQL Database connector plugin first copies all rows from every table in every schema and table that has not been explicitly set to be excluded (see above on "Excluding Source Data").
For each table (Entity), rows are copied by performing a SELECT statement. Copy Activity in Azure Data Factory parallelizes reads and writes according to the source and destination. A good practice when using Change Tracking, is that the field by which the changes are calculated is indexed in the source database. This avoids needing to read the whole table.
Loading incremental data mechanisms¶
Once an initial synchronization is complete, Sidra performs incremental synchronizations on the new and modified data in the source system.
Sidra Connector plugin for Azure SQL Database uses the incremental data synchronization mechanisms.