Configuration steps

Configuration steps¶

The process of setting up an SQL Server Data Intake Process involves several key actions. The different sections of the input form are organized into these main steps:

Step 1. Configure Data Intake Process¶

Please see on the common Sidra connector plugins section about the parameters needed to configure a Data Intake Process.

Step 2. Configure Provider¶

Please see on the common Sidra connector plugins section about the parameters needed to create a new Provider.

Step 3. Configure Data Source¶

The data source represents the connection to the source database. A Data Source abstracts the details of creating a Linked Service in Azure Data Factory. The fields required in this section are the Integration Runtime and the connection string to the database. Sidra SQL Server connector plugin will register this new data source in Sidra Metadata and deploy a Linked Service in Azure Data factory with this connection. The created Linked Service will use Key Vault in Sidra Core to store the connection string to the database. Default value for Integration Runtime is AutoResolveIntegrationRuntime.

For more details on Linked Services check the Data Factory documentation.

The information that needs to be provided is the following:

Integration Runtime: this is the ADF integration runtime in charge of executing the connection between the data origin, and is associated with a deployed Linked Service. In case it is not required any IR, the option Default will be selected.
About connection details:
- AuthenticationType: From version 2022.R3, Windows Authentication for SQL server plugin has been included improving the authentication security. This field in Sidra Web allows to choose between the SQL Authentication and Windows Authentication type.
- Server Name: the name of the server to connect to.
- Database Name: the name of the database.
- User Name: user with access to the origin server.
- Use existing secret: When true, the field Password's secret name must be filled, when false, the user Password field must be filled.
- Encrypt: for the connection string (true/false). By default, true is used.
- Trust Server Certificate: for the connection string (true/false). By default, false is used.

Step 4. Configure SQL Metadata Extraction¶

Sidra Azure SQL Database connector plugin replicates the schema and tables from the source database, by querying the INFORMATION_SCHEMA System View tables.

The extracted schema is replicated to the Sidra Metadata model hierarchy. Tables in the SQL Database source system are replicated in Sidra as Entities, and columns of these tables, as Attributes.

The Entity table in Sidra metadata model contains data about the format of the Entity generated, as well as information about how this Entity should be handled by the system. Attributes contain data about the columns of the tables. Sidra adds additional system-level Attributes to convey system-handling information at the level of the Entity (or table).

As part of the metadata extraction of a Data Intake Process setup, the autogenerated transfer query / DSU ingestion will be executed following the selected configuration scenario, transforming the data in raw storage into the optimized format in the Data Lake.

For more details, please check the Data Ingestion documentation.

The metadata extraction represents in which way the origin will be consumed in order to obtain its metadata. The information fields required to fill in the SQL Metadata extraction section will be used for three main purposes internally in Sidra:

Create and populate the Sidra Core metadata structures (Entities and Attributes).
Create, deploy and execute an Azure Data Factory pipeline for the metadata extraction steps (see below steps).
Auto-generate the create tables and executes the autogenerated transfer query / DSU ingestion script, including all the steps for storing the optimized data in the data lake storage.

These are the metadata extractor configuration parameters to be input for this version of connector plugin. All of these parameters are optional:

Number of Tables per Batch: this is used internally in the metadata extractor pipeline in order to specify the number of tables that will be consumed per batch.
Types of Objects to Load: views, tables or both.
Object Restriction Mode: there can be three options here:
- Include all objects: all objects from all databases will be imported.
- Include some objects: to specify a list of objects to include.
- Exclude some objects: to specify a list of objects to exclude.
Object Restriction List: this is a list with comma separated values, which includes those objects that we want or do not want to load into the system, depending on the value of Object Restriction Mode. If no values are specified in this field, all objects will be loaded. To fill in this field, add a list of [<database_name>].[<schema_name>].[<table_name>] or [<database_name>].[<schema_name>] or [<database_name>] elements in plain text, separated by comma. Examples: [AdventureWorks].[Sales_LT], [AdventureWorks].[Orders_LT].[OrderHeader], [EmployeesDB]

Once the metadata extractor pipeline has been created, it is executed. This execution could take a while (from 5 to 15 minutes) depending on the load of Data Factory and the status and load of the associated Databricks cluster. Sidra exposes API endpoints as well to manually execute the SQL Metadata extractor pipeline.

Step 5. Configure Trigger¶

Please see on the common Sidra connector plugins section about the parameters needed to set up a trigger.