How to configure tables to exclude from metadata extraction with SQL database

This short tutorial describes how to configure the exclusion of certain tables from being ingested in the Data Storage Unit.

SQL Metadata extractor

SQL Metadata extractor template is a pipeline template used to configure the metadata extraction for an SQL source.

The SQL Metadata extractor pipeline includes a parameter "excludedTables", which can be populated in order to add a list with the tables that the user wants to be excluded from the data extraction process.

The method to configure the excluded tables depend on whether a new SQL metadata extractor pipeline is created, or an existing pipeline needs to be updated:

Using the Sidra API to create an new instance of the metadata extractor pipeline

If using the API for SQL metadata inference: /api/metadata/providers/{id}/pipelines/sqlmetadataextractor, there is a parameter called excludedTables. This parameter can be populated with a list including all the tables, separated by commas, that need to be excluded from the data extraction.

This is a plain list, with elements separated by commas (use schema_name.table_name), e.g.: `excludedTables: 'audit.Group','SalesLT.Customer','SalesLT.CustomerAddress','SalesLT.Product','SalesLT.ProductCategory','SalesLT.ProductDescription','SalesLT.ProductModel'.

By using this parameter, the SQL Metadata extraction pipeline will be created and deployed with these settings, so that no metadata about the tables included in the field excludedTables is added in Sidra Core. Data extractor pipeline, when being executed, will therefore not extract any data about such tables.

Updating a metadatga extractor pipeline by populating parameters

If the metadata extractor pipeline has already been generated, and you need to update it to exclude some tables, then you need to update the metadata in Sidra Core table, and then re-create the metadata extractor pipeline.

In order ot do this, you need to populate the table [DataIngestion].[Pipeline] with the settings on what tables to exclude.

There is a property called excludedTables in the parameters field. This parameters field is a JSON with parameters associated to the pipeline. The field excludedTables is also a list of tables separated by commas (use schema_name.table_name), similarly to the previous version:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
{
    "ProviderItemId": "0775CAE7-9B73-4555-9B25-B314E1665250",
    "folder": "bug105740",
    "sourceDatabaseDatasetName": "adventureworksSourceSql",
    "sourceDatabaseLinkedServiceName": "AdventureWorksAzureSqlDatabase",
    "sourceDatabaseDatasetSourceType": "AzureSqlSource",
    "sourceDatabaseDatasetType": "AzureSqlTable",
    "lookupSqlDatasetName": "adventureworksSourceSql",
    "tablesBatch": "100",
    "getSecretsRetryCount": "5",
    "getSecretsRetryIntervalInSeconds": "60",
    "CreateArtifactsInBulkPipelineName": "CreateTransferQueriesAndTablesInBulk",
    "CreateArtifactsInBulkPipelineReferenceName": "CreateTransferQueriesAndTablesInBulk",
   "tableTypes": "'BASE TABLE','VIEW'",
    "excludedTables": "'audit.Group','SalesLT.Customer','SalesLT.CustomerAddress','SalesLT.Product','SalesLT.ProductCategory','SalesLT.ProductDescription','SalesLT.ProductModel'.
"
}