Pipeline: generate-transfer-query

The generate-transfer-query pipeline generates a transfer query and stores it in the Databricks File System (DBFS) so that it can be executed later.

Definition

The pipeline uses the AutoGeneratedGenerateTransferQuery pipeline template:

{
    "name": "##name##",
    "description": "Generated pipeline for ##name##",
    "properties": {
        "parameters": {
            "providerId": {
                "type": "Int"
            },
            "tableName": {
                "type": "String"
            },
            "providerName": {
                "type": "String"
            },
            "idEntity": {
                "type": "Int"
            },
            "dropAndCreateTable": {
                "type": "Bool"
            }
        },
        "activities": [##Activities##]
    }
}
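
The template is not valid JSON until its ##name## and ##Activities## placeholders are filled in. How the platform performs that substitution is not described here; the sketch below assumes plain string replacement, and `render_pipeline` is a hypothetical helper, not part of the product. The activity objects are reduced to their names for brevity.

```python
import json

# The template text from this section, with the two placeholders left in place.
TEMPLATE = """{
    "name": "##name##",
    "description": "Generated pipeline for ##name##",
    "properties": {
        "parameters": {
            "providerId": {"type": "Int"},
            "tableName": {"type": "String"},
            "providerName": {"type": "String"},
            "idEntity": {"type": "Int"},
            "dropAndCreateTable": {"type": "Bool"}
        },
        "activities": [##Activities##]
    }
}"""


def render_pipeline(template, name, activities):
    """Hypothetical helper: fill the placeholders and parse the result."""
    # json.dumps(...)[1:-1] escapes the name for use inside a JSON string.
    safe_name = json.dumps(name)[1:-1]
    # Activities are serialized and comma-joined to sit inside the existing [ ].
    activities_json = ", ".join(json.dumps(activity) for activity in activities)
    rendered = template.replace("##name##", safe_name)
    rendered = rendered.replace("##Activities##", activities_json)
    return json.loads(rendered)  # fails loudly if the output is not valid JSON


pipeline = render_pipeline(
    TEMPLATE,
    "generate-transfer-query",
    [
        {"name": "CheckDropAndCreateTable"},
        {"name": "GenerateTransferQueryPythonScript"},
        {"name": "UpdateLastDeployed"},
    ],
)
print(pipeline["description"])
```

Parsing the rendered text with `json.loads` is a cheap way to catch a malformed activity list before the pipeline definition is deployed.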

It is not associated with any dataset templates.

It is associated with these activity templates in this specific order:

  1. CheckDropAndCreateTable
  2. GenerateTransferQueryPythonScript
  3. UpdateLastDeployed

[Figure: pipeline-generate-transfer-query]

How does it work

Pipeline launch

The pipeline is executed by the CheckLastUpdated activity of the fileIngestion-databricks pipeline. Its parameters are populated with the information that CheckLastUpdated receives from its own pipeline (fileIngestion-databricks) and from the CheckDatesAndDropCreate activity in that same pipeline.
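
As an illustration of what the caller has to supply, the sketch below checks a set of parameter values against the five parameters declared in the template. The mapping of Int, String and Bool to Python types, the `validate_parameters` helper, and the example values are all assumptions made for this sketch.

```python
# Python types standing in for the parameter types declared in the template.
PARAMETER_TYPES = {
    "providerId": int,
    "tableName": str,
    "providerName": str,
    "idEntity": int,
    "dropAndCreateTable": bool,
}


def validate_parameters(params):
    """Raise if params does not match the parameters declared by the pipeline."""
    missing = sorted(set(PARAMETER_TYPES) - set(params))
    if missing:
        raise KeyError(f"missing parameters: {missing}")
    for name, expected in PARAMETER_TYPES.items():
        value = params[name]
        # bool is a subclass of int in Python, so reject it for Int parameters.
        is_bool_as_int = expected is int and isinstance(value, bool)
        if is_bool_as_int or not isinstance(value, expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return params


# Example values are invented for illustration.
example = validate_parameters(
    {
        "providerId": 7,
        "tableName": "customers",
        "providerName": "sales",
        "idEntity": 42,
        "dropAndCreateTable": False,
    }
)
```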

CheckDropAndCreateTable

This native If Condition activity checks whether the dropAndCreateTable parameter is true; if so, it executes the create-table pipeline. The dropAndCreateTable parameter comes from the DropAndCreateORCTableOnChange column in the Entity table, so table creation can be enabled or disabled for each entity.
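
The branching logic can be sketched in a few lines. This is only a model of the behaviour described above, not the activity's definition: `check_drop_and_create_table` is a hypothetical function, the create-table pipeline is replaced by a callback, and the `Name` field of the sample rows is invented.

```python
def check_drop_and_create_table(drop_and_create_table, run_create_table):
    """Run the create-table step only when the flag is true."""
    if drop_and_create_table:
        run_create_table()
        return True
    return False


# Hypothetical Entity rows; only DropAndCreateORCTableOnChange is documented.
entities = [
    {"Name": "customers", "DropAndCreateORCTableOnChange": True},
    {"Name": "orders", "DropAndCreateORCTableOnChange": False},
]

recreated = []
for entity in entities:
    check_drop_and_create_table(
        entity["DropAndCreateORCTableOnChange"],
        # Bind the name now so each callback records its own entity.
        lambda name=entity["Name"]: recreated.append(name),
    )
print(recreated)  # ['customers']
```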

GenerateTransferQuery

This GenerateTransferQuery custom activity queries the Core database for entities that have been created or updated since the last execution. For each entity, it creates a Python script that contains a transfer query and uploads it to the Databricks File System (DBFS).
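
The sketch below mirrors those two steps under several assumptions: the entity fields (`provider_name`, `table_name`, `last_updated`), the SQL text, and the file layout are all invented for illustration, and a local temporary directory stands in for DBFS. The real activity's query and upload mechanism are not shown in this document.

```python
import tempfile
from datetime import datetime
from pathlib import Path


def build_transfer_script(provider_name, table_name):
    """Return the text of a Python script that holds one transfer query."""
    # Hypothetical query: the real transfer query is not shown in this document.
    query = (
        f"INSERT INTO {provider_name}.{table_name} "
        f"SELECT * FROM {provider_name}_staging.{table_name}"
    )
    # `spark` is assumed to be provided by the environment that runs the script.
    return f'transfer_query = """{query}"""\nspark.sql(transfer_query)\n'


def generate_scripts(entities, last_execution, target_dir):
    """Write one script per entity changed after last_execution."""
    written = []
    for entity in entities:
        if entity["last_updated"] <= last_execution:
            continue  # unchanged since the last run, nothing to regenerate
        folder = Path(target_dir) / entity["provider_name"]
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / f"{entity['table_name']}.py"
        path.write_text(
            build_transfer_script(entity["provider_name"], entity["table_name"])
        )
        written.append(path)
    return written


# Invented sample data: only "orders" changed after the last execution.
sample_entities = [
    {"provider_name": "sales", "table_name": "customers",
     "last_updated": datetime(2024, 1, 1)},
    {"provider_name": "sales", "table_name": "orders",
     "last_updated": datetime(2024, 3, 1)},
]

with tempfile.TemporaryDirectory() as fake_dbfs:
    written = generate_scripts(sample_entities, datetime(2024, 2, 1), fake_dbfs)
    written_names = [path.name for path in written]
    first_script = written[0].read_text()

print(written_names)  # ['orders.py']
```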

UpdateLastDeployed

This native Stored Procedure activity executes the UpdateLastDeployed procedure in the Core database. The procedure updates the LastDeployed column in the Entity table with the current date.
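
The effect of the procedure can be modelled with an in-memory SQLite database standing in for the Core database. The procedure's actual body and parameters are not documented here; the sketch assumes the update is scoped to a single entity through an `Id` column, which is a hypothetical name, as is the `update_last_deployed` function.

```python
import sqlite3
from datetime import datetime, timezone


def update_last_deployed(conn, id_entity, now=None):
    """Set LastDeployed for one entity; returns the number of rows changed."""
    now = now or datetime.now(timezone.utc)
    cursor = conn.execute(
        "UPDATE Entity SET LastDeployed = ? WHERE Id = ?",
        (now.isoformat(), id_entity),
    )
    conn.commit()
    return cursor.rowcount


# SQLite stands in for the Core database; the schema is reduced to two columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (Id INTEGER PRIMARY KEY, LastDeployed TEXT)")
conn.executemany("INSERT INTO Entity (Id) VALUES (?)", [(1,), (2,)])

changed = update_last_deployed(conn, 1, datetime(2024, 1, 1, tzinfo=timezone.utc))
rows = conn.execute("SELECT Id, LastDeployed FROM Entity ORDER BY Id").fetchall()
```

Because GenerateTransferQuery selects entities changed since the last execution, recording the deployment time as the final step keeps the next run from regenerating scripts for entities that have not changed.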