Pipeline: create-table

The create-table pipeline creates the table and validation error table for the entity of the asset being ingested.

Definition

The pipeline uses the AutoGeneratedCreateTable pipeline template:

{
    "name": "##name##",
    "description": "Generated pipeline for ##name##",
    "properties": {
        "parameters": {
            "providerId": {
                "type": "Int"
            },
            "tableName": {
                "type": "String"
            },
            "providerName": {
                "type": "String"
            },
            "idEntity": {
                "type": "Int"
            }
        },
        "activities": [##Activities##]
    }
}

It is not associated with any dataset templates.

It is associated with these activity templates in this specific order:

  1. CreateTableCustom
  2. RunDatabricksNotebook
  3. ChangeDropAndCreateTable
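
As a rough illustration of how the two placeholders in the template might be filled in, the following Python sketch substitutes `##name##` and `##Activities##` using plain string replacement. The `render_pipeline` helper and the stub activity objects are hypothetical. The actual template engine and the full activity definitions are not shown in this document, so treat this as a sketch under those assumptions.

```python
import json

# Abbreviated copy of the pipeline template (parameters omitted for brevity).
TEMPLATE = """{
    "name": "##name##",
    "description": "Generated pipeline for ##name##",
    "properties": {
        "activities": [##Activities##]
    }
}"""


def render_pipeline(template: str, name: str, activities: list) -> dict:
    """Fill the ##name## and ##Activities## placeholders and parse the result.

    Hypothetical helper: plain string substitution is an assumption, not the
    platform's documented mechanism.
    """
    # Serialize each activity and join with commas so the array stays valid JSON.
    activities_json = ", ".join(json.dumps(a) for a in activities)
    rendered = template.replace("##name##", name)
    rendered = rendered.replace("##Activities##", activities_json)
    return json.loads(rendered)


# Stub activities (placeholders only), in the order of the activity templates.
stub_activities = [
    {"template": "CreateTableCustom"},
    {"template": "RunDatabricksNotebook"},
    {"template": "ChangeDropAndCreateTable"},
]

pipeline = render_pipeline(TEMPLATE, "create-table", stub_activities)
print([a["template"] for a in pipeline["properties"]["activities"]])
```

Parsing the rendered text with `json.loads` is a cheap way to confirm that the substituted template is still valid JSON before it is deployed.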


How does it work

Pipeline launch

The pipeline is executed by the CheckDropAndCreateTable activity of the generate-transfer-query pipeline. Its parameters are populated with the values that CheckDropAndCreateTable receives from its own pipeline, generate-transfer-query.

CreateTableScript

The CreateTableScript custom activity retrieves the information about an entity from the Core database, using the input parameters idProvider and tableName to select the entity. With that information, the activity generates the Hive queries that create the table and the validation error table for the entity. Finally, it stores those queries as a HiveQL script in the Databricks File System (DBFS).
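
The query generation step can be pictured with a minimal Python sketch. Everything specific here is an assumption made for illustration: the `Attribute` shape, the `ValidationErrors` suffix, the `ErrorMessage` column, and the choice of ORC storage are hypothetical and do not describe the actual activity's output. The sketch stops at producing the script text and does not upload it to DBFS.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attribute:
    """One column of the entity (hypothetical shape of the Core database metadata)."""
    name: str
    hive_type: str


def build_create_table_script(database: str, table: str, attributes: List[Attribute]) -> str:
    """Return a HiveQL script that creates the entity table and its validation error table."""
    indent = ",\n    "
    columns = indent.join(f"`{a.name}` {a.hive_type}" for a in attributes)
    # Assumption: the error table keeps every value as STRING and adds an error column.
    error_columns = indent.join(
        [f"`{a.name}` STRING" for a in attributes] + ["`ErrorMessage` STRING"]
    )
    # Assumption: naming convention for the validation error table.
    error_table = f"{table}ValidationErrors"
    statements = [
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{table}` (\n    {columns}\n) STORED AS ORC",
        f"CREATE TABLE IF NOT EXISTS `{database}`.`{error_table}` (\n    {error_columns}\n) STORED AS ORC",
    ]
    # Terminate every statement with ';' so the script can be split and run later.
    return ";\n\n".join(statements) + ";\n"


script = build_create_table_script(
    "sales",
    "customer",
    [Attribute("id", "INT"), Attribute("name", "STRING")],
)
print(script)
```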

RunHQLCreateTable

This native Databricks Python activity runs a Python script in Databricks that executes the HiveQL script generated by the previous activity. The Python script must already exist in DBFS.
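
The Python script itself is not shown in this document. A minimal sketch of what such a runner could look like is given below, assuming the HiveQL script is a sequence of statements separated by `;`. The `run_hql_script` and `split_statements` names are hypothetical, and the executor is passed in as a callable so that the example runs locally without a cluster.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script into statements on ';', skipping '--' comment lines.

    Naive on purpose: a ';' inside a string literal would be split incorrectly.
    """
    lines = [ln for ln in script.splitlines() if not ln.strip().startswith("--")]
    return [s.strip() for s in "\n".join(lines).split(";") if s.strip()]


def run_hql_script(script: str, execute: Callable[[str], object]) -> int:
    """Execute each statement in order and return how many were run."""
    statements = split_statements(script)
    for statement in statements:
        execute(statement)
    return len(statements)


# Local dry run: collect the statements instead of sending them to a cluster.
executed = []
demo_script = """
-- demo script
CREATE TABLE IF NOT EXISTS demo.customer (id INT) STORED AS ORC;
CREATE TABLE IF NOT EXISTS demo.customervalidationerrors (id STRING) STORED AS ORC;
"""
count = run_hql_script(demo_script, executed.append)
print(count)
```

In a Databricks Python job, the same function could be called with `spark.sql` as the executor, after reading the script text from its DBFS location.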

UpdateDropAndCreateTableToFalse

This native Stored Procedure activity executes the ChangeDropAndCreateTable stored procedure in the Core database. The procedure sets the DropAndCreateORCTableOnChange column to false in the Entity table, which prevents the table from being created again every time an asset of the same entity is ingested.
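
The effect of the flag can be simulated with an in-memory SQLite database. This is only an illustration: the simplified `Entity` schema, the `Id` column, and both helper functions are assumptions, and the check that gates the pipeline is inferred from the description above rather than taken from the real CheckDropAndCreateTable activity or stored procedure.

```python
import sqlite3

# Simplified stand-in for the Entity table of the Core database (assumed schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Entity ("
    "Id INTEGER PRIMARY KEY, Name TEXT, DropAndCreateORCTableOnChange INTEGER)"
)
conn.execute("INSERT INTO Entity VALUES (1, 'customer', 1)")


def needs_create_table(connection: sqlite3.Connection, id_entity: int) -> bool:
    """Hypothetical check: is the flag still set for this entity?"""
    row = connection.execute(
        "SELECT DropAndCreateORCTableOnChange FROM Entity WHERE Id = ?",
        (id_entity,),
    ).fetchone()
    return bool(row[0])


def change_drop_and_create_table(connection: sqlite3.Connection, id_entity: int) -> None:
    """Stand-in for the ChangeDropAndCreateTable procedure: reset the flag to false."""
    connection.execute(
        "UPDATE Entity SET DropAndCreateORCTableOnChange = 0 WHERE Id = ?",
        (id_entity,),
    )


# Ingest three assets of the same entity; only the first should trigger create-table.
runs = []
for asset in ["asset-1", "asset-2", "asset-3"]:
    if needs_create_table(conn, 1):
        runs.append(asset)  # the create-table pipeline would run here
        change_drop_and_create_table(conn, 1)
print(runs)
```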