The create-table pipeline creates the table and validation error table for the entity of the asset being ingested.
The pipeline uses the AutoGeneratedCreateTable pipeline template.
It is not associated with any dataset templates.
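It is associated with three activity templates, which run in this specific order: the CreateTableScript custom activity, a native Databricks Python activity, and a native Stored Procedure activity.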
How does it work
The pipeline is executed by the CheckDropAndCreateTable activity of the generate-transfer-query pipeline. Its parameters are populated with the information that CheckDropAndCreateTable receives from its own pipeline, generate-transfer-query.
The CreateTableScript custom activity retrieves the information of an entity from the Core database, using the input parameters idProvider and tableName to select the entity. With that information, the activity generates the Hive queries that create the table and the validation error table for the entity. Finally, it stores those queries as a HiveQL script in the Databricks File System (DBFS).
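The query-generation step can be sketched as follows. This is only an illustration of the idea, not the activity's actual code: the `Entity` class, the `_validationerrors` suffix, and the choice to store every validation error column as STRING are assumptions made for the example. ORC storage is inferred from the name of the DropAndCreateORCTableOnChange column.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Entity:
    # Hypothetical shape of the entity metadata read from the Core database.
    provider_database: str
    table_name: str
    columns: List[Tuple[str, str]]  # (column name, Hive type)


def build_create_table_script(entity: Entity) -> str:
    """Return a HiveQL script with one CREATE TABLE statement per target table."""
    typed_cols = ",\n  ".join(
        f"`{name}` {hive_type}" for name, hive_type in entity.columns
    )
    # Assumption: the validation error table keeps every column as STRING,
    # so rows that fail type validation can still be stored.
    error_cols = ",\n  ".join(f"`{name}` STRING" for name, _ in entity.columns)

    table = f"`{entity.provider_database}`.`{entity.table_name}`"
    # Assumption: the error table name is the table name plus a fixed suffix.
    error_table = (
        f"`{entity.provider_database}`.`{entity.table_name}_validationerrors`"
    )

    statements = [
        f"CREATE TABLE IF NOT EXISTS {table} (\n  {typed_cols}\n) STORED AS ORC",
        f"CREATE TABLE IF NOT EXISTS {error_table} (\n  {error_cols}\n) STORED AS ORC",
    ]
    # Terminate every statement with ';' so the script can be split later.
    return ";\n\n".join(statements) + ";\n"
```

The returned string would then be written to a file in DBFS for the next activity to pick up.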
The native Databricks Python activity runs a Python script in Databricks, which in turn executes the HiveQL script generated by the previous activity. The Python script must already exist in DBFS before the pipeline runs.
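A minimal sketch of what such a Python script might do is shown below. The function names are hypothetical, and the statement splitting is deliberately naive; the real script may work differently.

```python
from typing import Callable, List


def split_statements(script: str) -> List[str]:
    """Split a HiveQL script on ';' and drop empty fragments.

    Naive by design: it does not handle ';' inside string literals or comments.
    """
    return [part.strip() for part in script.split(";") if part.strip()]


def run_hiveql_script(script: str, execute: Callable[[str], object]) -> int:
    """Run each statement through `execute` and return how many were executed."""
    statements = split_statements(script)
    for statement in statements:
        execute(statement)
    return len(statements)
```

In Databricks, `execute` could be `spark.sql`, with the script text read from the file that the previous activity stored in DBFS. Passing the executor as an argument keeps the sketch runnable without a Spark session.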
The native Stored Procedure activity executes the ChangeDropAndCreateTable procedure stored in the Core database. The procedure sets the DropAndCreateORCTableOnChange column to false in the Entity table. This prevents the table from being created again every time an asset of the same entity is ingested.
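The effect of the flag can be illustrated with a small stand-in that uses an in-memory SQLite database in place of the Core database. The IdProvider and TableName columns, the function names, and the procedure's parameters are assumptions for the example; only the Entity table and the DropAndCreateORCTableOnChange column come from the description above.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the Entity table with one flagged row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Entity ("
        "IdProvider INTEGER, TableName TEXT, "
        "DropAndCreateORCTableOnChange INTEGER)"
    )
    conn.execute("INSERT INTO Entity VALUES (1, 'customers', 1)")
    conn.commit()
    return conn


def needs_table_creation(
    conn: sqlite3.Connection, id_provider: int, table_name: str
) -> bool:
    """Return True while the flag is still set for the entity."""
    row = conn.execute(
        "SELECT DropAndCreateORCTableOnChange FROM Entity "
        "WHERE IdProvider = ? AND TableName = ?",
        (id_provider, table_name),
    ).fetchone()
    return bool(row and row[0])


def change_drop_and_create_table(
    conn: sqlite3.Connection, id_provider: int, table_name: str
) -> None:
    """Stand-in for the stored procedure: clear the flag for one entity."""
    conn.execute(
        "UPDATE Entity SET DropAndCreateORCTableOnChange = 0 "
        "WHERE IdProvider = ? AND TableName = ?",
        (id_provider, table_name),
    )
    conn.commit()
```

After `change_drop_and_create_table` runs, `needs_table_creation` returns False for that entity, so a later ingestion of the same entity would skip the create-table step.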