The generate-transfer-query pipeline generates a transfer query and stores it in the Databricks File System (DBFS) so that it can be executed later.
The pipeline uses the AutoGeneratedGenerateTransferQuery pipeline template.
It is not associated with any dataset templates.
It is associated with these activity templates in this specific order:
How does it work
The pipeline is executed by the CheckLastUpdated activity of the fileIngestion-databricks pipeline. Its parameters are populated with information that CheckLastUpdated receives from its own pipeline (fileIngestion-databricks) and from the CheckDatesAndDropCreate activity in that same pipeline.
This native If Condition activity checks whether the dropAndCreateTable parameter is true and, if so, executes the create-table pipeline. The dropAndCreateTable parameter comes from the DropAndCreateORCTableOnChange column in the Entity table, so table creation can be enabled or disabled individually for each entity.
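The branching decision made by the If Condition activity can be sketched in plain Python. This is a minimal illustration, not the Data Factory implementation: the `Entity` dataclass, its `name` field, and the `run_create_table` callback are hypothetical stand-ins; only the DropAndCreateORCTableOnChange column and the dropAndCreateTable flag come from this page.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Entity:
    # Hypothetical in-memory view of a row in the Entity table.
    name: str
    drop_and_create_orc_table_on_change: bool  # DropAndCreateORCTableOnChange column


def maybe_create_table(entity: Entity, run_create_table: Callable[[str], None]) -> bool:
    """Mimic the If Condition: run create-table only when the flag is true."""
    # dropAndCreateTable is taken straight from the entity's column value.
    drop_and_create_table = entity.drop_and_create_orc_table_on_change
    if drop_and_create_table:
        # Hypothetical stand-in for executing the create-table pipeline.
        run_create_table(entity.name)
    return drop_and_create_table


# Usage: only the entity whose flag is set triggers the create-table call.
triggered: List[str] = []
entities = [Entity("customers", True), Entity("orders", False)]
for e in entities:
    maybe_create_table(e, triggered.append)
print(triggered)  # ['customers']
```

Because the flag is read per entity, turning table recreation off for one entity does not affect any other.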
This GenerateTransferQuery custom activity queries the Core database for entities that have been created or updated since the last execution. For each such entity, it creates a Python script containing a transfer query and uploads it to the Databricks File System (DBFS).
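The three steps of GenerateTransferQuery (select changed entities, render a script, upload it) can be sketched as follows. This is a hypothetical sketch, not the custom activity's code: the `CreatedDate`, `LastUpdated`, `SourceTable`, `TargetTable` and `Name` fields, the `INSERT OVERWRITE` shape of the transfer query, and the `/transfer-queries/` DBFS path are all assumptions. The upload step only builds the JSON body for the Databricks DBFS REST endpoint `POST /api/2.0/dbfs/put` (base64 `contents`, at most 1 MB); sending it with an authenticated request is left out.

```python
import base64
import json
from datetime import datetime
from typing import Dict, List


def changed_entities(rows: List[Dict], last_execution: datetime) -> List[Dict]:
    """Keep entities created or updated after the previous run.

    Assumes hypothetical CreatedDate / LastUpdated fields on each row.
    """
    return [
        r for r in rows
        if r["CreatedDate"] > last_execution
        or (r["LastUpdated"] is not None and r["LastUpdated"] > last_execution)
    ]


def render_script(entity: Dict) -> str:
    """Build a Python script whose body runs a (hypothetical) transfer query."""
    query = (
        f"INSERT OVERWRITE TABLE {entity['TargetTable']} "
        f"SELECT * FROM {entity['SourceTable']}"
    )
    return (
        "from pyspark.sql import SparkSession\n"
        "spark = SparkSession.builder.getOrCreate()\n"
        f"spark.sql({query!r})\n"
    )


def dbfs_put_payload(entity: Dict, script: str) -> str:
    """JSON body for POST /api/2.0/dbfs/put (contents must be base64-encoded)."""
    return json.dumps({
        "path": f"/transfer-queries/{entity['Name']}.py",  # hypothetical location
        "contents": base64.b64encode(script.encode("utf-8")).decode("ascii"),
        "overwrite": True,
    })


# Usage with made-up rows: only "customers" changed after 2024-02-01.
rows = [
    {"Name": "customers", "SourceTable": "raw.customers", "TargetTable": "orc.customers",
     "CreatedDate": datetime(2024, 1, 1), "LastUpdated": datetime(2024, 3, 1)},
    {"Name": "orders", "SourceTable": "raw.orders", "TargetTable": "orc.orders",
     "CreatedDate": datetime(2024, 1, 1), "LastUpdated": None},
]
bodies = [
    dbfs_put_payload(entity, render_script(entity))
    for entity in changed_entities(rows, last_execution=datetime(2024, 2, 1))
]
for body in bodies:
    print(json.loads(body)["path"])  # /transfer-queries/customers.py
```

Generating a standalone script per entity means each transfer can later be run on its own, which matches the page's description of storing the query for later execution.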
This native Stored Procedure activity executes the UpdateLastDeployed procedure in the Core database. The procedure updates the LastDeployed column in the Entity table with the current date.
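The effect of UpdateLastDeployed can be emulated against an in-memory SQLite table to show how the next run's "changed since the last execution" check gets its reference point. This is an emulation under assumptions, not the Core database procedure: the `update_last_deployed` helper and the two-column `Entity` table are hypothetical, dates are stored as ISO text because SQLite has no date type, and whether the real procedure restricts the update with a `WHERE` clause is not stated on this page (the sketch updates every row).

```python
import sqlite3
from datetime import date


def update_last_deployed(conn: sqlite3.Connection, today: date) -> int:
    """Emulate UpdateLastDeployed: stamp LastDeployed with the current date."""
    # No WHERE clause: every Entity row is stamped (an assumption, see above).
    cur = conn.execute("UPDATE Entity SET LastDeployed = ?", (today.isoformat(),))
    conn.commit()
    return cur.rowcount  # number of rows modified


# Hypothetical minimal Entity table with two rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Entity (Name TEXT PRIMARY KEY, LastDeployed TEXT)")
conn.executemany(
    "INSERT INTO Entity (Name, LastDeployed) VALUES (?, ?)",
    [("customers", "2024-01-15"), ("orders", None)],
)
updated = update_last_deployed(conn, date(2024, 3, 1))
print(updated)  # 2
print(conn.execute("SELECT Name, LastDeployed FROM Entity ORDER BY Name").fetchall())
# [('customers', '2024-03-01'), ('orders', '2024-03-01')]
```

In production the date would come from the database clock rather than a value passed in; the explicit `today` argument just keeps the sketch deterministic.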