Sync UI for Data Products
Information
Before going through this article, you need a deployed Data Product. Please follow the guide to deploy a Data Product.
Data synchronization can be configured through the user interface by clicking the Sync button. This button is disabled if, after a Sidra Service update, the Data Product's version becomes incompatible with the current version of Sidra and must be upgraded to a supported version; in that case, a tooltip is displayed when hovering over the button.
Clicking it opens a page in the Sidra UI that allows the following actions.
1. Data Sync
This section lists the Data Syncs, that is, the data synchronizations from the DSU to the Data Product. From here, new Data Syncs can be created and existing ones can be managed by changing:
- The Sync Mode used
- The Entities to be consolidated
- The consolidation mode used
- The definition of the Staging tables
The editing options are described below.
1.1 New Data Sync
Using the New button at the top right, the user can add a new Data Sync with the desired Sync Mode and its parameters, for example, the Copy Entities to Databricks and SQL Database Sync Mode.
The Execution Parameters field becomes visible only when the selected Data Sync includes parameters.
The parameters for this Data Sync are:
- storedProcedureName: The SQL stored procedure that is invoked after all the Entities have loaded their content into the Staging tables. This stored procedure must be created before deploying a pipeline from this pipeline template. The default value is [Staging].[Orchestrator].
- orchestratorNotebookPath: The path of the Notebook that executes the queries configured in the StagingConfiguration table. The Notebook must be created and uploaded to the Databricks instance beforehand.
For example, the ExecutionParameters section could look like this:
{
  "storedProcedureName": "[staging].[Orchestrator]",
  "orchestratorNotebookPath": "/Shared/MyNotebook"
}
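The ExecutionParameters value must be valid JSON. The following minimal Python sketch builds it from the two parameters, falling back to the documented default for storedProcedureName, and checks that a string parses. The helper names build_execution_parameters and is_valid_json are hypothetical and are not part of Sidra; note that strict JSON parsers such as Python's json.loads reject trailing commas, so the value must not end with one.

```python
import json

# Default stored procedure documented in this article.
DEFAULT_STORED_PROCEDURE = "[Staging].[Orchestrator]"


def build_execution_parameters(notebook_path, stored_procedure=DEFAULT_STORED_PROCEDURE):
    """Hypothetical helper: return the ExecutionParameters value as a JSON string."""
    params = {
        "storedProcedureName": stored_procedure,
        "orchestratorNotebookPath": notebook_path,
    }
    # json.dumps never emits trailing commas, so the output is always valid JSON.
    return json.dumps(params, indent=2)


def is_valid_json(text):
    """Hypothetical helper: True if the text parses as strict JSON."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


if __name__ == "__main__":
    print(build_execution_parameters("/Shared/MyNotebook"))
```

Pasting the printed output into the Execution Parameters field avoids hand-editing mistakes such as a stray trailing comma.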
1.2 Data Sync configuration
Clicking the gear button opens a configuration details page composed of the following sections:
- Details: an informative box describing the chosen Data Sync.
-
Associated Entities
This section allows for verification of Entities linked to the specific Data Sync. It implies that only the chosen Entities will be transferred from the DSU to the Staging tables for this Data Sync.
An Edit button is available to facilitate the configuration of this relationship.
In this screen, DSU, Provider and Entity information is depicted. The
Consolidation Mode
can be modified here as well as including Entities in the Data Sync or mark the fieldMandatory
. When an Entity is included here, the Staging Configuration section is updated with a default definition for the Staging tables for those Entities.Dive deeper in
ConsolidationMode
fieldThe
PipelineExecutionProperties
column within theEntityPipeline
table of Data Products introduces a significant feature for managing how data is integrated into the system, particularly through theConsolidationMode
parameter. This parameter determines the behavior of data loading processes before they start. With options such as Overwrite, Merge (the default setting), and Append, it provides flexibility in how incoming data interacts with existing datasets. See more information here.Dive deeper in
Mandatory
fieldThe Data Sync will not run until there are Assets for all Entities marked as mandatory, even if there are successful loads in the DSU for some of these Entities.
- Staging Configuration: This section shows the Sidra.StagingConfiguration table of the Data Product, which defines the tables in the Staging area in Azure SQL where the data is synchronized and made available for applying business logic. The table is automatically populated with a default definition when an Entity is associated with a Data Sync (see the Edit button under Associated Entities), but it can be extended with custom definitions that query those Entities.
- Status of Load Processes: This section shows the data load information (ADF pipeline status) from the Sidra Data Product database and ADF. When an error occurs in a data load, an info icon is shown with the ADF error message.
The Run End and Duration fields depend on ADF records, so this information will stop being shown after some time.
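The two Entity settings described under Associated Entities, ConsolidationMode and Mandatory, can be illustrated with a minimal Python sketch. It is a conceptual model only, under stated assumptions: rows are plain dictionaries, Merge is assumed to behave as an upsert on a primary key column, and the names ConsolidationMode, consolidate and data_sync_can_run are hypothetical. It does not reflect how Sidra or ADF actually perform the load.

```python
from enum import Enum


class ConsolidationMode(Enum):
    OVERWRITE = "Overwrite"
    MERGE = "Merge"  # the default according to this article
    APPEND = "Append"


def consolidate(existing, incoming, key, mode=ConsolidationMode.MERGE):
    """Combine incoming rows with existing rows (illustrative semantics only)."""
    if mode is ConsolidationMode.OVERWRITE:
        # Existing data is discarded and replaced by the incoming load.
        return list(incoming)
    if mode is ConsolidationMode.APPEND:
        # Incoming rows are added as-is; duplicates are not checked.
        return list(existing) + list(incoming)
    # MERGE (assumed upsert): rows with a matching key are replaced,
    # new keys are added; dict insertion order keeps existing rows first.
    merged = {row[key]: row for row in existing}
    for row in incoming:
        merged[row[key]] = row
    return list(merged.values())


def data_sync_can_run(mandatory_entities, entities_with_assets):
    """The Data Sync runs only if every Mandatory Entity has at least one Asset."""
    return set(mandatory_entities) <= set(entities_with_assets)
```

For example, with a Mandatory Entity that has no Asset yet, data_sync_can_run returns False even if other Entities have loaded successfully, which mirrors the behavior described for the Mandatory field.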
1.3 Reload Entities
This option reloads data for specific Entities, either for a specified time range or for all the data stored in the DSU for those Entities. Regarding the Start Date window:
- Start Date must be specified to reload the Entities for a specific date range (from the start date to the current date). If it is not specified, the entire Entity will be reloaded.
- This can be a high-consumption process, so a warning will appear about scaling up resources.
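The Start Date rule above can be sketched in Python. The input (a list of Asset dates for one Entity) and the function name assets_to_reload are hypothetical; the sketch only models which loads a reload would cover, not how Sidra triggers the reload.

```python
from datetime import date


def assets_to_reload(asset_dates, start_date=None, today=None):
    """Return the Asset dates covered by a reload (hypothetical model).

    start_date=None models an empty Start Date: the whole Entity is reloaded.
    Otherwise the window runs from start_date to the current date, inclusive.
    """
    today = today or date.today()
    if start_date is None:
        return sorted(asset_dates)
    return sorted(d for d in asset_dates if start_date <= d <= today)
```

Leaving Start Date empty selects every stored Asset, which is why the UI warns that the process can be resource-intensive.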
1.4 Edit Data Sync
The parameters configured when creating a new Data Sync can be changed in this section using the Edit (pencil) button.
When clicking the Edit button or when creating a new Data Product, the Description field must be filled in.
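Because the Description field is required, and the Known issues list below gives character limits for Description (256) and Title (250), a client-side check before submitting could look like the following minimal Python sketch. The function name validate_form is hypothetical, treating a whitespace-only Description as empty is an assumption, and Python's len counts code points, which may differ from how the UI counts characters.

```python
MAX_DESCRIPTION_LENGTH = 256  # from the Known issues list
MAX_TITLE_LENGTH = 250        # from the Known issues list


def validate_form(title, description):
    """Hypothetical check: return a list of errors (empty when the form is valid)."""
    errors = []
    # Assumption: a Description made only of whitespace counts as empty.
    if not description or not description.strip():
        errors.append("Description is required")
    elif len(description) > MAX_DESCRIPTION_LENGTH:
        errors.append(f"Description exceeds {MAX_DESCRIPTION_LENGTH} characters")
    if len(title) > MAX_TITLE_LENGTH:
        errors.append(f"Title exceeds {MAX_TITLE_LENGTH} characters")
    return errors
```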
1.5 Delete Data Sync
Once a Data Sync is deleted, it is no longer available in ADF and is no longer visible in the Sidra UI. However, the deletion is logical in the Sidra Platform, so the configuration could potentially be recovered.
Known issues
- The Description field has a maximum of 256 characters.
- The Title field has a maximum of 250 characters.
- When removing the whole title in the Edit option, unexpected behavior can occur and the original value may be restored.
- When editing a Data Product to add a new image, the image selector does not open.
- When the user name for a Data Product is changed, the image must be loaded again.