Sidra DW Reports¶
Sidra DW provides several Power BI reports to monitor data intake performance. Information related to pipeline and activity executions can be analysed.
The reports make it easier to interpret the information collected and consolidated from different data sources, such as the Core and Log databases.
Power BI Reports¶
Each report available in the Power BI Service, and how to navigate it, is described below:
Data Intake Report¶
This report displays, for the selected time period in the slicer, different graphs and indicator cards regarding Data Intake volumes:
- Total volume of data stored in MB
- Percentage of validation errors
- Number of active Data Storage Units, Providers, Entities and applications
- Total ingested Assets
- Total number of rows on the Assets
The last date when all the values in the report were updated is also displayed.
Pipeline Executions Report¶
This report shows all information related to pipeline executions and their performance during the selected time period:
- Total amount of pipeline executions
- Percentage of failed executions
- Total duration of the pipelines (seconds)
- Total duration of the activities (seconds)
In addition to these indicators, there is a pipeline executions table with the total number of executions, the number of failed executions and the duration in seconds for each pipeline.
There is also a line chart visualization depicting the evolution of the pipeline executions and the total pipeline duration compared to the average duration during the period in seconds.
Selecting a pipeline from the list and clicking the 'See Pipeline Executions Detail' button drills through to that specific pipeline, showing more detailed information about its executions.
Detailed Pipeline Executions Report¶
This Detailed Pipeline Executions view shows the number of executions of the pipeline by date, and the duration of the pipeline executions compared to the duration of the activities by date.
This dashboard also contains a Pipeline Run List with each execution of the pipeline: date and time of execution, duration and any error message in case of failure.
Selecting one of the pipeline runs from the list and clicking the 'See Run Detail' button opens another report with deeper information about that selected execution.
Detailed information about a particular pipeline execution¶
This report shows all the activities and Assets related to a specific execution.
Providers Report¶
This report presents information regarding the Providers. For the selected period of time, the following information is displayed:
- Total stored volume in MB
- Percentage of validation errors
- Number of active Entities
- Total number of Assets
- Total number of rows on the Assets
On the left-hand side the number of Entities by Provider is displayed. In the middle there are three charts representing the number of assets, the stored volume and the total rows compared to the validation errors by date. The table on the right presents the volume and percentage of errors for each Entity.
More specific aspects of a particular Entity can be obtained by selecting one of the entities in the list.
The Entity detail report conveys the information for the selected Entity. Information about the Assets and Attributes related to the Entity can be checked.
Data Storage Units Report¶
This report shows in a map the location of the Data Storage Units.
Stored Volume Analysis Report¶
This last report consists of a decomposition tree, which allows analysing the stored volume of data by different hierarchical categories.
In the image below the stored volume is broken down by region, Data Storage Unit, Provider, Entity and Asset.
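Conceptually, the decomposition tree is a chain of group-by aggregations over that hierarchy. A minimal Python sketch of one drill-down step, using made-up sample data and illustrative field names (not the model's real column names):

```python
from collections import defaultdict

# Illustrative asset records: each row carries part of the hierarchy
# (region -> Data Storage Unit -> Provider) plus a stored volume in MB.
assets = [
    {"region": "West Europe",  "dsu": "DSU-1", "provider": "Sales", "mb": 120.0},
    {"region": "West Europe",  "dsu": "DSU-1", "provider": "HR",    "mb": 30.0},
    {"region": "North Europe", "dsu": "DSU-2", "provider": "Sales", "mb": 50.0},
]

def breakdown(rows, level):
    """One drill-down step: total stored volume grouped by one hierarchy level."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[level]] += row["mb"]
    return dict(totals)

# First level of the tree: total stored volume per region.
print(breakdown(assets, "region"))  # {'West Europe': 150.0, 'North Europe': 50.0}
```

Expanding a node simply repeats the same aggregation on the filtered subset at the next level.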
DW Data Model¶
All Power BI reports, and any new ones that may be needed, are created based on the data model presented below.
All dimensions and fact tables and their relationships are shown.
To see it more clearly, the following pictures show the relationships between each fact table and the different dimensions.
A simple star schema was created between the fact tables and the dimension tables.
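The star-schema idea can be sketched in a few lines of Python: a fact table holds foreign keys plus numeric values, and each key resolves against a small dimension table. All table contents and names below are illustrative, not the actual DW schema:

```python
# Two dimension tables, keyed by surrogate id.
dim_pipeline = {1: {"pipeline": "LoadSales"}, 2: {"pipeline": "LoadHR"}}
dim_status = {0: {"status": "Succeeded"}, 1: {"status": "Failed"}}

# The fact table only stores foreign keys and measures.
fact_pipeline_run = [
    {"pipeline_id": 1, "status_id": 0, "duration_s": 42},
    {"pipeline_id": 1, "status_id": 1, "duration_s": 7},
    {"pipeline_id": 2, "status_id": 0, "duration_s": 15},
]

# Resolve the foreign keys, as a report visual does when slicing the facts.
report_rows = [
    {**dim_pipeline[f["pipeline_id"]], **dim_status[f["status_id"]],
     "duration_s": f["duration_s"]}
    for f in fact_pipeline_run
]
print(report_rows[0])  # {'pipeline': 'LoadSales', 'status': 'Succeeded', 'duration_s': 42}
```

Keeping descriptive attributes only in the dimensions is what lets one slicer (e.g. a pipeline name) filter every fact table at once.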
Pipeline Executions Schema¶
For pipeline executions, the model contains the following relevant tables:
The first one holds fact data about the executions of the pipelines, and the second one is a dimension table with the possible errors in the system.
Both tables are populated by a load process from the Log database, more specifically from the following tables. More information about these source tables, used to generate this Power BI model, is available in their descriptions:
- Util.Log is described here.
- Util.DataFactoryActivityRun is described here.
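That load can be pictured as a plain extract-and-copy per source table. A minimal Python sketch, where in-memory dicts stand in for the databases and everything except the two table names is illustrative:

```python
# Sketch of the load that fills the model's tables from the Log database.
# In-memory dicts stand in for the source and destination databases.
source_log = {
    "Util.Log": [
        {"Id": 1, "Message": "Validation error"},
    ],
    "Util.DataFactoryActivityRun": [
        {"RunId": "a1", "Status": "Succeeded"},
        {"RunId": "b2", "Status": "Failed"},
    ],
}

model_tables = {}

def load_tables(source, table_names):
    """Copy each named source table into the model, as a full reload."""
    for name in table_names:
        model_tables[name] = list(source[name])  # straight copy, no transform

load_tables(source_log, ["Util.Log", "Util.DataFactoryActivityRun"])
print(sorted(model_tables))  # ['Util.DataFactoryActivityRun', 'Util.Log']
```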
Activity Executions Schema¶
Pipeline Assets Schema¶
Dimensions and Measures¶
This section describes all the attributes available in the different dimensions, as well as the measures that can be used to create reports in the Power BI Service.
The table below briefly describes all the dimensions and attributes available in Power BI:

|Dimension|Attribute|Description|
|---|---|---|
|Activity Status|Activity Status|Activity status.|
|Activity Status|Activity Status Value|Amount of activities in each different status.|
|App|Description|Brief description of the application.|
|App|Owner|Owner of the application.|
|App|Resource Group|Resource group of the application.|
|App|Suscription|Subscription of the application.|
|Asset|Asset Date|Date of ingestion of the asset.|
|Asset Status|Asset Status|Asset status.|
|Attribute|Attribute Name|Attribute name.|
|Attribute|IsMetadata|Whether the attribute is metadata or not.|
|Data Factory|Data Factory|Data Factory name.|
|Data Factory|Resource Group|Resource group name.|
|Data Storage Unit|Data Storage Unit|Data storage unit name.|
|Data Storage Unit|Latitude|Data storage unit latitude.|
|Data Storage Unit|Location|Data storage unit location.|
|Data Storage Unit|Longitude|Data storage unit longitude.|
|Data Storage Unit|Region|Data storage unit region.|
|Date|Month Name|Month name.|
|Date|Month Short Name|Month short name.|
|Date|Year Month|Year and month.|
|Date|Year Month Short Name|Year and month short name.|
|Date|Year Quarter|Year and quarter.|
|Entity|Entity|Name of the entity.|
|Entity|Table Name|Table name of the entity.|
|Errors|Failure Type|Type of failure.|
|Errors|Message|Description of the error message.|
|Last Updated Log|Last Data Updated|Last date data was updated in the DW.|
|Pipeline|Last Deployed|Last date the pipeline was deployed.|
|Pipeline|Selected Pipeline|Returns the pipeline currently selected in the report (used for drill-through purposes).|
|Pipeline RunId|RunId|Unique identifier of each pipeline execution.|
|Pipeline RunId|Selected RunId|Returns the unique identifier of the currently selected pipeline execution (used for drill-through purposes).|
|Pipeline Status|IsFailed|Whether the pipeline has failed or not.|
|Pipeline Status|Pipeline Status|Pipeline status.|
|Pipeline Status|Pipeline Status Label|Pipeline status label.|
|Pipeline Status|Pipeline Status Value|Pipeline status value.|
|Provider|Selected Provider|Returns the currently selected provider.|
|Time|Time|Hour, minutes and seconds.|
|Time Intelligence|Time Calc|Time intelligence calculations.|
Measures are organised into groups inside Power BI. For each group, a table with the name and description of each measure is presented.
Activity Execution measures group:

|Measure|Description|
|---|---|
|Activity Duration (Seconds)|Activity duration in seconds.|
|Activity Executions|Number of executions of the activity.|
|Activity Last Execution|Last date the activity was executed.|
|Activity Last Execution Duration|Duration of the activity in its last execution.|
Assets measures group:

|Measure|Description|
|---|---|
|%Validation Errors|Percentage of validation errors with respect to the total rows.|
|Active Apps|Number of applications with assets within the selected period.|
|Active Data Storage Units|Number of active data storage units with assets within the selected period.|
|Active Entities|Number of active entities with assets within the selected period.|
|Active Providers|Number of active providers with assets within the selected period.|
|Last Date|Last date when the asset was ingested.|
|Stored Volume (MB)|Stored volume of the asset in MB.|
|Total Assets|Total number of assets.|
|Total Providers|Total number of providers.|
|Total Rows|Total number of rows of the asset.|
|Validation Errors|Total number of validation errors for the asset.|
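As an illustration of how a few of these Assets measures combine, here is a hedged Python sketch over made-up asset rows (field names are illustrative, not the model's real column names):

```python
# Sample asset rows: row count, validation errors and stored volume in MB.
assets = [
    {"rows": 1000, "validation_errors": 10, "mb": 12.5},
    {"rows": 4000, "validation_errors": 40, "mb": 37.5},
]

total_rows = sum(a["rows"] for a in assets)                    # Total Rows
validation_errors = sum(a["validation_errors"] for a in assets)  # Validation Errors
stored_volume_mb = sum(a["mb"] for a in assets)                # Stored Volume (MB)

# %Validation Errors: errors as a percentage of the total rows.
pct_validation_errors = 100 * validation_errors / total_rows

print(stored_volume_mb, pct_validation_errors)  # 50.0 1.0
```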
Attribute Popularity measures group:

|Measure|Description|
|---|---|
|Num Times|Number of times the attribute has been populated.|
|Total Attributes|Total number of attributes.|
Pipeline Assets measures group:

|Measure|Description|
|---|---|
|Num Assets|Number of assets related to the pipeline.|
Pipeline Execution measures group:

|Measure|Description|
|---|---|
|Average Pipeline Duration|Average pipeline duration.|
|Last PipelineRunId|Unique identifier of the last execution of the pipeline.|
|Pipeline Duration (Seconds)|Pipeline duration in seconds.|
|Pipeline Executions|Number of pipeline executions.|
|Pipeline Last Execution|Last execution date of the pipeline.|
|Pipeline Last Execution Duration|Duration of the last execution of the pipeline.|
|Total Pipelines|Total number of pipelines.|
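A short Python sketch of how the main Pipeline Execution measures relate to each other, over made-up run records (all names and values are illustrative):

```python
# Sample pipeline run records.
runs = [
    {"pipeline": "LoadSales", "duration_s": 40, "failed": False},
    {"pipeline": "LoadSales", "duration_s": 60, "failed": True},
    {"pipeline": "LoadHR",    "duration_s": 20, "failed": False},
]

pipeline_executions = len(runs)                      # Pipeline Executions
total_duration = sum(r["duration_s"] for r in runs)  # Pipeline Duration (Seconds)
avg_duration = total_duration / pipeline_executions  # Average Pipeline Duration
pct_failed = 100 * sum(r["failed"] for r in runs) / pipeline_executions

print(pipeline_executions, total_duration, avg_duration, round(pct_failed, 1))
# 3 120 40.0 33.3
```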
Data Warehouse - DW Database¶
In order to create the data model presented above, information regarding the data ingestion process is collected from different sources and consolidated in a data warehouse. In a first stage, data mainly from the Core and Log databases is extracted and loaded into several tables in a schema called Staging, inside a database called DW. The table below lists the tables created (external, when possible) and their data sources.
- (External) Staging tables and data sources:

|Table Name|Description|Source database|Source table|
|---|---|---|---|
|[Staging].[AssetStatus]|All possible states of an asset.|Core|DataIngestion.AssetStatus|
|[Staging].[AttributePopularity]|Attribute popularity information.|Core|DataIngestion.AttributePopularity|
|[Staging].[DataFactoryActivityRun]|Activity execution information.|Log|Util.DataFactoryActivityRun|
|[Staging].[DataFactoryPipelineRun]|Pipeline execution information.|Log|Util.DataFactoryPipelineRun|
|[Staging].[DSU]|Data Storage Units information.|Core|DataIngestion.Datalake|
|[Staging].[PipelineExecution]|Relation between Assets and Pipelines.|Core|DataIngestion.PipelineExecution|
Based on these staging tables, a transformation and load process is executed by running the following stored procedures, which load all the dimension and fact tables in the DW schema. The 'OrchestratorLoadDW' stored procedure is the one in charge of executing the rest in the correct order:
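As a sketch of that orchestration pattern, the Python below stands in for 'OrchestratorLoadDW' running the individual loads in dependency order, dimensions before the facts that reference them. Every procedure name except 'OrchestratorLoadDW' is a hypothetical placeholder, not a real Sidra procedure:

```python
# Dimension loads must run before the fact loads that reference them.
# All procedure names here except 'OrchestratorLoadDW' are hypothetical.
load_order = [
    "LoadDimDate",          # hypothetical dimension load, runs first
    "LoadDimPipeline",      # hypothetical dimension load
    "LoadFactPipelineRun",  # hypothetical fact load, after its dimensions
]

executed = []

def exec_proc(name):
    """Stand-in for EXEC <procedure> against the DW database."""
    executed.append(name)

def orchestrator_load_dw():
    # 'OrchestratorLoadDW' runs the individual loads in the correct order.
    for proc in load_order:
        exec_proc(proc)

orchestrator_load_dw()
print(executed == load_order)  # True
```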
There is another schema called 'Log'. Tables and stored procedures belonging to this schema provide a way to register and control the load process into the DW database. The schema contains these two stored procedures: