Data Product for Business Intelligence
Business Intelligence applications help you understand trends and derive insights from your data so that you can make tactical and strategic business decisions.
In Sidra Data Platform, the Data Product for Business Intelligence can be defined as a set of methodologies and processes that extract, transform and load data from the Data Lake (DSU) and turn it into knowledge through analysis, thus supporting business decision making.
The Data Product for Business Intelligence (BI) lets you analyze digital data and visualize it in reports, summaries, dashboards, etc.
A Data Product created from this template must be configured with the permissions required to access the DSU data.
Once these permissions are in place, the Data Product Sync job (described in its own documentation page) transparently synchronizes the metadata between Sidra Service and the Data Product database. This job also triggers the Data Product pipeline that synchronizes the actual data to the local Databricks cluster and creates the Staging tables.
The actual data flow orchestration is performed by a Data Product pipeline, via a specific instance of Azure Data Factory installed in the Data Product resource group.
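The behavior described above (synchronize the metadata first, then trigger the data pipeline) can be sketched as follows. This is only an illustrative in-memory model, not Sidra's implementation: `Asset`, `DataProductDb`, `sync_metadata`, `run_sync_job` and the `trigger_pipeline` callback are hypothetical names, and triggering the pipeline only when new Assets appear is an assumption made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Asset:
    """Hypothetical stand-in for an Asset metadata record in Sidra Service."""
    asset_id: int
    name: str


@dataclass
class DataProductDb:
    """Hypothetical in-memory stand-in for the Data Product database."""
    assets: dict = field(default_factory=dict)


def sync_metadata(service_assets, db):
    """Copy Asset metadata that is missing from the Data Product database.

    Returns the newly synchronized Assets in the order they were received.
    """
    new_assets = []
    for asset in service_assets:
        if asset.asset_id not in db.assets:
            db.assets[asset.asset_id] = asset
            new_assets.append(asset)
    return new_assets


def run_sync_job(service_assets, db, trigger_pipeline):
    """Synchronize metadata, then trigger the data pipeline once if anything is new.

    `trigger_pipeline` is a hypothetical callback standing in for starting
    the Data Product pipeline in Azure Data Factory.
    """
    new_assets = sync_metadata(service_assets, db)
    if new_assets:  # assumption: no new metadata means no pipeline run
        trigger_pipeline(new_assets)
    return new_assets
```

Passing the trigger as a callback keeps the sketch independent of any real orchestration client, so the synchronization logic can be exercised without Azure resources.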
High-level installation details
As with any other type of Data Product in Sidra, the process of installing this Data Product consists of the following main steps:
- A dotnet template is installed, which launches a build and release pipeline in Azure DevOps defined for this Data Product.
- As part of the build and release pipeline for this Data Product, the needed infrastructure is installed.
For more information on these topics, see the related documentation.
Create a Data Product from scratch
For more information, check the specific tutorial for creating a Data Product.
The Data Product resource group contains all the services used by the Data Product, separate from the Sidra Service and DSU resource groups. The ARM template for this Data Product includes the following services:
- Client storage account for raw data: used for storing the copy of the data that is extracted from the DSU and to which the Data Product has access.
- Client Data Factory: used for the data orchestration pipelines that bring the data from the DSU, execute the Databricks orchestrator notebook, and copy the elaborated data to the staging tables.
- Client Database: used for keeping a synchronized copy of the Assets metadata between Sidra Service and the Data Product, and for hosting the relational models, transformation queries and stored procedures.
- Client Key Vault: used for storing and accessing secrets in a secure way.
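The three stages attributed to the Client Data Factory above (bring data from the DSU, run the Databricks orchestrator notebook, copy the elaborated data to the staging tables) can be pictured with the minimal sketch below. All names here (`run_data_product_pipeline`, `extract_from_dsu`, `run_orchestrator_notebook`, `staging_tables`) are hypothetical, the stages are plain Python callables rather than Data Factory activities, and replacing the staging table in full on every run is an assumption of the example.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def run_data_product_pipeline(
    extract_from_dsu: Callable[[], List[Row]],
    run_orchestrator_notebook: Callable[[List[Row]], List[Row]],
    staging_tables: Dict[str, List[Row]],
    staging_table_name: str,
) -> int:
    """Run the three stages in order and return the number of rows staged."""
    # 1. Bring the raw data from the DSU (here: any callable returning rows).
    raw_rows = extract_from_dsu()
    # 2. Transform the raw rows (stand-in for the Databricks orchestrator notebook).
    elaborated_rows = run_orchestrator_notebook(raw_rows)
    # 3. Load the elaborated rows into the staging table.
    #    Assumption: each run fully replaces the table contents.
    staging_tables[staging_table_name] = list(elaborated_rows)
    return len(elaborated_rows)
```

For instance, with an extractor returning two rows and a notebook stand-in that filters one of them out, the call stages a single row under the given table name.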
Besides the Azure infrastructure, several Webjobs are also deployed for the BI Data Product. They are responsible for the background tasks of data and metadata synchronization.
You can find more information about these jobs in the corresponding documentation page.