Data Catalog¶
From the Data Catalog view, users can navigate or search through all the Providers, Entities and Attributes which are visible given the permissions in place. Sidra Data Catalog provides data management capabilities from a user-friendly interface.
Sidra Data Catalog is based on the Sidra metadata hierarchy model. Please refer to this page for details about the different objects of the Sidra metadata model.
Sidra Data Catalog includes the following main functionalities:
- Visualize which data Assets (Providers, Entities and Attributes) are available in the platform.
- Visualize key indicators and figures about these data Assets, like total volumes or Attribute popularity. Some of these indicators at an aggregate level are also available in Sidra Web Dashboard.
- Visualize and search among these Assets by using different Attribute filters (name, owner, ...).
- Preview data in a secure and compliant way.
- Document the different Assets by means of enriched metadata, like adding enriched descriptions or assigning tags to the Data Catalog items.
It is important to note that this Data Catalog and all its underlying metadata are accessible through a secure API. This makes it easy to integrate with external data governance and data management tools outside of Sidra. Sidra has been integrated in the past with other tools (Alteryx Connect) by using this mechanism.
Below are the main sections in the Data Catalog web interface.
Main sections in Data Catalog¶
Data Catalog¶
This page acts as the dashboard for the Data Catalog, showing a Providers view. It includes a bar chart of the Providers sorted by the size of the data ingested into the platform, along with their representation as cards. These correspond to all the Providers configured in Sidra for data ingestion.
The card view allows an icon-based navigation experience. Each Provider card includes the key details of a Provider and allows the user to drill through to either the Provider detail page or the list of Entities associated with that Provider.
Clicking on a card opens a detailed Providers page, where the layout of the Providers can be switched between card and list view (icons in the top right corner of the page). Each element in the Providers list view also has an action menu to drill through to the Provider details and the Entities.
Azure AI Search for Global Search¶
Sidra supports the use of Azure AI Search to provide indexing and searching capabilities on documents and other sources of data. With this service, Sidra is taking advantage of quick indexing for large amounts of files through metadata indexing for Attributes, Entities and Providers.
The Data Catalog Dashboard supports advanced search through Azure AI Search. Typing in the "Search" box at the top returns a quick list of matching metadata. In the top right corner, you can use the options for filtering and sorting, as well as filter by metadata Type, Tags and DSU (center box options). Note that, when filtering by Tags, only the Tags the user has access to are shown, due to a security mechanism. The search box requires a minimum of three characters.
The Filter action allows filtering by multiple, combined criteria.
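The search rules described above can be pictured with a small in-memory sketch. It is only an illustration: the `CatalogItem` record, the `search` and `visible_tag_filters` functions and the substring matching are assumptions made for this example and are not part of Sidra or of Azure AI Search, which performs the real indexing and querying.

```python
from dataclasses import dataclass, field

MIN_QUERY_LENGTH = 3  # the search box requires at least three characters


@dataclass
class CatalogItem:
    # Hypothetical stand-in for an indexed metadata record.
    name: str
    item_type: str  # "Provider", "Entity" or "Attribute"
    dsu: str
    tags: set = field(default_factory=set)


def visible_tag_filters(items, user_tags):
    """Return only the Tags that exist in the catalog AND that the user can access."""
    all_tags = set()
    for item in items:
        all_tags |= item.tags
    return sorted(all_tags & set(user_tags))


def search(items, query, item_type=None, tag=None, dsu=None):
    """Return items matching the query text and every filter that is set."""
    needle = query.strip().lower()
    if len(needle) < MIN_QUERY_LENGTH:
        return []  # query too short: nothing is searched
    results = []
    for item in items:
        if needle not in item.name.lower():  # assumed substring match
            continue
        if item_type is not None and item.item_type != item_type:
            continue
        if dsu is not None and item.dsu != dsu:
            continue
        if tag is not None and tag not in item.tags:
            continue
        results.append(item)
    return sorted(results, key=lambda item: item.name)
```

In this sketch the filters are combined with a logical AND, and a two-character query such as "sa" returns nothing.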
Provider detail page¶
The Provider detail page displays the main metadata elements corresponding to a Provider structure inside Sidra metadata, namely:
- Name
- Description
- Owner
- Provider image
- General info
- Total size
- Entities
- Creation date
- Tags
The Provider detail page also displays a list of all the Entities that are associated with the Provider.
The Provider editor allows the user to document the Providers in depth.
Most of the interface real estate is dedicated to the documentation editor, whose General Info field is implemented with an embedded Markdown editor that allows the user to create rich-format documents with links to external documentation, etc.
In addition to the documentation editor with real-time preview and optional full-screen mode, there is also support for editing the Provider image, tags, etc.
Entity detail page¶
The Entity detail page displays the main metadata elements corresponding to an Entity structure inside Sidra metadata, in a similar way to how Provider details are shown in the Provider detail page.
Entity metadata can also be populated from the UI, including support for Entity documentation, tags, and a short description.
In addition to that, the Entity editor shows the Attributes along with the Attribute popularity, which is a measure of how often the specific Attribute is retrieved by the Data Products in relation to the rest of the Entity's Attributes.
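As a rough illustration of a relative popularity measure of this kind, the sketch below computes each Attribute's share of all retrievals within its Entity. The exact formula Sidra uses is not stated here, so the function `attribute_popularity` and the share-of-total definition are assumptions made for this example.

```python
def attribute_popularity(retrieval_counts):
    """Map each Attribute name to its share of the Entity's total retrievals.

    `retrieval_counts` is a hypothetical dict of Attribute name -> number of
    times Data Products retrieved that Attribute.
    """
    total = sum(retrieval_counts.values())
    if total == 0:
        # No retrievals yet: every Attribute has zero popularity.
        return {name: 0.0 for name in retrieval_counts}
    return {name: count / total for name, count in retrieval_counts.items()}
```

Under this assumed definition, the popularity values of an Entity's Attributes add up to 1 whenever at least one retrieval has been recorded.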
Data Quality Validations section¶
A new section has been included to perform data quality validations over the intake data. This section is available only after installing the Data Quality Service.
See more detailed information about how to configure the data quality validation rules in Sidra Web through the documentation.
Attribute detail page¶
As with the Entity detail page, an Attribute detail page has been added, showing the values of the fields configured for each Attribute.
Data Preview¶
The data preview section inside an Attribute detail page displays a sample of the data ingested into the platform. By default, all Entities have data preview enabled.
Since some Entities might have sensitive data that cannot be exposed to all Data Catalog users, specific Attributes can be masked, so that users without the required permissions only see the masked version. Admin users are able to see the full data without masking, but other user roles with no access to masked data see the sensitive Attributes masked according to the data masking defined in the Dynamic Data Masking configuration in the Azure portal.
By default, and similarly to the Attribute pane described above, the system only shows data for the business Attributes, hiding all internal Sidra ones. If the user wants to check a data preview for system columns, the "Show system attributes" option at the top of the page can be used to enable them.
Data Masking configuration¶
Sidra Web Data Catalog provides a data masking functionality to control who can see unmasked data in the Data Preview module inside the Entity detail view page. Data masking ensures that only authorized users (those who have been assigned the Admin or MaskedDataReader role) are able to see the data in clear text in the Data Preview tables.
The data masking is performed by the native Dynamic Data Masking functionality of SQL Server. This feature provides several masking functions suited to the type of source data (e.g. email, credit card, custom). The masking is applied to the tables under the schema DataPreview, which are created in the SQL Sidra Core database during the execution of the data ingestion processes (legacy transfer query / DSU ingestion script).
Sidra Core's metadata system supports defining, at Attribute level, which data masking function to apply. The Attribute column used for this configuration is called DataMask.
For example, when defining a data masking rule with the custom function over an Attribute Name, the DataMask field of that Attribute will need to be updated with this information:
According to the Microsoft SQL Server implementation, this masks all the characters of this String-type field except the first one.
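To make the effect of such a rule concrete, the following sketch emulates in Python what a reader of the Data Preview would see. It is not Sidra code and it does not reproduce the DataMask value itself: `partial_mask` only imitates the general behavior of SQL Server's custom (partial) masking function, which keeps a prefix and a suffix and replaces the rest with a fixed padding string. The `preview_value` helper, the padding "XXXX" and the handling of very short values are assumptions made for illustration.

```python
# Roles that see clear data in the Data Preview, as named in this section.
UNMASKED_ROLES = {"Admin", "MaskedDataReader"}


def partial_mask(value, prefix, padding, suffix):
    """Keep the first `prefix` and last `suffix` characters; replace the rest with `padding`."""
    if prefix + suffix >= len(value):
        # Simplification for short values: expose nothing, return only the padding.
        return padding
    tail = value[len(value) - suffix:] if suffix else ""
    return value[:prefix] + padding + tail


def preview_value(value, user_roles, prefix=1, padding="XXXX", suffix=0):
    """Return the clear value for authorized roles, the masked value otherwise."""
    if UNMASKED_ROLES & set(user_roles):
        return value
    return partial_mask(value, prefix, padding, suffix)
```

With these assumed defaults, `preview_value("Alice", ["Admin"])` returns `"Alice"`, while a user holding neither role gets `"AXXXX"`. Because the padding is a fixed string, the masked value does not reveal the length of the original.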