Skip to content

How to add a new Attribute

An Attribute in Sidra represents one of the fields or columns of an Entity.

So, after the creation of a new Entity, the next step is to create new Attributes for that Entity, one for each of the fields/columns of the Asset that is represented by the Entity.

If the Entity is not created yet, this tutorial How to add a new Entity can be used to create it.

Attributes provide details about the structure of the Asset and they will be used, along with Provider and Entity, to generate pipelines, transfer queries, data lake table creations, etc.

Attributes are stored in the Attribute table in the Sidra Core metadata database.

As it happens with other members of the metadata model, there are two ways to create a new Attribute:

  • Add an Attribute by creating an SQL script to insert the new Attribute in the core metadata database.
  • Add an Attribute by using the Sidra API endpoint to add new Attributes.

Information

Take into account that Sidra metadata tables do not allow for space characters in the names of Attributes or Entities. From version 2022.R2 1.12.2 update the metadata API will sanitize spaces in characters if they are provided in Entity or Attribute names, by replacing with `_` character.

Add an Attribute by using an SQL script

Add an SQL script following the naming conventions to the Scripts\CoreContext folder or to the place configured in the DatabaseBuilder from where it retrieves the scripts. A sample of the script to add a new Attribute can be found below:

-- ATTRIBUTE CONFIGURATION

DECLARE @Id_Entity INT = 5327 -- Id of the Entity to which the Attribute belongs 
DECLARE @Id_Provider INT = 1985
DECLARE @Id_Dsu INT = 1
DECLARE @SecurityPath_Attribute VARCHAR(100) = CONCAT(@Id_Dsu, '/', @Id_Provider, '/', @Id_Entity)
DECLARE @Order_Attribute SMALLINT =
(
    SELECT COALESCE(MAX([Order]) +1, 1)
    FROM [DataIngestion].[Attribute]
    WHERE [IdEntity] = @Id_Entity   
)


-- ATTRIBUTE CREATION

INSERT INTO [DataIngestion].[Attribute] 
    ([IdEntity],[Name],HiveType,MaxLen,IsNullable,NeedTrim,RemoveQuotes,ReplacedText,ReplacementText,
    SpecialFormat,TreatEmptyAsNull,IsPrimaryKey,[Order],IsCalculated,IsPartitionColumn,IsMetadata,SQLType,
    ValidationText,[Description],SourceName,ParentSecurityPath,IsEncrypted,DataMask,ItemId)
VALUES
    (@Id_Entity,'MyNewAttributeViaScript','INT',NULL,1,0,0,NULL,NULL,
    NULL,0,1, @Order_Attribute,0,0,0,'INT',
    NULL,NULL,NULL,@SecurityPath_Attribute,0,NULL,NEWID()),
    (@Id_Entity,'MyNewAttributeViaScript','STRING',NULL,1,0,0,NULL,NULL,
    NULL,0,1, @Order_Attribute+1,0,0,0,'DATETIME2',
    NULL,NULL,NULL,@SecurityPath_Attribute,0,NULL,NEWID()),
    (@Id_Entity,'MyNewAttributeViaScript','BOOLEAN',NULL,1,0,0,NULL,NULL,
    NULL,0,1, @Order_Attribute+2,0,0,0,'BIT',
    NULL,NULL,NULL,@SecurityPath_Attribute,0,NULL,NEWID())

Adding Sidra system Attributes

In the most common scenarios, the script above can be used to add the Sidra system Attributes. It is just required to setup the @Id_Entity to the appropriate value.

Add an Attribute by using the Sidra API

Sidra API requires requests to be authenticated, the section How to use Sidra API explains how to create an authenticated requests. For the rest of the document, it is going to be supposed that Sidra API is deployed in the following URL:

https://core-mycompany-dev-wst-api.azurewebsites.net

Before creating an Attribute, it is required to know the Id of the Entity to which the Attribute belongs.

Information

Take into account that Sidra metadata tables do not allow for space characters in the names of Attributes or Entities. From version 2022.R2 1.12.2 update the metadata API will sanitize spaces in characters if they are provided in Entity or Attribute names, by replacing with `_` character.

Create a new Attribute

By using the Sidra API some of the fields of the Attribute are automatically populated -like Id or ParentSecurityPath-. Once the rest of the information of the Attribute is gathered, it can be created using the following request:

Request

POST https://core-mycompany-dev-wst-api.azurewebsites.net/api/metadata/attributes?api-version=1.0

And adding the following content as part of the body:

Request body

{
    "idEntity": 1200,
    "name": "MyNewAttributeviaAPI",
    "description": "Description",
    "hiveType": "INT",
    "sqlType": "INT",
    "isMetadata": false,
    "isPartitionColumn": false,
    "isCalculated": false,
    "order": 1,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": true,
    "isEncrypted":false,
    "dataMask":null,
    "sourceType":"INT",
    "replacedText":null,
    "replacementText":null,
    "specialFormat":null,
    "maxLen":100
}

The response will return the Id of the Attribute created, which can be used, for example, for adding a new Attribute format.

A different request must be done for each of the Attributes that want to be created, since, currently, Sidra API does not support creating Attributes in bulk.

Response

{
  "id": 1500,
  "attributeFormats": null,
  "tags": null,
  "sqlType": "INT",
  "isMetadata": false,
  "isPartitionColumn": false,
  "isCalculated": false,
  "order": 1,
  "isPrimaryKey": false,
  "treatEmptyAsNull": false,
  "specialFormat": null,
  "replacementText": null,
  "replacedText": null,
  "removeQuotes": false,
  "needTrim": false,
  "isNullable": true,
  "maxLen": 100,
  "hiveType": "INT",
  "name": "MyNewAttributeviaAPI",
  "idEntity": 1200,
  "validationText": null,
  "description": "Description",
  "sourceName": "MyNewAttributeviaAPI",
  "isEncrypted": false,
  "itemId": "191b09bb-9127-40ed-8549-19be9ccc848e",
  "idProvider": 10,
  "idDataStorageUnit": 1,
  "dataMask": null,
  "parentSecurityPath": "1/10/1200",
  "securityPath": "1/10/1200",
  "sourceType": "INT"
}

Adding Sidra system Attributes

In order to add the system Attributes, the same request can be used but using the following content in the body:

Note

The fields idEntity and order must be properly configured.

Request body

{
    "idEntity": 1200,
    "name": "MyNewAttributeviaAPI2",
    "description": "Description", 
    "hiveType": "INT",
    "sqlType": "INT",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": false,
    "order": 2,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "isEncrypted":false,
    "dataMask":null,
    "sourceType":"INT",
    "replacedText":null,
    "replacementText":null,
    "specialFormat":"BLOCK__OFFSET__INSIDE__FILE",
    "maxLen":100
}
{
    "idEntity": 1200,
    "name": "MyNewAttributeviaAPI3",
    "description": "Description", 
    "hiveType": "STRING",
    "sqlType": "DATETIME2",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": true,
    "order": 3,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "isEncrypted":false,
    "dataMask":null,
    "sourceType":"INT",
    "replacedText":null,
    "replacementText":null,
    "specialFormat":"FROM_UNIXTIME(UNIX_TIMESTAMP())",
    "maxLen":100
}
{
    "idEntity": 1200,
    "name": "MyNewAttributeviaAPI4",
    "description": "Description", 
    "hiveType": "BOOLEAN",
    "sqlType": "BIT",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": true,
    "order": 4,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "isEncrypted":false,
    "dataMask":null,
    "sourceType":"INT",
    "replacedText":null,
    "replacementText":null,
    "specialFormat":"FALSE",
    "maxLen":100
}

General Sidra Attributes information

This is the information about an Attribute in Sidra, which must be included when added to the metadata database:

Column Description
Id [Required] Attribute identifier
IdEntity [Required] The identifier of the related Entity
Name [Required] Name of the column. Space characters are not allowed in Attribute Name
HiveType [Required] The type of the column. It will be used to generate the Hive table. Example values: "INT", "STRING", "BOOLEAN", "DATE".
MaxLen [Optional] Maximum size of the column. This applies to columns with STRING type.
IsNullable [Required] Tells if the column can have null values or not.
NeedTrim [Required] Tells if the colum needs to be trimmed.
RemoveQuotes [Required] Tells if it is necessary to remove quotes from the column.
ReplacedText [Optional] Indicates a text in the column to be replaced.
ReplacementText [Optional] The text to replace the ReplacedText with.
SpecialFormat [Optional] If the column needs to pass a validation that cannot be enforced using the rest of the columns, it is specified here. For example, to validate a date's format. Example values: "BLOCK__OFFSET__INSIDE__FILE", "FROM_UNIXTIME(UNIX_TIMESTAMP())", "FALSE", "'${hiveconf:Date}'", "IdSourceItem".
TreatEmptyAsNull [Required] Tells if empty values should be considered nulls.
IsPrimaryKey [Required] Used when the field IsMerged in Entity is true to determine what data is new.
Order [Required] The order of the column inside the file.
IsCalculated [Required] Tells if the column is calculated automatically, and not a column originally present in the file.
IsPartitionColumn [Required] Tells if the column is used in the table partition or not.
IsMetadata [Required] Tells if the column is metadata for the system, and not data originally present in the file itself.
SQLType [Optional] Indicates the type of the column to create an SQL table. Example values: "INT", "DATETIME2", "BIT", "DATE".
ValidationText [Optional] Defines specific validations to be performed for this data instead of using the validations generated by the GenerateTransferQuery custom activity.
Description [Optional] Description of this Attribute.
SourceName [Optional] Name of the column in the data source. It is used in case it is different to the name used in Sidra.
ParentSecurityPath [Optional] The security path of the parent following the metadata model hierarchy.
IsEncrypted [Optional] Tells if the column is encrypted.
SecurityPath [Computed] The security path of the Attribute.
IdDataStorageUnit [Required] Id of the DSU.
IdProvider [Required] Id of the Provider.
DataMask [Optional] This field indicates that Data masking rules will be applied in the preview of this field in the Data Catalog.
ItemId [Required] The GUID of the Attribute.

Considerations:

  • SQLType: Many Client Applications end up storing data extracted from the Data Lake in an Azure SQL Database. In order to auto-generate the tables to hold that data, this column contains specific SQL Server types such as decimal with scale and precision, or datetime2.
  • IsPartitionColumn: As it has been described in previous sections, data is stored in Data Lake in a optimized way. One of these optimization steps is by partitioning, so extraction can be more accurate and faster. Setting this column to true will make the table created in the Data Lake to be partitioned by this column. It is important to point that, if there are 2 columns acting as partition column, the order will impact on how the table is partitioned. That is, in the example below partitions will be: FileDate then IdSourceItem. It is important to make sure that columns are in the correct order.
  • IsPrimaryKey: Only used for "merged" entities. For plain text files, it is quite often set to false for all the Attributes.
  • When a value for ValidationText is provided, any other validation is overriden and the only one applied is the ValidationText.
  • IsEncrypted can be used to encrypt the values for this Attribute in the Data Lake.

Sidra system Attributes

There are five Attributes that are added to the original set of specific Attributes for the Entity. They are used by Sidra for multiple purposes and it is mandatory to have them. Those Attributes are: SourceByteOffset, LoadDate , HasErrors, FileDate and IdSourceItem.

By default, Sidra partitions the Data Lake tables by FileDate and IdSourceItem, but it is up to the team configuring the system to choose which partitions to use. Sidra team strongly encourages to use these two Attributes or, at least, IdSourceItem which is widely used by the platform and has a good effect on its performance.


Sidra Ideas Portal


Last update: 2022-06-22
Back to top