How to add a new Attribute

An Attribute in Sidra represents one of the fields or columns of an Entity.

So, after the creation of a new Entity, the next step is to create new Attributes for that Entity, one for each of the fields/columns of the Asset that is represented by the Entity.

If the Entity is not created yet, this tutorial How to add a new entity can be used to create it.

Attributes provide details about the structure of the Asset and they will be used, along with Provider and Entity, to generate pipelines, transfer queries, data lake table creations, etc.

Attributes are stored in the Attribute table in the Sidra Core metadata database.

As it happens with other members of the metadata model, there are two ways to create a new Attribute:

  • A. Add an Attribute by creating an SQL script to insert the new Attribute in the core metadata database.
  • B. Add an Attribute by using the Sidra API endpoint to add new Attributes.

General Sidra Attributes information

This is the information about an Attribute in Sidra, which must be included when added to the metadata database:

Column Description
Id [Required] Attribute identifier
IdEntity [Required] The identifier of the related Entity
Name [Required] Name of the column.
HiveType [Required] The type of the column. It will be used to generate the Hive table.
MaxLen [Optional] Maximum size of the column. This applies to columns with STRING type.
IsNullable [Required] Tells if the column can have null values or not.
NeedTrim [Required] Tells if the colum needs to be trimmed.
RemoveQuotes [Required] Tells if it is necessary to remove quotes from the column.
ReplacedText [Optional] Indicates a text in the column to be replaced.
ReplacementText [Optional] The text to replace the ReplacedText with.
SpecialFormat [Optional] If the column needs to pass a validation that cannot be enforced using the rest of the columns, it is specified here. For example, to validate a date's format.
TreatEmptyAsNull [Required] Tells if empty values should be considered nulls.
IsPrimaryKey [Required] Used when the field IsMerged in Entity is true to determine what data is new.
Order [Required] The order of the column inside the file.
IsCalculated [Required] Tells if the column is calculated automatically, and not a column originally present in the file.
IsPartitionColumn [Required] Tells if the column is used in the table partition or not.
IsMetadata [Required] Tells if the column is metadata for the system, and not data originally present in the file itself.
SQLType [Optional] Indicates the type of the column to create an SQL table.
ValidationText [Optional] Defines specific validations to be performed for this data instead of using the validations generated by the GenerateTransferQuery custom activity.
Description [Optional] Description of this Attribute.
SourceName [Optional] Name of the column in the data source. It is used in case it is different to the name used in Sidra.
ParentSecurityPath [Optional] The security path of the parent following the metadata model hierarchy.
IsEncrypted [Required] Tells if the column is encrypted.
SecurityPath [Computed] The security path of the Attribute.
IdDataStorageUnit [Required] Id of the DSU.
IdProvider [Required] Id of the Provider.
DataMask [Optional] This field indicates that Data masking rules will be applied in the preview of this field in the Data Catalog.
ItemId [Required] The GUID of the Attribute.

Considerations:

  • SQLType: Many Client Applications end up storing data extracted from the Data Lake in an Azure SQL Database. In order to auto-generate the tables to hold that data, this column contains specific SQL Server types such as decimal with scale and precision, or datetime2.
  • IsPartitionColumn: As it has been described in previous sections, data is stored in Data Lake in a optimized way. One of these optimization steps is by partitioning, so extraction can be more accurate and faster. Setting this column to true will make the table created in the Data Lake to be partitioned by this column. It is important to point that, if there are 2 columns acting as partition column, the order will impact on how the table is partitioned. That is, in the example below partitions will be: FileDate then IdSourceItem. It is important to make sure that columns are in the correct order.
  • IsPrimaryKey: Only used for "merged" entities. For plain text files, it is quite often set to false for all the Attributes.
  • When a value for ValidationText is provided, any other validation is overriden and the only one applied is the ValidationText.
  • IsEncrypted can be used to encrypt the values for this Attribute in the Data Lake.

Sidra system Attributes

There are five Attributes that are added to the original set of specific Attributes for the Entity. They are used by Sidra for multiple purposes and it is mandatory to have them. Those Attributes are: SourceByteOffset, LoadDate , HasErrors, FileDate and IdSourceItem.

By default, Sidra partitions the Data Lake tables by FileDate and IdSourceItem, but it is up to the team configuring the system to choose which partitions to use. Sidra team strongly encourages to use these two Attributes or, at least, IdSourceItem which is widely used by the platform and has a good effect on its performance.

A. Add an Attribute by using an SQL script

Add an SQL script following the naming conventions to the Scripts\CoreContext folder or to the place configured in the DatabaseBuilder from where it retrieves the scripts. A sample of the script to add a new Attribute can be found below:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
-- DECLARE
DECLARE @Id_Entity INT = 100 -- Id of the entity to which the Attribute belongs 

DECLARE @SecurityPath_Entity VARCHAR(100) =
(
    SELECT TOP(1) [SecurityPath]  
    FROM [DataIngestion].[Entity] 
    WHERE [Id] = @Id_Entity
)
DECLARE @Id_Attribute INT = 
(
    SELECT COALESCE(MAX([Id]) +1, 1)
    FROM [DataIngestion].[Attribute]    
)
DECLARE @Order_Attribute SMALLINT =
(
    SELECT COALESCE(MAX([Order]) +1, 1)
    FROM [DataIngestion].[Attribute]
    WHERE [IdEntity] = @Id_Entity   
)

-- ROLLBACK
DELETE FROM [DataIngestion].[Attribute]
WHERE [Id] BETWEEN @Id_Attribute AND @Id_Attribute+1 -- Update the number of attributes to add

-- SCRIPT
SET IDENTITY_INSERT [DataIngestion].[Attribute] ON
INSERT [DataIngestion].[Attribute] 
     ([Id],             [IdEntity], [Name],     [HiveType], [MaxLen], [IsNullable], [NeedTrim], [RemoveQuotes], [ReplacedText], [ReplacementText], [SpecialFormat], [TreatEmptyAsNull], [IsPrimaryKey], [IsCalculated], [Order],             [IsPartitionColumn], [IsMetadata], [ValidationText], [SQLType],   [ParentSecurityPath], [IsEncrypted]) 
VALUES 
     (@Id_Attribute,    @Id_Entity, N'ColumnA', N'INT',     NULL,      1,            0,          0,              NULL,           NULL,              NULL,           0,                  0,              0,              @Order_Attribute,    0,                   0,            NULL,             N'INT',      @SecurityPath_Entity, 0)
    ,(@Id_Attribute+1,  @Id_Entity, N'ColumnB', N'STRING',  1000,      1,            0,          0,              NULL,           NULL,              NULL,           0,                  0,              0,              @Order_Attribute+1,  0,                   0,            NULL,             N'NVARCHAR', @SecurityPath_Entity, 0) 
SET IDENTITY_INSERT [DataIngestion].[Attribute] OFF

Adding Sidra system Attributes

In the most common scenarios, the script below can be used to add the Sidra system Attributes. It is just required to setup the @Id_Entity to the appropriate value.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
-- DECLARE
DECLARE @Id_Entity INT = 100 -- Id of the entity to which the Attribute belongs 

DECLARE @SecurityPath_Entity VARCHAR(100) =
(
    SELECT TOP(1) [SecurityPath]  
    FROM [DataIngestion].[Entity] 
    WHERE [Id] = @Id_Entity
)
DECLARE @Id_Attribute INT = 
(
    SELECT COALESCE(MAX([Id]) +1, 1)
    FROM [DataIngestion].[Attribute]    
)
DECLARE @Order_Attribute SMALLINT =
(
    SELECT COALESCE(MAX([Order]) +1, 1)
    FROM [DataIngestion].[Attribute]
    WHERE [IdEntity] = @Id_Entity   
)

-- ROLLBACK
DELETE FROM [DataIngestion].[Attribute]
WHERE [Id] BETWEEN @Id_Attribute AND @Id_Attribute+4

-- SCRIPT
SET IDENTITY_INSERT [DataIngestion].[Attribute] ON
INSERT [DataIngestion].[Attribute] 
     ([Id],             [IdEntity], [Name],              [HiveType], [MaxLen], [IsNullable], [NeedTrim], [RemoveQuotes], [ReplacedText], [ReplacementText], [SpecialFormat],                    [TreatEmptyAsNull], [IsPrimaryKey], [IsCalculated], [Order],            [IsPartitionColumn], [IsMetadata], [ValidationText], [SQLType],        [ParentSecurityPath], [IsEncrypted]) 
VALUES 
     (@Id_Attribute,    @Id_Entity, N'SourceByteOffset', N'INT',     NULL,     0,            0,          0,              NULL,           NULL,              N'BLOCK__OFFSET__INSIDE__FILE',     0,                  0,              1,              @Order_Attribute,   0,                    1,            NULL,             N'INT',          @SecurityPath_Entity, 0)
    ,(@Id_Attribute+1,  @Id_Entity, N'LoadDate',         N'STRING',  NULL,     0,            0,          0,              NULL,           NULL,              N'FROM_UNIXTIME(UNIX_TIMESTAMP())', 0,                  0,              1,              @Order_Attribute+1, 0,                    1,            NULL,             N'DATETIME2',    @SecurityPath_Entity, 0)
    ,(@Id_Attribute+2,  @Id_Entity, N'HasErrors',        N'BOOLEAN', NULL,     0,            0,          0,              NULL,           NULL,              N'FALSE',                           0,                  0,              1,              @Order_Attribute+2, 0,                    1,            NULL,             N'BIT',          @SecurityPath_Entity, 0)
    ,(@Id_Attribute+3,  @Id_Entity, N'FileDate',         N'DATE',    NULL,     0,            0,          0,              NULL,           NULL,              N'''${hiveconf:Date}''',            0,                  0,              1,              @Order_Attribute+3, 1,                    1,            NULL,             N'DATE',         @SecurityPath_Entity, 0)
    ,(@Id_Attribute+4,  @Id_Entity, N'IdSourceItem',     N'INT',     NULL,     0,            0,          0,              NULL,           NULL,              N'IdSourceItem',                    0,                  0,              1,              @Order_Attribute+4, 1,                    1,            NULL,             N'INT',          @SecurityPath_Entity, 0)
SET IDENTITY_INSERT [DataIngestion].[Attribute] OFF

Add an Attribute by using the Sidra API

Sidra API requires requests to be authenticated, the section How to use Sidra API explains how to create an authenticated requests. For the rest of the document, it is going to be supposed that Sidra API is deployed in the following URL:

1
https://core-mycompany-dev-wst-api.azurewebsites.net

Before creating an Attribute, it is required to know the Id of the Entity to which the Attribute belongs.

Create a new attribute

By using the Sidra API some of the fields of the Attribute are automatically populated -like Id or ParentSecurityPath-. Once the rest of the information of the Attribute is gathered, it can be created using the following request:

Request

1
POST https://core-mycompany-dev-wst-api.azurewebsites.net/api/metadata/attributes?api-version=1.0

And adding the following content as part of the body:

Request body

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
{
    "sqlType": "INT",
    "isMetadata": false,
    "isPartitionColumn": false,
    "isCalculated": false,
    "order": 1,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": true,
    "hiveType": "INT",
    "name": "ColumnA",
    "idEntity": 100
}

The response will return the Id of the Attribute created, which can be used, for example, for adding a new attribute format.

A different request must be done for each of the Attributes that want to be created, since, currently, Sidra API does not support creating Attributes in bulk.

Response

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
{
  "id": 1000,
  "sqlType": "string",
  "isMetadata": true,
  "isPartitionColumn": true,
  "isCalculated": true,
  "order": 0,
  "isPrimaryKey": true,
  "treatEmptyAsNull": true,
  "specialFormat": "string",
  "replacementText": "string",
  "replacedText": "string",
  "removeQuotes": true,
  "needTrim": true,
  "isNullable": true,
  "maxLen": 0,
  "hiveType": "string",
  "name": "string",
  "idEntity": 0,
  "validationText": "string",
  "description": "string",
  "sourceName": "string"
}

Adding Sidra system Attributes

In order to add the system Attributes, the same request can be used but using the following content in the body:

NOTE

The fields idEntity and order must be properly configured

Request body

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
    "sqlType": "INT",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": true,
    "order": 2,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "hiveType": "INT",
    "name": "SourceByteOffset",
    "specialFormat": "BLOCK__OFFSET__INSIDE__FILE",
    "idEntity": 100
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
    "sqlType": "DATETIME2",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": true,
    "order": 3,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "hiveType": "STRING",
    "name": "LoadDate",
    "specialFormat": "FROM_UNIXTIME(UNIX_TIMESTAMP())",
    "idEntity": 100
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
    "sqlType": "BIT",
    "isMetadata": true,
    "isPartitionColumn": false,
    "isCalculated": true,
    "order": 4,
    "isPrimaryKey": false,
    "treatEmptyAsNull": false,
    "removeQuotes": false,
    "needTrim": false,
    "isNullable": false,
    "hiveType": "BOOLEAN",
    "name": "HasErrors",
    "specialFormat": "FALSE",
    "idEntity": 100
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
  "sqlType": "DATE",
  "isMetadata": true,
  "isPartitionColumn": true,
  "isCalculated": true,
  "order": 5,
  "isPrimaryKey": false,
  "treatEmptyAsNull": false,
  "removeQuotes": false,
  "needTrim": false,
  "isNullable": false,
  "hiveType": "DATE",
  "name": "FileDate",
  "specialFormat": "'${hiveconf:Date}'",
  "idEntity": 100
}
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
{
  "sqlType": "INT",
  "isMetadata": true,
  "isPartitionColumn": true,
  "isCalculated": true,
  "order": 6,
  "isPrimaryKey": false,
  "treatEmptyAsNull": false,
  "removeQuotes": false,
  "needTrim": false,
  "isNullable": false,
  "hiveType": "INT",
  "name": "IdSourceItem",
  "specialFormat": "IdSourceItem",
  "idEntity": 100
}