Inference

Metadata inference is the process by which the metadata of an Asset is inferred from information about that Asset. The type of information provided determines which endpoint is used. The information can be one of the following:

1. The result of a SQL query

The API endpoint for this case is /api/inference/dbinference. It returns the list of inferred Entities and Attributes. The endpoint accepts the following parameters:

Parameter | Required | Description
query | Optional | SQL query used as an example to obtain the metadata. If not provided, the entire database is inferred.
connectionString | Mandatory | Connection string to access the database where the SQL query will be executed.
idProvider | Mandatory | Identifier of the Provider in the metadata database to which all the Entities will be associated.
store | Optional | Flag that indicates if the results of the inference (the Entities and Attributes) must be stored in the metadata database or only returned. The default value is False.
fileNameStyle | Optional | Reference to the format of the names of the files that the Assets will have. See the Naming convention section below. The default value is 1.
querySource | Optional | Reference to the supported types of SQL databases. See the Database sources section below. The default value is 0.
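
As a rough sketch of how a client might assemble these parameters, the Python helper below builds a dictionary that applies the documented defaults. The function name is hypothetical, idProvider is assumed to be an integer, and the way the parameters are transmitted (HTTP method, query string or JSON body, authentication) is not covered here, so the sketch stops short of sending a request.

```python
from typing import Any, Dict, Optional


def build_dbinference_params(
    connection_string: str,
    id_provider: int,  # assumption: the Provider identifier is an integer
    query: Optional[str] = None,
    store: bool = False,
    file_name_style: int = 1,
    query_source: int = 0,
) -> Dict[str, Any]:
    """Assemble the parameters for /api/inference/dbinference (hypothetical helper)."""
    if not connection_string:
        raise ValueError("connectionString is mandatory")
    params: Dict[str, Any] = {
        "connectionString": connection_string,
        "idProvider": id_provider,
        "store": store,
        "fileNameStyle": file_name_style,
        "querySource": query_source,
    }
    # query is optional: when it is omitted, the entire database is inferred.
    if query is not None:
        params["query"] = query
    return params


example = build_dbinference_params(
    connection_string="<your connection string>",  # placeholder value
    id_provider=1,  # placeholder Provider identifier
    query="SELECT * FROM myschema.mytable",
)
print(example)
```

Leaving query out of the call produces a parameter set that asks for the whole database to be inferred.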

2. A list of data types

The API endpoint for this case is /api/inference/dbextractinference. It returns the list of inferred Entities and Attributes. The endpoint accepts the following parameters:

Parameter | Required | Description
dbTypes | Mandatory | Array with the metadata information extracted from the data source database. See the Database Types section below.
idProvider | Mandatory | Identifier of the Provider in the metadata database to which all the Entities will be associated.
store | Optional | Flag that indicates if the results of the inference (the Entities and Attributes) must be stored in the metadata database or only returned. The default value is False.
fileNameStyle | Optional | Reference to the format of the names of the files that the Assets will have. See the Naming convention section below. The default value is 1.
querySource | Optional | Reference to the supported types of SQL databases. See the Database sources section below. The default value is 0.

Naming convention

Each Entity has a field named RegEx that contains a regular expression identifying which Assets will be associated with the Entity. When an Entity is inferred from a table, the RegEx field is populated with the following regular expression, composed of three parts:

^{prefix}{tableName}{suffix}.parquet
  1. prefix will be the name of the schema of the table in lower case followed by an underscore. If the source is Transact-SQL and the schema is the default dbo, the prefix will be omitted.
  2. tableName will be the name of the table in lower case.
  3. suffix will depend on the fileNameStyle selected. The options can be seen in the following table:
Id | FileNameSuffixStyle | Regex value sample | Asset filename sample
0 | NoSuffix | ^myschema_mytable.parquet | myschema_mytable.parquet
1 | Date | ^myschema_mytable_((?<year>\d{4})(?<month>\d{2})(?<day>\d{2})).parquet | myschema_mytable_20190129.parquet
2 | DateTime | ^myschema_mytable_((?<year>\d{4})(?<month>\d{2})(?<day>\d{2}))-((?<hour>\d{2})(?<minute>\d{2})(?<second>\d{2})).parquet | myschema_mytable_20190129-105555.parquet
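
The composition rule above can be illustrated with a short Python sketch. It is not the service's implementation: the names build_entity_regex, to_python_syntax and is_tsql are hypothetical, and the DateTime suffix is assumed to use a single underscore so that it matches the sample filename. Because the samples use the `(?<name>...)` named-group syntax, which Python's re module spells `(?P<name>...)`, the sketch converts the pattern before testing it against a filename.

```python
import re

# Suffix fragments per fileNameStyle, following the samples in this section.
DATE = r"((?<year>\d{4})(?<month>\d{2})(?<day>\d{2}))"
TIME = r"((?<hour>\d{2})(?<minute>\d{2})(?<second>\d{2}))"
SUFFIXES = {
    0: "",                       # NoSuffix
    1: "_" + DATE,               # Date
    2: "_" + DATE + "-" + TIME,  # DateTime (single underscore assumed)
}


def build_entity_regex(schema: str, table: str,
                       file_name_style: int = 1, is_tsql: bool = True) -> str:
    """Compose ^{prefix}{tableName}{suffix}.parquet (hypothetical helper)."""
    # The prefix is omitted for the default dbo schema on Transact-SQL sources.
    omit_prefix = is_tsql and schema.lower() == "dbo"
    prefix = "" if omit_prefix else schema.lower() + "_"
    return "^" + prefix + table.lower() + SUFFIXES[file_name_style] + ".parquet"


def to_python_syntax(pattern: str) -> str:
    """Rewrite (?<name>...) groups as (?P<name>...) so Python's re accepts them."""
    return pattern.replace("(?<", "(?P<")


pattern = build_entity_regex("MySchema", "MyTable", file_name_style=1)
match = re.match(to_python_syntax(pattern), "myschema_mytable_20190129.parquet")
print(pattern)
print(match.groupdict() if match else None)
```

The named groups make the date parts of a matching Asset filename available individually, for example through match.groupdict() in this sketch.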

Database sources

The querySource parameter identifies the database source of the SQL query:

Id | QuerySources
0 | TSQL
1 | DB2

Database Types

The dbTypes parameter is an array in which each item describes one column of an Entity, in a format that can be easily populated with information extracted from the data source database. Each item has the following JSON structure:

{
    "TABLE_SCHEMA": "string",
    "TABLE_NAME": "string",
    "COLUMN_NAME": "string",
    "DATA_TYPE": "string",
    "NUMERIC_PRECISION": 0,
    "NUMERIC_SCALE": 0,
    "CHARACTER_MAXIMUM_LENGTH": 0,
    "DATETIME_PRECISION": 0,
    "ORDINAL_POSITION": 0,
    "IS_NULLABLE": true,
    "IS_PK": true
}
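
The field names resemble the columns of the INFORMATION_SCHEMA.COLUMNS view, so one plausible way to populate dbTypes is to map rows read from that view. The sketch below is hedged accordingly: the helper name is hypothetical, rows are assumed to be already fetched as dictionaries, IS_NULLABLE is assumed to arrive as the 'YES'/'NO' string that the view uses, primary-key membership is assumed to be supplied separately by the caller, and NULL numeric values are mapped to 0 to mirror the sample structure.

```python
import json
from typing import Any, Dict, Set, Tuple

NUMERIC_FIELDS = (
    "NUMERIC_PRECISION",
    "NUMERIC_SCALE",
    "CHARACTER_MAXIMUM_LENGTH",
    "DATETIME_PRECISION",
    "ORDINAL_POSITION",
)


def to_db_type(row: Dict[str, Any],
               pk_columns: Set[Tuple[str, str, str]]) -> Dict[str, Any]:
    """Map one column-metadata row to a dbTypes item (hypothetical helper)."""
    item: Dict[str, Any] = {
        "TABLE_SCHEMA": row["TABLE_SCHEMA"],
        "TABLE_NAME": row["TABLE_NAME"],
        "COLUMN_NAME": row["COLUMN_NAME"],
        "DATA_TYPE": row["DATA_TYPE"],
    }
    # Assumption: NULL (None) numeric metadata is sent as 0, as in the sample.
    for field in NUMERIC_FIELDS:
        item[field] = int(row.get(field) or 0)
    # Convert the 'YES'/'NO' string into the boolean shown in the structure.
    item["IS_NULLABLE"] = str(row["IS_NULLABLE"]).upper() == "YES"
    key = (row["TABLE_SCHEMA"], row["TABLE_NAME"], row["COLUMN_NAME"])
    item["IS_PK"] = key in pk_columns
    return item


rows = [
    {"TABLE_SCHEMA": "myschema", "TABLE_NAME": "mytable", "COLUMN_NAME": "Id",
     "DATA_TYPE": "int", "NUMERIC_PRECISION": 10, "NUMERIC_SCALE": 0,
     "CHARACTER_MAXIMUM_LENGTH": None, "DATETIME_PRECISION": None,
     "ORDINAL_POSITION": 1, "IS_NULLABLE": "NO"},
    {"TABLE_SCHEMA": "myschema", "TABLE_NAME": "mytable", "COLUMN_NAME": "Name",
     "DATA_TYPE": "nvarchar", "NUMERIC_PRECISION": None, "NUMERIC_SCALE": None,
     "CHARACTER_MAXIMUM_LENGTH": 50, "DATETIME_PRECISION": None,
     "ORDINAL_POSITION": 2, "IS_NULLABLE": "YES"},
]
primary_keys = {("myschema", "mytable", "Id")}

db_types = [to_db_type(row, primary_keys) for row in rows]
print(json.dumps(db_types, indent=4))
```

The resulting list is what would be passed as the dbTypes parameter of /api/inference/dbextractinference, alongside idProvider and the optional parameters.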