Asset

An Asset in Sidra Data Platform represents a single instance of a data element ingested into the platform. The term abstracts over many different data formats.

While Entities are the structures in the Sidra metadata that describe the tables to be ingested, Assets are the specific instances of data ingested into the platform (data drops). The key components in Sidra Data Platform have been designed to identify, support, manipulate, move and query Assets in the platform.

Examples of Assets in Sidra Data Platform are:

  • An intermediate database extract into a Parquet format file.
  • A PDF file which is part of a larger collection of documents.

Asset flow in Sidra

All data to be ingested into the Data Lake must first be extracted from the data source and stored as files in the platform. Those files are the "raw storage" of the information ingested into the Data Lake and must be in one of the supported file types, currently CSV and Parquet. Restricting raw storage to specific file types makes it possible to automate the ingestion process from the raw storage into the optimized storage of the Data Lake. This means that if the information is stored in any other type of file in the data source, such as XML or JSON, it must be converted to one of the supported file types before ingestion.
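The conversion step mentioned above could look like the following minimal sketch. It assumes the source data is a JSON array of flat objects and uses only the Python standard library. The function name `json_records_to_csv` is hypothetical and is not part of Sidra. How Sidra itself expects such conversions to be performed is not described here.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row.

    Hypothetical helper for illustration only; not a Sidra API.
    """
    records = json.loads(json_text)
    if not isinstance(records, list) or not records:
        raise ValueError("expected a non-empty JSON array of objects")

    # Collect column names in first-seen order so the header is stable.
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        # Keys missing from a record are written as empty fields.
        writer.writerow(record)
    return buffer.getvalue()
```

Nested JSON structures would need to be flattened before a conversion like this, since CSV has no notion of nesting.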

To transfer the data from the raw storage to the Data Lake, the platform uses Spark SQL queries. The Asset metadata is used, among other purposes, to generate those queries automatically.
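To illustrate the idea of generating a load query from metadata, here is a minimal sketch that builds a Spark SQL `INSERT INTO ... SELECT` statement from a small metadata description. The `Attribute` and `EntityMetadata` classes, the `build_load_query` function, and the staging view name are all assumptions made for this example. They do not reflect Sidra's actual metadata schema or the queries the platform really generates.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Attribute:
    """Hypothetical column description: a name and a Spark SQL type."""
    name: str
    sql_type: str


@dataclass
class EntityMetadata:
    """Hypothetical table description used to drive query generation."""
    database: str
    table: str
    attributes: List[Attribute]


def build_load_query(entity: EntityMetadata, staging_view: str) -> str:
    """Build a Spark SQL statement that copies rows from a staging view
    (the raw file exposed as a view) into the target table, casting each
    column to the type declared in the metadata."""
    select_list = ",\n".join(
        f"    CAST(`{a.name}` AS {a.sql_type}) AS `{a.name}`"
        for a in entity.attributes
    )
    return (
        f"INSERT INTO `{entity.database}`.`{entity.table}`\n"
        f"SELECT\n{select_list}\n"
        f"FROM `{staging_view}`"
    )


if __name__ == "__main__":
    orders = EntityMetadata(
        database="sales",
        table="orders",
        attributes=[Attribute("id", "INT"), Attribute("amount", "DECIMAL(10,2)")],
    )
    print(build_load_query(orders, "orders_raw"))
```

The point of the sketch is that the column list and types come entirely from metadata, so adding a new table requires only a new metadata entry and no hand-written query.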