Data Intake types
There are several types of Data Intake, including Data Intake Processes (DIPs) via connector plugins, as described below.
1. Data Intake via landing zone
In certain data flows, such as the generic batch file intake process, the landing zone is the starting point for file ingestion.
Key features of a Data Intake from the landing zone are:
- Sidra provides out-of-the-box pipelines, already deployed, for ingesting files from the landing zone. No explicit deployment or manual execution of these pipelines is needed.
- This type of Data Intake is usually chosen for data sources in semi-structured formats, among others. It covers the following scenarios:
  - The data is deposited by an external data extraction process, e.g., as .parquet or .csv files.
  - The data comes from Excel files. This is a specialized sub-type of the landing zone intake that requires additional scripts for metadata configuration and a specialized Data Storage Unit (DSU) ingestion script.
  - A separately developed data extraction pipeline extracts semi-structured data files (e.g., JSON) from services or APIs.
Asset flow into Sidra's platform via landing zone
For a detailed explanation of a complete Data Intake using the landing zone, continue to this overview page.
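To make the landing zone scenarios more concrete, the minimal sketch below maps a file name to the scenario it would most likely fall under. The extension-based rule, the extension groups and the function name `classify_landing_file` are hypothetical assumptions for illustration only; they do not describe how Sidra itself decides how a file is ingested.

```python
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical extension groups, based on the example formats listed above.
GENERIC_BATCH_EXTENSIONS = {".csv", ".parquet", ".json"}
EXCEL_EXTENSIONS = {".xlsx", ".xls"}


def classify_landing_file(path: str) -> Optional[str]:
    """Return an illustrative scenario label for a file deposited in the landing zone.

    - "generic-batch": semi-structured files such as .csv, .parquet or JSON
    - "excel": the Excel sub-type, which additionally needs metadata
      configuration scripts and a specialized DSU ingestion script
    - None: not one of the example formats mentioned in this section
    """
    suffix = PurePosixPath(path).suffix.lower()  # case-insensitive match on the extension
    if suffix in EXCEL_EXTENSIONS:
        return "excel"
    if suffix in GENERIC_BATCH_EXTENSIONS:
        return "generic-batch"
    return None


if __name__ == "__main__":
    for name in ["sales/2023/orders.csv", "finance/Budget.XLSX", "api/customers.json", "contract.pdf"]:
        print(name, "->", classify_landing_file(name))
```

Keeping the Excel check separate reflects that Excel files are a sub-type with their own extra configuration steps, rather than just another generic batch format.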
2. Data Intake with document indexing
Sidra provides a separate process for ingesting binary files (documents), which differs from the landing zone process described above.
Although some stages, such as file registration and file ingestion, also take place in this type of Data Intake, it differs significantly in the indexing steps that must be applied to the files.
Azure Search is the key service in this process: it applies cognitive skills to the binary files (documents). The process usually starts by depositing the files in a special landing zone container called indexlanding.
How binaries are ingested and the Knowledge Store
3. Data Intake Process via connector plugins
Other data ingestion flows extract the data directly from the data source (e.g., a SQL database) into raw format in the Data Storage Unit, without the intermediate step of depositing it in the landing zone.
This is made possible by Sidra connector plugins. More information is provided in the documentation that follows.
About Sidra connector plugins
For a detailed explanation of a complete Data Intake Process using connector plugins, continue to this overview page.
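The three Data Intake types can be summarized as a simple decision, sketched below in Python. The `IntakeType` enum, the `SourceDescription` fields and the order of the checks are hypothetical simplifications for illustration only; they are not part of Sidra, and a real choice also depends on which connector plugins are available for a given source.

```python
from dataclasses import dataclass
from enum import Enum


class IntakeType(Enum):
    # Labels mirror the three Data Intake types described on this page.
    LANDING_ZONE = "Data Intake via landing zone"
    DOCUMENT_INDEXING = "Data Intake with document indexing"
    CONNECTOR_PLUGIN = "Data Intake Process via connector plugins"


@dataclass
class SourceDescription:
    """Hypothetical description of a data source, used only in this sketch."""
    is_binary_document: bool = False    # e.g. PDF files to be indexed by Azure Search
    has_connector_plugin: bool = False  # assumption: a Sidra connector plugin exists for this source


def choose_intake_type(source: SourceDescription) -> IntakeType:
    # Binary documents follow the separate indexing process,
    # which starts from the indexlanding container.
    if source.is_binary_document:
        return IntakeType.DOCUMENT_INDEXING
    # Sources covered by a connector plugin (e.g. a SQL database) are extracted
    # directly into raw format in the Data Storage Unit, skipping the landing zone.
    if source.has_connector_plugin:
        return IntakeType.CONNECTOR_PLUGIN
    # Otherwise, assume files (CSV, Parquet, JSON, Excel) are deposited in the landing zone.
    return IntakeType.LANDING_ZONE


if __name__ == "__main__":
    sql_database = SourceDescription(has_connector_plugin=True)
    print(choose_intake_type(sql_database).value)
```

Binary documents are checked first because document indexing is a distinct process, even though those files are also deposited in a landing zone container.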