Destination paths and scope of ingestion automation
Sidra has different landing folders defined for different types of ingestion.
For example, the folder indexlanding is used for Knowledge Store ingestion, and the folder landing is used for general semi-structured file ingestion.
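The mapping between ingestion type and landing folder can be pictured with a minimal sketch. Only the folder names indexlanding and landing come from the text above; the dictionary keys and the helper function are hypothetical names used for illustration and are not part of any Sidra API.

```python
# Hypothetical illustration: Sidra does not expose this mapping as code.
# Only the folder names ("indexlanding", "landing") come from the documentation.
LANDING_FOLDERS = {
    "knowledge_store": "indexlanding",  # Knowledge Store ingestion
    "semi_structured": "landing",       # general semi-structured file ingestion
}


def landing_folder_for(ingestion_type: str) -> str:
    """Return the landing folder name for the given ingestion type."""
    try:
        return LANDING_FOLDERS[ingestion_type]
    except KeyError:
        raise ValueError(f"Unknown ingestion type: {ingestion_type!r}") from None
```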
The current SharePoint plugin version offers two possible inputs for specifying the destination path:
Destination path specification
- The plugin will automatically configure a whole end-to-end data extraction and file ingestion process into the data lake, once the right pipeline has been selected in the field "Please select an Azure Search pipeline for indexing".
- To do this, the plugin transparently handles the creation of the underlying Entities and Attributes (metadata generation).
- The Entities will be automatically created as per the fields configured in the metadata extraction step.
- The Attributes will be automatically created, as they can be defined in a standardized way for every Asset of type binary data (unstructured data).
- The plugin will automatically associate the created Entities to a generic Azure Search pipeline. This option will therefore only apply to binary file ingestion. For details on Knowledge Store ingestion, see the Sidra documentation.
- For the data extraction process, the trigger configuration step will be used to determine the periodicity of extracting data from the source.
- Whenever files are deposited in the Landing Zone for Azure Search, an internal trigger in Sidra transparently launches the binary file ingestion process.
- The user will be prompted to select from a list of available storage containers in Sidra, where the extracted documents from the source will be deposited.
- In this case, there won't be any file ingestion process, and the plugin will only cover the data extraction part.
- This applies to semi-structured files.
- In order to configure a full end-to-end data ingestion process in the data lake, the user will need to manually execute the metadata generation (creation of Attributes) and the association of the Entities to the data ingestion pipeline (FileIngestionDatabricks). To make this possible, if the other existing container option is selected, the user will be asked to enter an additional parameter called Format for each configured Entity. This corresponds to the Format field in the Entity metadata model, which is needed for further data intake processing in Sidra. For more details on how general file ingestion via the landing root folder works, see this documentation.
- Thanks to this plugin, the Data Intake Process is configured in less than five minutes. Once the settings are configured and the deployment process is started, the actual duration of the data ingestion may vary from a few minutes to a few hours, depending on the data volumes.
- After starting the Data Intake Process creation, users will receive a message that the process has started and will continue in the background. Users will be able to navigate through Sidra Web as usual while this process happens.
- Once the whole deployment process is finished, users will receive a notification in the Sidra Web Notifications widget. If the process completes successfully, the new data structures (new Entity) will appear in the Data Catalog automatically, and the Data Intake Process will incorporate this new data source.
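The difference in scope between the two destination options can be summarized with a small sketch. It is not Sidra code: the class, its field names, and the example Format value are assumptions made for illustration. Only the behavior it models comes from the description above: the Azure Search landing option automates the whole process, while the other existing container option requires a Format per Entity and leaves metadata generation and pipeline association to the user.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DestinationConfig:
    """Hypothetical model of the destination path choice (names are assumptions)."""
    use_azure_search_landing: bool
    container: str = ""  # only relevant for the "other existing container" option
    entity_formats: Dict[str, str] = field(default_factory=dict)  # Entity -> Format


def remaining_manual_steps(cfg: DestinationConfig, entities: List[str]) -> List[str]:
    """List the steps the user still has to run manually after the plugin finishes."""
    if cfg.use_azure_search_landing:
        # Entities, Attributes and pipeline association are handled by the plugin.
        return []
    # "Other existing container": every configured Entity needs a Format value.
    missing = [name for name in entities if not cfg.entity_formats.get(name)]
    if missing:
        raise ValueError(f"Format is required for Entities: {', '.join(missing)}")
    return [
        "Generate metadata (create Attributes)",
        "Associate Entities to the FileIngestionDatabricks pipeline",
    ]
```

For example, with a placeholder Entity named "Reports" and a placeholder Format value "csv", `remaining_manual_steps(DestinationConfig(False, "mycontainer", {"Reports": "csv"}), ["Reports"])` returns the two manual steps, whereas the Azure Search landing option returns an empty list.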