Custom activities are used in Sidra Data Platform to develop functionalities not covered by default Azure Data Factory (ADF) activities.
ADF provides many built-in activities to copy data, modify data, execute processes in a cluster, call an API, and so on. However, some scenarios are not supported; in those cases, a custom activity can be used.
A custom activity is an ADF activity that allows executing custom code. There are two approaches for executing and processing the code:
- using Azure Batch pools
- using Azure Functions
Sidra Data Platform uses the Azure Batch account approach.
The code executed by the activity is often also called the "custom activity", which can be confusing. Because the difference matters in this document, the following terms are used:
- Custom activity, the ADF activity that allows executing custom code in ADF.
- Custom application, the custom code executed by the custom activity.
All the custom applications in Sidra are .NET Core class libraries.
The use of custom activities in Sidra will be deprecated; they are being replaced by other methods of providing custom applications, for example a new endpoint in the Sidra API.
Configure a custom activity
In order to execute the custom application, the custom activity needs to know where the binaries are stored and how to invoke them.
Sidra stores the binaries of all the custom applications in a container named adfcustomactivities in an Azure Storage account. Each of them is packaged in a ZIP file.
Every custom activity is configured with the command needed to invoke the specific custom application that it uses. Since the application is packaged in a ZIP file, those commands follow this format:
Which uses PowerShell to unzip the custom application binaries and then executes it using the
Custom activity parameters
Like any other activity, the custom activity needs to receive parameters and to have access to datasets and linked services from ADF. Any kind of object defined in ADF can be passed to the custom activity, as well as any kind of parameter. A custom activity can therefore receive:
- Datasets of any kind
- Linked Services
- Additional properties (known as 'Extended Properties')
All of these references are declared when defining the JSON structure for the custom activity. The JSON structure has the following properties:
- Name is the name of the activity.
- Type is the type of activity. For custom activities, this is always Custom.
- Policy defines the execution policy for this activity, such as the timeout, the number of retries or the retry interval.
- TypeProperties are the properties related to this specific type of activity (Custom):
  - Command: The command that will be executed from the command line interface.
  - ResourceLinkedService: Reference to the Azure Storage account where the custom application binaries are located.
  - FolderPath: Path inside the Azure Storage account where the custom application binaries are located.
  - ExtendedProperties: Additional properties that are passed to the custom application.
  - ReferenceObjects: References to Linked Services or Datasets defined in ADF that are used by the custom application.
- LinkedServiceName: Name of the Batch account linked service where the custom application is going to be executed.
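As an illustration of how these properties fit together, the following minimal sketch builds a Custom activity definition as a Python dictionary and prints it as JSON. Python is used here only as a neutral notation. The property names follow the ADF Custom activity schema; every reference name, the extended property, the policy values, the use of adfcustomactivities as the folder path, and the unzip-and-run command (including the use of dotnet) are assumptions made for this example and are not taken from Sidra.

```python
import json


def build_command(zip_name: str, dll_name: str) -> str:
    """Build a hypothetical unzip-and-run command (not Sidra's actual command)."""
    # Expand-Archive is a standard PowerShell cmdlet. Launching the result with
    # `dotnet` is an assumption made for this illustration.
    return (
        f"powershell.exe -Command \"Expand-Archive -Path '{zip_name}' "
        f"-DestinationPath '.' -Force; dotnet {dll_name}\""
    )


def build_custom_activity(name: str, command: str) -> dict:
    """Return an ADF Custom activity definition as a plain dictionary."""
    return {
        "name": name,
        "type": "Custom",  # always "Custom" for custom activities
        "policy": {"timeout": "0.01:00:00", "retry": 2, "retryIntervalInSeconds": 60},
        "typeProperties": {
            "command": command,
            # Storage account and folder holding the packaged binaries.
            "resourceLinkedService": {
                "referenceName": "HypotheticalStorageLinkedService",
                "type": "LinkedServiceReference",
            },
            "folderPath": "adfcustomactivities",
            # Free-form values handed over to the custom application.
            "extendedProperties": {"entityName": "HypotheticalEntity"},
            # ADF objects the custom application needs to know about.
            "referenceObjects": {
                "linkedServices": [
                    {
                        "referenceName": "HypotheticalSqlLinkedService",
                        "type": "LinkedServiceReference",
                    }
                ],
                "datasets": [
                    {"referenceName": "HypotheticalDataset", "type": "DatasetReference"}
                ],
            },
        },
        # Batch account where the command is executed.
        "linkedServiceName": {
            "referenceName": "HypotheticalBatchLinkedService",
            "type": "LinkedServiceReference",
        },
    }


activity = build_custom_activity(
    "RunCustomApp", build_command("MyCustomApp.zip", "MyCustomApp.dll")
)
print(json.dumps(activity, indent=2))
```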
Communication between custom activity and custom application
All the information in the parameters that the custom activity receives must also be available to the custom application. When the custom applications are executed in a Batch account, ADF generates three files with the contents of all the parameters. These are:
- Activity.json: Contains the activity JSON structure like the one defined above, and, therefore, all the extended properties.
- LinkedServices.json: Contains all the JSON structures for all the referenced Linked Services in the custom activity.
- DataSets.json: Contains all the JSON structures for all the referenced datasets in the custom activity.
If any of the referenced objects does not exist in ADF, the custom activity cannot be deployed.
The custom application will have to read all those files in order to get that information.
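The reading step can be sketched as follows. This is a minimal Python illustration, not the code of the Sidra helpers (which are .NET). It assumes the three files are in the application's working folder and uses the file names activity.json, linkedServices.json and datasets.json as spelled in the Microsoft documentation; the function name load_activity_context is hypothetical.

```python
import json
from pathlib import Path


def load_activity_context(working_dir: str = ".") -> dict:
    """Read the three files generated by ADF and return their relevant parts."""
    base = Path(working_dir)

    def read_json(file_name: str):
        # "utf-8-sig" accepts files with or without a byte order mark.
        return json.loads((base / file_name).read_text(encoding="utf-8-sig"))

    activity = read_json("activity.json")
    linked_services = read_json("linkedServices.json")
    datasets = read_json("datasets.json")

    # Extended properties live under typeProperties in the activity definition.
    type_properties = activity.get("typeProperties", {})
    return {
        "extended_properties": type_properties.get("extendedProperties", {}),
        "linked_services": linked_services,
        "datasets": datasets,
    }
```

A custom application would typically call such a function once at startup and pass the result to the rest of its logic.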
Output and error
The custom activity redirects the standard output and standard error channels of the custom application to files stored in a container named adfjobs in the Azure Storage account associated with the Batch account.
Excerpt from Microsoft documentation
"The stdout and stderr of your custom application are saved to the adfjobs container in the Azure Storage Linked Service you defined when creating Azure Batch Linked Service with a GUID of the task."
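From the application's point of view, this means that ordinary console output is enough to leave a trace in those files. The sketch below, written in Python for illustration only, reports progress on standard output and failures on standard error, and returns an exit code. The function name run and the messages are hypothetical; returning a non-zero exit code on failure is the usual command-line convention and is assumed here rather than taken from this document.

```python
import sys


def run(task) -> int:
    """Run `task`, writing progress to stdout and failures to stderr."""
    try:
        print("Custom application started")  # written to standard output
        task()
        print("Custom application finished")
        return 0
    except Exception as error:
        # Written to standard error, so it lands in the error file.
        print(f"Custom application failed: {error}", file=sys.stderr)
        return 1
```

An entry point could then end with `sys.exit(run(main))`, where `main` is the application's own logic.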
More information about custom activities can be found in Microsoft Docs.
As mentioned earlier, all the custom applications in Sidra are .NET Core class libraries. To accelerate the creation of new custom applications, Sidra provides:
- A Visual Studio template that generates the Visual Studio solution of a custom application.
- A set of base classes and helpers included in the NuGet package PlainConcepts.Sidra.DataFactory.CustomActivities.Common.
The NuGet package provides classes and helpers for:
- Reading the JSON files (activity.json, linkedServices.json and datasets.json) and providing the referenceObjects and the extendedProperties to the custom application.
- Performing validations on the parameters included in the extendedProperties.
- Accessing the linked services, datasets, etc. defined in the referenceObjects.
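To give an idea of what validating the extendedProperties can involve, here is a minimal sketch in Python. It does not reproduce the API of the NuGet package; the function name validate_extended_properties and the property names in the usage example are hypothetical, and the only rule shown is that required properties must be present and non-empty.

```python
def validate_extended_properties(extended_properties: dict, required: list) -> list:
    """Return a list of problems; an empty list means the properties are valid."""
    problems = []
    for key in required:
        value = extended_properties.get(key)
        if value is None:
            problems.append(f"Missing extended property: {key}")
        elif isinstance(value, str) and not value.strip():
            problems.append(f"Empty extended property: {key}")
    return problems


# Hypothetical usage: "entityName" is present, "targetPath" is blank.
issues = validate_extended_properties(
    {"entityName": "HypotheticalEntity", "targetPath": " "},
    ["entityName", "targetPath", "format"],
)
for issue in issues:
    print(issue)
```

Collecting all the problems before failing lets the application report every invalid parameter in a single run instead of stopping at the first one.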
Custom apps in Sidra
Sidra provides a set of predefined custom applications released as NuGet packages. In order to use those custom apps, a host project must be created in the Core or Client app solution.
The host project will be a .NET Core Console application that contains a Program class and a reference to the NuGet package of the custom app. For example, this is the host project for the CreateTableScript custom app:
Program class only contains the call to the entrypoint of custom app which is the
This is the list of custom applications provided by Sidra.
|Custom application|Description|
|---|---|
|Create table script|It retrieves the information of an entity from the metadata database, generates the create table script and stores it. More information about the create table script can be found in How assets are ingested into the Data Lake.|
|Excel to csv|It converts a list of Excel files stored in an Azure Blob folder to CSV files.|
|Extract|It is used by the Client apps. It gets the data from an entity that is already ingested in the Data Lake but is not yet loaded in the Client app.|
|Extract Bing Ads|It uses the Bing Ads API to generate a report and copy it to a landing zone in an Azure Storage account.|
|Extract Google Analytics|It retrieves the information from a Google Analytics view and copies it to a CSV file in an Azure Storage account.|
|Generate transfer query|It retrieves the information of an entity from the metadata database, generates the transfer query script and stores it. More information about the transfer query script can be found in How assets are ingested into the Data Lake.|
|Get Data Factory metrics|It retrieves information about Pipelines and Activities from Data Factory and copies it into a dataset.|
|Insert fields|It inserts additional fields into CSV files, given a dictionary with the fields to add and their values.|
|PowerBI|It executes a command over a PowerBI DataFlow.|
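As an example of the kind of processing these applications perform, the idea behind Insert fields (adding columns to a CSV from a dictionary of names and values) can be sketched in a few lines of Python. This is only an illustration of the concept, not Sidra's implementation; the function name insert_fields and the sample data are hypothetical, and the sketch assumes the CSV has a header row.

```python
import csv
import io


def insert_fields(csv_text: str, new_fields: dict) -> str:
    """Append the given fields, with a constant value each, to every CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    existing = list(reader.fieldnames or [])
    # Keep the original column order and add only the names not already present.
    fieldnames = existing + [name for name in new_fields if name not in existing]

    output = io.StringIO()
    writer = csv.DictWriter(output, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        row.update(new_fields)  # a new value replaces an existing column's value
        writer.writerow(row)
    return output.getvalue()


result = insert_fields("id,name\n1,a\n2,b\n", {"source": "hypothetical-system"})
print(result)
```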