Configuration steps
To configure a Data Intake Process from Google Analytics, the user needs to provide several parameters, grouped into the steps below.
Step 1. Configure Data Intake Process
Please see the common Sidra connector plugins section for the parameters needed to configure a Data Intake Process.
Step 2. Configure Provider
Please see the common Sidra connector plugins section for the parameters needed to create a new Provider.
Step 3. Configure Data Source
The data source represents the connection to the Google Analytics Universal reporting API. A Data Source abstracts the details of creating a Linked Service in Azure Data Factory.
To configure the data source for this Data Intake Process, you must specify a Google Analytics Service Account Key:
- A Service Account Key must be generated and entered in the plugin wizard to access the Google Analytics subscription. The Key should be downloaded as a JSON file, which includes the connection details for your Google Analytics subscription. Please check the Google Analytics Service Account documentation for details on how to obtain this JSON. You can also check more details here.
The Sidra connector plugin for Google Analytics Universal will register this new data source in Sidra Metadata and deploy a Linked Service in Azure Data Factory with this connection.
For more details on Linked Services, check the Data Factory documentation.
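Before pasting the Service Account Key into the wizard, it can help to check that the downloaded file is well-formed. The following is a minimal sketch, not part of Sidra: the function name `check_service_account_key` is hypothetical, and the field list is an assumption based on the fields that Google service account key files normally contain. It does not verify that the key is actually authorized for your Google Analytics account.

```python
import json

# Fields a Google service account key file normally contains (assumed minimum set).
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key_id", "private_key", "client_email")


def check_service_account_key(raw_json: str) -> list[str]:
    """Return a list of problems found in the key; an empty list means it looks well-formed."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value is not an object"]
    # Report every required field that is absent or empty.
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    # A key downloaded for a service account is expected to declare that type.
    if key.get("type") and key["type"] != "service_account":
        problems.append("field 'type' is not 'service_account'")
    return problems
```

Calling `check_service_account_key(open("key.json").read())` on a hypothetical `key.json` returns an empty list when the basic structure is in place.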
Step 4. Configure Metadata Extractor
The Sidra connector plugin for Google Analytics Universal creates the needed metadata and orchestrates the data integration. To set this up, you need to configure the set of Entities, each one mapping to a different set of data (a table of metrics and dimensions) to be imported into Sidra. The data for these Entities is ultimately stored as Delta tables in Databricks inside the Data Storage Unit (DSU).
You can configure multiple Entities with this plugin. Each Entity in Sidra will map to a report table in Google Analytics. These report tables can be either pre-built reporting tables that Google Analytics offers, or custom reports.
Each report table is defined by a set of fields in Google Analytics: an Account, a Property, a View and a collection of dimensions and metrics to export.
To configure the set of fields related to a new Entity, click the Add Entity button. Please check the Google Analytics documentation for details on how these reports are generated.
For each Entity to be loaded into Sidra, you need to specify the data that the Entity will track (see Entities mapping below).
Creating a Data Intake Process with this plugin produces the following:
- The necessary metadata and data governance structures are created and populated in Sidra (e.g. Provider, Entities, Attributes, Data Intake Process).
- The actual data integration infrastructure (ADF Pipelines) is created, configured, and deployed. A metadata extraction pipeline and a data extraction pipeline are created and deployed:
  - The metadata extraction pipeline extracts the Entities and Attributes and creates the corresponding metadata in the Core DB.
  - The data extraction pipeline extracts the report data for the given set of metrics and dimensions and first writes it to the raw blob container as a .parquet file. Its second step executes the DSU script to ingest this .parquet data as a Delta table in the DSU.
- The needed relationships between the Entity and the data extraction pipeline, and between the pipeline and the trigger, are created to ensure the proper orchestration of the end-to-end flow.
For more details on how the general Sidra data ingestion works, see this documentation.
The Sidra Google Analytics Universal plugin relies on the Sidra Metadata Model for mapping source data structures to Sidra Entities.
Entities mapping
In order to define which report data correspond to which Entity in Sidra, the wizard will include five configuration fields per Entity:
- Entity Name: the name of the Entity in the Sidra metadata system to which the extracted files are assigned. Entity names cannot contain special characters. You can configure multiple Entities; to configure the set of fields for a new Entity, click the "Add Entity" button.
- Account Id: the Id of the Account in GA. For more information on what a Google Analytics Account is, you can go to this page, which describes the organizational hierarchy of objects inside a Google Analytics organization. An Account is your access point for Analytics, where you specify the Properties that you want to track. An Account can contain one or more Properties.
- Property Id: the Id of the Property in GA. A Property is a website, mobile application, or device.
- View Id: the Id of the View in GA. A View, in Universal Analytics, is your access point for reports: a defined view of data from a Property. A Property can contain one or more Views.
- Dimensions and metrics JSON: this field must contain a JSON document, compatible with GA syntax, that defines the dimensions and metrics to export. The format of this JSON needs to adhere to Google Analytics standards. Please check the Google Analytics documentation for details on how to obtain this JSON from a configured report. The JSON applies to both pre-built and custom reports, and can also specify the parameters to filter by.
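To illustrate what a dimensions and metrics definition can look like, here is a minimal sketch that builds one in Python. The exact JSON shape the wizard expects is not stated on this page; the sketch assumes the style used by report requests in the Google Analytics Reporting API v4, where dimensions are objects with a `name` and metrics are objects with an `expression`. The helper `build_report_definition` is hypothetical, and the `ga:` prefix check is only a simple sanity check. Always confirm the expected format against the Google Analytics documentation referenced above.

```python
import json


def build_report_definition(dimensions: list[str], metrics: list[str]) -> str:
    """Serialize dimensions and metrics in Reporting API v4 style (assumed shape)."""
    # Universal Analytics dimension and metric names carry a 'ga:' prefix.
    for name in dimensions + metrics:
        if not name.startswith("ga:"):
            raise ValueError(f"expected a 'ga:' prefixed name, got {name!r}")
    definition = {
        "dimensions": [{"name": d} for d in dimensions],
        "metrics": [{"expression": m} for m in metrics],
    }
    return json.dumps(definition, indent=2)


# Example: sessions and users broken down by date and country.
print(build_report_definition(["ga:date", "ga:country"], ["ga:sessions", "ga:users"]))
```

Generating the JSON from code rather than editing it by hand makes it easier to keep several Entities consistent when they share dimensions.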
Configuring the Data Intake Process through this connector takes less than five minutes. Once the settings are configured and the deployment process has started, the actual duration of the data ingestion may vary from a few minutes to a few hours, depending on the data volumes.
After starting the Data Intake Process creation, users will receive a message that the process has started and will continue in the background. Users will be able to navigate through Sidra Web as usual while this process happens.
Once the whole deployment process has finished, users will receive a notification in the Sidra Web Notifications widget. If the process completed successfully, the new data structures (new Entities) will appear in the Data Catalog automatically, and the Data Intake Process will incorporate this new data source.
Step 5. Configure Trigger
Please see the common Sidra connector plugins page for the parameters needed to set up a trigger. Note that, because each run performs a single incremental synchronization covering only the last day, the recommended schedule interval for this plugin is 1 day. If the trigger runs less often than that, some data will not be imported into Sidra.
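The effect of the schedule interval can be shown with a small simulation. This is a sketch under a stated assumption: each run loads exactly the calendar day before it executes, which is one reading of "incremental synchronization of the last day". The function names are hypothetical and nothing here calls Sidra or Google Analytics.

```python
from datetime import date, timedelta


def days_loaded(first_run: date, runs: int, interval_days: int) -> list[date]:
    """Days picked up when each run loads only the day before it executes (assumed behaviour)."""
    return [
        first_run + timedelta(days=i * interval_days) - timedelta(days=1)
        for i in range(runs)
    ]


def days_missed(first_run: date, runs: int, interval_days: int) -> list[date]:
    """Days between the first and last loaded day that no run picked up."""
    loaded = set(days_loaded(first_run, runs, interval_days))
    start, end = min(loaded), max(loaded)
    span = [start + timedelta(days=n) for n in range((end - start).days + 1)]
    return [d for d in span if d not in loaded]


# A daily trigger leaves no gaps; a trigger every 2 days skips every other day.
print(days_missed(date(2024, 1, 2), runs=3, interval_days=1))
print(days_missed(date(2024, 1, 2), runs=3, interval_days=2))
```

With a 2-day interval starting on 2 January 2024, the runs load 1, 3 and 5 January, so 2 and 4 January are never imported.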