
About the Google Analytics Universal connector plugin

Versions

From Sidra version 2022.R3 onwards, the autogenerated Transfer Query is replaced by the DSU ingestion script. More information is available here.

The Sidra plugin (a.k.a. connector) for Google Analytics Universal enables seamless connection to a Google Analytics Universal account to extract metrics and dimensions reporting data.

This connector connects to the reporting layer of the Google Analytics Universal API through pre-defined report queries. You can configure multiple Entities, each mapping to a different report, and the Sidra connector wizard lets you define the report tables to import. The connector therefore does not give access to all the raw data in the Google Analytics model; it only extracts pre-configured report tables. Each report table is defined by a set of metrics and dimensions, and not every combination of dimensions and metrics is valid in a query.

You can check which combinations of metrics and dimensions are valid, and find more details about the Google Reporting API, in the Google Analytics documentation.

Sidra's plugin for creating Data Intake Processes from Google Analytics is responsible for extracting metrics and dimensions reporting data as offered by the Google Analytics service, and loading this data as tables into the specified Data Storage Unit at specified intervals.

This plugin is compatible with the Google Analytics Universal reporting API.

Supported data synchronization mechanisms

The Google Analytics Universal connector supports a single mode of data synchronization. This is an incremental data synchronization based on a fixed window of one day back from the time when the data extraction pipeline is triggered.

This means that every time the data extraction pipeline runs, the API call fetches the last day of data from Google Analytics Universal. A daily trigger is therefore the recommended approach to avoid losing any data.
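The one-day look-back described above can be sketched in a few lines of Python. This is only an illustration: the function name `extraction_window` is hypothetical, and it assumes the window is exactly 24 hours ending at the trigger time. How the connector translates this window into Google Analytics API date parameters is not covered here.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def extraction_window(trigger_time: datetime) -> Tuple[datetime, datetime]:
    """Return the (start, end) of the one-day look-back window for a pipeline run."""
    # Assumption: the window is exactly 24 hours ending at the trigger time.
    return trigger_time - timedelta(days=1), trigger_time


# Example: a pipeline run triggered at 02:00 UTC on 2022-09-30.
run = datetime(2022, 9, 30, 2, 0, tzinfo=timezone.utc)
start, end = extraction_window(run)
print(start.isoformat(), "->", end.isoformat())
# 2022-09-29T02:00:00+00:00 -> 2022-09-30T02:00:00+00:00
```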

Because Google Analytics reporting data is delivered as a cube, changing a report requires creating a new Data Intake Process for the updated version of the Google Analytics report.

You can find more details about how Sidra Data Intake Processes and plugins work, together with a setup guide, in the Sidra Core documentation.

Supported Google Analytics versions

This plugin is compatible with the following Google Analytics services:

  • Google Analytics Universal
  • Google Analytics 360

This version of the plugin is not compatible with GA4 reporting.

Configuration steps

You need to provide several parameters to configure a Data Intake Process from Google Analytics.

Step 1. Configure Data Intake Process

Please see the common Sidra connector plugins page for the parameters needed to configure a Data Intake Process.

Step 2. Configure Provider

Please see the common Sidra connector plugins page for the parameters needed to create a new Provider.

Step 3. Configure Data Source

The Data Source represents the connection to the Google Analytics Universal reporting API. A Data Source abstracts the details of creating a Linked Service in Azure Data Factory.

To configure the Data Source for this Data Intake Process, you must specify a Google Analytics Service Account Key:

  • A Service Account Key must be generated and entered in the plugin wizard to access the Google Analytics subscription. The Key should be downloaded as a JSON file. This JSON includes the connection details for your Google Analytics subscription. Please check the Google Analytics Service Account documentation for details on how to obtain this JSON. You can also check more details here.
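Before pasting the key into the wizard, it can help to check that the downloaded file looks like a service account key. The following is a minimal sketch, not part of Sidra: the helper `check_service_account_key` is hypothetical, and the list of checked fields is an assumption based on the usual layout of a Google service account key file. It only checks that fields are present and does not verify that the credentials work.

```python
import json

# Fields commonly present in a Google service account key file (assumed list).
REQUIRED_KEY_FIELDS = (
    "type",
    "project_id",
    "private_key_id",
    "private_key",
    "client_email",
    "token_uri",
)


def check_service_account_key(raw_json: str) -> list:
    """Return a list of problems found in the key JSON (empty if none were found)."""
    try:
        key = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(key, dict):
        return ["top-level JSON value must be an object"]
    problems = [f"missing field: {name}" for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# Placeholder values only; a real key file contains actual credentials.
sample = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key_id": "placeholder",
    "private_key": "placeholder",
    "client_email": "reader@my-project.iam.gserviceaccount.com",
    "token_uri": "https://oauth2.googleapis.com/token",
})
print(check_service_account_key(sample))  # []
```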

The Sidra connector plugin for Google Analytics Universal will register this new Data Source in Sidra Metadata and deploy a Linked Service in Azure Data Factory with this connection.

For more details on Linked Services, check the Data Factory documentation.

Step 4. Configure Metadata Extractor

The Sidra connector plugin for Google Analytics Universal creates the required metadata and orchestrates the data integration. To do so, you need to configure a set of Entities, each mapping to a different set of data (a table of metrics and dimensions) to import into Sidra. The data for these Entities is ultimately stored as delta tables in Databricks inside the Data Storage Unit (DSU).

You can configure multiple Entities with this plugin. Each Entity in Sidra will map to a report table in Google Analytics. These report tables can be either pre-built reporting tables that Google Analytics offers, or custom reports.

Each report table is defined by a set of fields in Google Analytics: an Account, a Property, a View and a collection of dimensions and metrics to export.

To configure the set of fields related to a new Entity, please click on the Add Entity button. Please check the Google Analytics documentation for details on how these reports are generated.

For each Entity to be loaded into Sidra, you need to specify the data that this Entity will track (see Entities mapping below).

The creation of a Data Intake Process with this plugin will create the following:

  • The necessary metadata and data governance structures are created and populated in Sidra (e.g. Provider, Entities, Attributes, Data Intake Process).
  • The actual data integration infrastructure (ADF Pipelines) is created, configured, and deployed. A metadata extraction pipeline and a data extraction pipeline will be created and deployed.
  • The metadata extraction pipeline will extract the Entities and Attributes and create the corresponding metadata in the Core DB.
  • The data extraction pipeline will extract the report data for the given set of metrics and dimensions, and will first write this data into the raw blob container as a .parquet file. In its second step, the data extraction pipeline will execute the transfer query script (the DSU ingestion script from Sidra 2022.R3 onwards) to ingest this .parquet data as a delta table in the DSU.
  • The needed relationships between the Entity and the data extraction pipeline and between the pipeline and the trigger will be created to ensure the proper orchestration of the end-to-end flow.

For more details on how the general Sidra data ingestion works, see this documentation.

The Sidra Google Analytics Universal plugin relies on the Sidra Metadata Model for mapping source data structures to Sidra Entities.

Entities mapping

In order to define which report data correspond to which Entity in Sidra, the wizard will include five configuration fields per Entity:

  • Entity Name: the name of the Entity in the Sidra metadata system to which the extracted files are assigned. You can configure multiple Entities. Entity names cannot contain special characters. To configure the set of fields for a new Entity, click on the "Add Entity" button.
  • Account Id: the Id of the Account in GA. For more information on what a Google Analytics Account is, you can go to this page, which describes the organizational hierarchy of objects inside a Google Analytics organization. An Account is your access point for Analytics, where you specify the Properties that you want to track. An Account can contain one or more Properties.
  • Property Id: the Id of the Property in GA. A Property is a website, mobile application, or device.
  • View Id: the Id of the View in GA. A View, in Universal Analytics, is your access point for reports, a defined view of data from a Property. A Property can contain one or more Views.
  • Dimensions and metrics JSON: this field must contain a standard JSON compatible with GA syntax that defines the dimensions and metrics to export. The format of this JSON needs to adhere to Google Analytics standards. Please check the Google Analytics documentation for details on how to obtain this JSON from a configured report. This JSON applies to both pre-built reports and custom reports. In this JSON you can also specify the parameters to filter by.
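The five fields above can be pictured as one record per Entity. The sketch below is illustrative only and is not the connector's implementation: the `EntityMapping` class, its field names, and the Id values are hypothetical. It assumes that "no special characters" means letters, digits, and underscores only, and that the dimensions and metrics JSON follows the Reporting API v4 style of `metrics` expressions and `dimensions` names. The exact JSON shape expected by the wizard should be taken from the Google Analytics documentation.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class EntityMapping:
    """Hypothetical record holding the five wizard fields for one Entity."""
    entity_name: str
    account_id: str
    property_id: str
    view_id: str
    dimensions_and_metrics_json: str

    def validate(self) -> list:
        """Return a list of problems found (empty if none were found)."""
        problems = []
        # Assumption: "no special characters" means letters, digits and underscores.
        if not re.fullmatch(r"[A-Za-z0-9_]+", self.entity_name):
            problems.append("entity_name contains special characters")
        for field_name in ("account_id", "property_id", "view_id"):
            if not getattr(self, field_name).strip():
                problems.append(f"{field_name} is empty")
        try:
            json.loads(self.dimensions_and_metrics_json)
        except json.JSONDecodeError:
            problems.append("dimensions_and_metrics_json is not valid JSON")
        return problems


# Assumed JSON shape, using Universal Analytics metric and dimension names.
report_definition = json.dumps({
    "metrics": [{"expression": "ga:sessions"}, {"expression": "ga:users"}],
    "dimensions": [{"name": "ga:date"}, {"name": "ga:country"}],
})

# Placeholder Ids, for illustration only.
entity = EntityMapping(
    "SessionsByCountry", "12345678", "UA-12345678-1", "987654321", report_definition
)
print(entity.validate())  # []
```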

The Data Intake Process can be configured via this connector in less than five minutes. Once the settings are configured and the deployment process is started, the actual duration of the data ingestion may vary from a few minutes to a few hours, depending on the data volumes.

After starting the Data Intake Process creation, users will receive a message that the process has started and will continue in the background. Users will be able to navigate through Sidra Web as usual while this process happens.

Once the whole deployment process is finished, users will receive a notification in the Sidra Web Notifications widget. If the process completes successfully, the new data structures (new Entities) will appear in the Data Catalog automatically, and the Data Intake Process will incorporate this new data source.

Step 5. Configure Trigger

Please see the common Sidra connector plugins page for the parameters needed to set up a trigger. Note as well that, because the only synchronization mode is an incremental load of the last day, the recommended schedule interval for the plugin is 1 day. If the pipeline runs less frequently than that, some data will not be imported into Sidra.
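To see why a less frequent schedule loses data, the following sketch compares the look-back windows of consecutive runs. It is a simplified model rather than the connector's logic: the helper `uncovered_gaps` is hypothetical, and it assumes each run covers exactly the 24 hours before its trigger time.

```python
from datetime import datetime, timedelta


def uncovered_gaps(trigger_times, window=timedelta(days=1)):
    """Return (gap_start, gap_end) periods not covered by any run's look-back window."""
    gaps = []
    runs = sorted(trigger_times)
    for previous, current in zip(runs, runs[1:]):
        # Assumption: a run covers the period from (trigger - window) to trigger.
        window_start = current - window
        if window_start > previous:
            gaps.append((previous, window_start))
    return gaps


first = datetime(2022, 9, 1)
daily = [first + timedelta(days=i) for i in range(4)]
every_two_days = [first + timedelta(days=2 * i) for i in range(3)]

print(len(uncovered_gaps(daily)))           # 0
print(len(uncovered_gaps(every_two_days)))  # 2
```

With a daily trigger the windows line up end to end. With a trigger every two days, one day out of every two falls outside all windows.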


Last update: 2022-09-30