Sidra API module: Data Querying
This page provides an overview of the Query API endpoints available in Sidra, which enable secure, asynchronous access to data stored in a Data Storage Unit (DSU).
To complement the API reference, we’ve also included a step-by-step tutorial demonstrating how to authenticate, submit queries, and retrieve results using these endpoints. While the example uses Postman for simplicity, the same flow can be implemented in any programming language or tool that supports HTTP requests.
Query API Tutorial
This tutorial demonstrates how to use the Sidra Query API to securely retrieve data from a Data Storage Unit (DSU) via an asynchronous API flow.
The tutorial covers the three core steps of using the Query API:
- Authenticate using Sidra Core API
- Submit a query to the DSU
- Poll for the results asynchronously
For simplicity, we’ll use Postman throughout the examples, but you’re free to use any other tool or programming language.
1. Authenticating with the Sidra Core API
To authenticate against the Sidra API, you’ll need the following:
- Access Token URL: This is built by appending `/connect/token` to your Identity Server URL. For example: https://youridentityserver.azurewebsites.net/connect/token
- Client ID: The Client ID can be assigned to a user or a Data Product. It must have the appropriate permissions on the DSU you want to query.
- Client Secret: The password associated with the Client ID.
- Scope: Use `sidra.api`.
In Postman:
- Create a new request.
- Go to the Authorization tab.
- Choose the OAuth 2.0 type and click Get New Access Token.
- Fill in the fields as described above.
It should look like this:
Once you’ve successfully retrieved a token, you’re ready to move on to querying.
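If you are not using Postman, the same four values can be posted directly to the token endpoint. The sketch below is a minimal standard-library example that assumes the OAuth 2.0 client credentials grant (the usual flow for a Client ID and secret against an Identity Server `/connect/token` endpoint); the grant type is not stated above, so confirm it for your installation. `build_token_request` and `fetch_access_token` are hypothetical helpers, not part of any Sidra library.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(identity_server_url: str, client_id: str,
                        client_secret: str, scope: str = "sidra.api") -> urllib.request.Request:
    """Build a token request for the Identity Server (client credentials grant assumed)."""
    # The Access Token URL is the Identity Server URL with /connect/token appended.
    token_url = identity_server_url.rstrip("/") + "/connect/token"
    # OAuth 2.0 token endpoints expect a form-encoded body.
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",  # assumption: grant type not given in the Sidra docs
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_access_token(request: urllib.request.Request) -> str:
    """Send the token request and return the access_token field of the JSON reply."""
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["access_token"]
```

For example, with placeholder credentials: `token = fetch_access_token(build_token_request("https://youridentityserver.azurewebsites.net", "<client-id>", "<client-secret>"))`. Keeping the request builder separate from the network call makes it easy to inspect the request before sending it.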
2. Submitting a Query
With authentication in place, you can now make a request to the Query API to extract data from the DSU. Here's how it works:
- The API processes your request and triggers a Databricks job.
- If the request is valid, you’ll receive a 202 Accepted status and a `Location` header with a polling URL.
- You must supply a SAS token for an Azure Blob Storage location where the results will be written.
The API supports different output formats, including CSV and Parquet. In this tutorial, we’ll use the Parquet endpoint.
Refer to your Swagger interface (e.g., https://yoursidracoreapi.azurewebsites.net/swagger/index.html) for full endpoint documentation.
Parquet Endpoint Parameters
The Parquet endpoint is:
/api/Query/entity/{idEntity}/parquet
It accepts several parameters as shown below:
Example of a filled request:
Important notes:
- The `storageToken` must be URL-encoded and must grant write permissions to the Blob container.
- Use `SidraIdAsset` to specify one or more asset IDs to extract (comma-separated).
- The Client ID used must have permissions on the DSU/Provider/Entity involved.
- A 202 Accepted response indicates that the request was queued successfully.
Copy the `Location` URL from the response headers. You’ll use it in the next step to poll the status.
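Outside Postman, this step amounts to building the endpoint URL, sending it with the bearer token, and reading `Location` from the 202 response. The sketch below uses only the standard library and rests on several assumptions: `storageToken` and `SidraIdAsset` travel in the query string (which the URL-encoding requirement suggests), `{idEntity}` is a numeric ID, and the HTTP verb is GET (it is a parameter here, so check Swagger for the verb your version expects). Any other parameters Swagger lists for the endpoint would go in the same dictionary. All function names are hypothetical.

```python
import urllib.parse
import urllib.request
from collections.abc import Mapping


def build_parquet_query_url(core_api_url: str, id_entity: int,
                            storage_token: str, asset_ids: list[int]) -> str:
    """Build the Parquet endpoint URL with URL-encoded query parameters."""
    path = f"/api/Query/entity/{id_entity}/parquet"
    params = {
        # Pass the raw SAS token: urlencode percent-encodes it (&, =, /, + ...),
        # so a token you already encoded by hand would end up double-encoded.
        "storageToken": storage_token,
        # One or more asset IDs, comma-separated.
        "SidraIdAsset": ",".join(str(asset_id) for asset_id in asset_ids),
    }
    return core_api_url.rstrip("/") + path + "?" + urllib.parse.urlencode(params)


def polling_url_from_response(request_url: str, status: int,
                              headers: Mapping[str, str]) -> str:
    """Return the polling URL from a Query API response, or raise if it is not 202."""
    if status != 202:
        raise RuntimeError(f"Expected 202 Accepted, got HTTP {status}")
    location = headers.get("Location")
    if not location:
        raise RuntimeError("202 Accepted response has no Location header")
    # urljoin leaves an absolute Location unchanged and resolves a relative one.
    return urllib.parse.urljoin(request_url, location)


def submit_query(query_url: str, access_token: str, method: str = "GET") -> str:
    """Send the query with a bearer token and return the polling URL."""
    request = urllib.request.Request(
        query_url,
        headers={"Authorization": f"Bearer {access_token}"},
        method=method,  # assumption: confirm the HTTP verb in Swagger
    )
    # urlopen raises HTTPError for 4xx/5xx; a 202 comes back as a normal response.
    with urllib.request.urlopen(request) as response:
        return polling_url_from_response(query_url, response.status, response.headers)
```

For example: `polling_url = submit_query(build_parquet_query_url("https://yoursidracoreapi.azurewebsites.net", 42, sas_token, [1001, 1002]), token)`, where the entity ID, asset IDs, and `sas_token` are placeholders. The response check is kept in a separate function so it can be tested without calling the API.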
3. Polling for the Result
Sidra provides a polling endpoint to check when your data extraction is complete. The `Location` URL returned in step 2 points to this endpoint.
Possible outcomes include:
- 200 OK: Data is ready and has been successfully written to the Blob Storage.
- 202 Accepted: The job is still running. Keep polling until you receive a 200 OK.
- 500 Error: An error occurred. Common causes include invalid attributes, missing permissions, or invalid asset IDs.
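The outcomes above map onto a simple loop: 200 ends it, 202 means wait and retry, and anything else is treated as a failure. In this sketch the polling interval, the attempt limit, and the assumption that the polling endpoint accepts the same bearer token as the query request are illustrative choices, not documented Sidra behavior; `get_poll_status` and `wait_for_result` are hypothetical helpers. The status check is passed in as a callable so the loop can be exercised without a live API.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def get_poll_status(polling_url: str, access_token: str) -> int:
    """Request the Location URL once and return the HTTP status code."""
    request = urllib.request.Request(
        polling_url,
        headers={"Authorization": f"Bearer {access_token}"},  # assumption: same token as step 2
    )
    try:
        with urllib.request.urlopen(request) as response:
            return response.status  # 200 or 202
    except urllib.error.HTTPError as error:
        # urlopen raises for 4xx/5xx; return the code so the caller decides what to do.
        return error.code


def wait_for_result(check_status: Callable[[], int], interval_s: float = 10.0,
                    max_attempts: int = 60,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll until 200 OK, retrying on 202 Accepted and failing on anything else.

    Returns the number of polls it took to reach 200 OK.
    """
    for attempt in range(max_attempts):
        status = check_status()
        if status == 200:
            return attempt + 1
        if status != 202:
            raise RuntimeError(f"Extraction failed with HTTP {status}")
        sleep(interval_s)
    raise TimeoutError(f"Extraction still running after {max_attempts} polls")
```

For example: `wait_for_result(lambda: get_poll_status(polling_url, token))`, where `polling_url` is the `Location` value from step 2. A bounded number of attempts keeps a stuck job from polling forever.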
When the extraction completes (200 OK), the output file will be available at your specified Blob Storage location:
Summary
The Sidra Query API enables secure, asynchronous extraction of data from DSUs. It respects Sidra’s fine-grained access control and integrates with your existing Azure Blob Storage for output delivery.
This tutorial has used Postman for simplicity, but the same logic can be applied in any programming language or automation script.
Versions¶
Supported Versions
From version 2020.R2 to 2022.R3, a Python library named `pysidra` is available, offering wrappers for Sidra’s Core API, including querying and polling endpoints.
From 2022.R3 onward, the official Python client library is `SidraCoreApiPythonClient`.