GraphQL Data Connector
The GraphQL Data Connector enables federated SQL queries on any GraphQL endpoint by specifying graphql
as the selector in the from
value for the dataset.
datasets:
- from: graphql:your-graphql-endpoint
name: my_dataset
params:
json_pointer: /data/some/nodes
graphql_query: |
{
some {
nodes {
field1
field2
}
}
}
- The GraphQL data connector does not support variables in the query.
- Filter pushdown is not currently supported; however, when using the limit, the connector will request only the necessary data.
Configuration​
The GraphQL data connector can be configured by providing the following params
. Use the secret replacement syntax to load the password from a secret store, e.g. ${secrets:my_graphql_auth_token}
.
unnest_depth
: Depth level to automatically unnest objects to. By default, disabled if unspecified or0
.graphql_auth_token
: The authentication token to use to connect to the GraphQL server. Uses bearer authentication.graphql_auth_user
: The username to use for basic auth. E.g.graphql_auth_user: my_user
graphql_auth_pass
: The password to use for basic auth. E.g.graphql_auth_pass: ${secrets:my_graphql_auth_pass}
graphql_query
: The GraphQL query to execute. E.g.
query: |
{
some {
nodes {
field1
field2
}
}
}
json_pointer
: The JSON pointer into the response body. Whengraphql_query
is paginated, thejson_pointer
can be inferred.
Examples​
Example using the GitHub GraphQL API and Bearer Auth. The following will use json_pointer
to retrieve all of the nodes in starredRepositories:
from: graphql:https://api.github.com/graphql
name: stars
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
graphql_auth_user: ${env:GRAPHQL_USER} ...
graphql_auth_pass: ${env:GRAPHQL_PASS}
json_pointer: /data/viewer/starredRepositories/nodes
graphql_query: |
{
viewer {
starredRepositories {
nodes {
name
stargazerCount
languages (first: 10) {
nodes {
name
}
}
}
}
}
}
Pagination​
The GraphQL Data Connector supports automatic pagination of the response for queries using cursor pagination.
The graphql_query
must include the pageInfo
field as per spec. The connector will parse the graphql_query
, and when pageInfo
is present, will retrieve data until pagination completes.
The query must have the correct pagination arguments in the associated paginated field.
Example​
Forward Pagination:
{
something_paginated(first: 100) {
nodes {
foo
bar
}
pageInfo {
endCursor
hasNextPage
}
}
}
Backward Pagination:
{
something_paginated(last: 100) {
nodes {
foo
bar
}
pageInfo {
startCursor
hasPreviousPage
}
}
}
Working with JSON Data​
Tips for working with JSON data. For more information see Datafusion Docs.
Accessing objects fields​
You can access the fields of the object using the square bracket notation. Arrays are indexed from 1.
Example for the stargazers query from pagination section:
sql> select node['login'] as login, node['name'] as name from stargazers limit 5;
+--------------+----------------------+
| login | name |
+--------------+----------------------+
| simsieg | Simon Siegert |
| davidmathers | David Mathers |
| ahmedtadde | Ahmed Tadde |
| lordhamlet | Shih-Fen Cheng |
| thinmy | Thinmy Patrick Alves |
+--------------+----------------------+
Piping array into rows​
You can use Datafusion unnest
function to pipe values from array into rows.
We'll be using countries GraphQL api as an example.
from: graphql:https://countries.trevorblades.com
name: countries
params:
json_pointer: /data/continents
graphql_query: |
{
continents {
name
countries {
name
capital
}
}
}
description: countries
acceleration:
enabled: true
refresh_mode: full
refresh_check_interval: 30m
Example query:
sql> select continent, country['name'] as country, country['capital'] as capital
from (select name as continent, unnest(countries) as country from countries)
where continent = 'North America' limit 5;
+---------------+---------------------+--------------+
| continent | country | capital |
+---------------+---------------------+--------------+
| North America | Antigua and Barbuda | Saint John's |
| North America | Anguilla | The Valley |
| North America | Aruba | Oranjestad |
| North America | Barbados | Bridgetown |
| North America | Saint Barthélemy | Gustavia |
+---------------+---------------------+--------------+
Unnesting object properties​
You can also use the unnest_depth
parameter to control automatic unnesting of objects from GraphQL responses.
This examples uses the GitHub stargazers endpoint:
from: graphql:https://api.github.com/graphql
name: stargazers
params:
graphql_auth_token: ${env:GITHUB_TOKEN}
unnest_depth: 2
json_pointer: /data/repository/stargazers/edges
graphql_query: |
{
repository(name: "spiceai", owner: "spiceai") {
id
name
stargazers(first: 100) {
edges {
node {
id
name
login
}
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
If unnest_depth
is set to 0, or unspecified, object unnesting is disabled. When enabled, unnesting automatically moves nested fields to the parent level.
Without unnesting, stargazers data looks like this in a query:
sql> select node from stargazers limit 1;
+------------------------------------------------------------+
| node |
+------------------------------------------------------------+
| {id: MDQ6VXNlcjcwNzIw, login: ashtom, name: Thomas Dohmke} |
+------------------------------------------------------------+
With unnesting, these properties are automatically placed into their own columns:
sql> select node from stargazers limit 1;
+------------------+--------+---------------+
| id | login | name |
+------------------+--------+---------------+
| MDQ6VXNlcjcwNzIw | ashtom | Thomas Dohmke |
+------------------+--------+---------------+
Unnesting Duplicate Columns​
By default, the Spice Runtime will error when a duplicate column is detected during unnesting.
For example, this example spicepod.yml
query would fail due to name
fields:
from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/users
graphql_query: |
query {
users {
name
emergency_contact {
name
}
}
}
This example would fail with a runtime error:
WARN runtime: GraphQL Data Connector Error: Invalid object access. Column 'name' already exists in the object.
Avoid this error by using aliases in the query where possible. In the example above, a duplicate error was introduced from emergency_contact { name }
.
The example below uses a GraphQL alias to rename emergency_contact.name
as emergencyContactName
.
from: graphql:https://localhost
name: stargazers
params:
unnest_depth: 2
json_pointer: /data/people
graphql_query: |
query {
users {
name
emergency_contact {
emergencyContactName: name
}
}
}