GitHub Data Connector

The GitHub Data Connector enables federated SQL queries over GitHub resources such as files, issues, pull requests, and commits. To use it, specify github as the selector in the dataset's from value.

Common Configuration

Configure the GitHub data connector with the following params. Use the secret replacement syntax to load the access token from a secret store, e.g. ${secrets:GITHUB_TOKEN}.

  • github_token - Required. GitHub personal access token used to connect to the GitHub API.
  • owner - Required. Specifies the owner of the GitHub repository.
  • repo - Required. Specifies the name of the GitHub repository.

Querying GitHub Files

  • ref - Required. Specifies the GitHub branch or tag to fetch files from.
  • include - Optional. Specifies which files to include. Supports glob patterns; the example below passes two patterns separated by a semicolon. If not specified, all files are included.

datasets:
  - from: github:github.com/<owner>/<repo>/files/<ref>
    name: spiceai.files
    params:
      github_token: ${secrets:GITHUB_TOKEN}
      include: "**/*.json; **/*.yaml"
    acceleration:
      enabled: true

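Schema

| Column Name  | Data Type | Is Nullable |
| ------------ | --------- | ----------- |
| name         | Utf8      | YES         |
| path         | Utf8      | YES         |
| size         | Int64     | YES         |
| sha          | Utf8      | YES         |
| mode         | Utf8      | YES         |
| url          | Utf8      | YES         |
| download_url | Utf8      | YES         |
| content      | Utf8      | YES         |

Limitations

  • content column is included only when acceleration is enabled.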

Example

datasets:
  - from: github:github.com/spiceai/spiceai/files/v0.17.2-beta
    name: spiceai.files
    params:
      github_token: ${secrets:GITHUB_TOKEN}
      include: "**/*.txt" # include txt files only
    acceleration:
      enabled: true

sql> select * from spiceai.files
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+
| name        | path        | size | sha                                      | mode   | url                                                                                             | download_url                                                               | content     |
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+
| version.txt | version.txt | 12   | ee80f747038c30e776eecb2c2ae155dec9a68187 | 100644 | https://api.github.com/repos/spiceai/spiceai/git/blobs/ee80f747038c30e776eecb2c2ae155dec9a68187 | https://raw.githubusercontent.com/spiceai/spiceai/v0.17.2-beta/version.txt | 0.17.2-beta |
|             |             |      |                                          |        |                                                                                                 |                                                                            |             |
+-------------+-------------+------+------------------------------------------+--------+-------------------------------------------------------------------------------------------------+----------------------------------------------------------------------------+-------------+

Time: 0.005067 seconds. 1 rows.
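
The files dataset can be filtered and sorted like any other table. The query below is a minimal sketch, assuming the spiceai.files dataset from the example above is loaded. It uses only columns listed in the schema, and the limit of 5 is an arbitrary choice. No output is shown because it depends on the repository contents.

```sql
-- Illustrative only: list the five largest files that matched the include pattern.
select path, size, sha
from spiceai.files
order by size desc
limit 5
```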

Querying GitHub Issues

datasets:
  - from: github:github.com/<owner>/<repo>/issues
    name: spiceai.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true

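Schema

| Column Name     | Data Type    | Is Nullable |
| --------------- | ------------ | ----------- |
| assignees       | List(Utf8)   | YES         |
| body            | Utf8         | YES         |
| closed_at       | Utf8         | YES         |
| comments        | List(Struct) | YES         |
| created_at      | Utf8         | YES         |
| id              | Utf8         | YES         |
| labels          | List(Utf8)   | YES         |
| login           | Utf8         | YES         |
| milestone_id    | Utf8         | YES         |
| milestone_title | Utf8         | YES         |
| comments_count  | Int64        | YES         |
| number          | Int64        | YES         |
| state           | Utf8         | YES         |
| title           | Utf8         | YES         |
| updated_at      | Utf8         | YES         |
| url             | Utf8         | YES         |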

Example

datasets:
  - from: github:github.com/spiceai/spiceai/issues
    name: spiceai.issues
    params:
      github_token: ${secrets:GITHUB_TOKEN}

sql> select title, state, labels from spiceai.issues where title like '%duckdb%'
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+
| title                                                                                                     | state  | labels               |
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+
| Limitation documentation duckdb accelerator about nested struct and decimal256                            | CLOSED | [kind/documentation] |
| Inconsistent duckdb connector params: `params.open` and `params.duckdb_file`                              | CLOSED | [kind/bug]           |
| federation across multiple duckdb acceleration tables.                                                    | CLOSED | []                   |
| Integration tests to cover "On Conflict" behaviors for duckdb accelerator                                 | CLOSED | [kind/task]          |
| Permission denied issue while using duckdb data connector with spice using HELM for Kubernetes deployment | CLOSED | [kind/bug]           |
+-----------------------------------------------------------------------------------------------------------+--------+----------------------+

Time: 0.011877542 seconds. 5 rows.
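
Aggregations work on the issues dataset as well. The query below is a rough sketch, assuming the spiceai.issues dataset from the example above is loaded. The aliases issue_count and total_comments are illustrative names, not part of the schema.

```sql
-- Illustrative only: summarize issues and their comment volume per state.
select state,
       count(*) as issue_count,
       sum(comments_count) as total_comments
from spiceai.issues
group by state
order by issue_count desc
```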

Querying GitHub Pull Requests

datasets:
  - from: github:github.com/<owner>/<repo>/pulls
    name: spiceai.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}

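Schema

| Column Name    | Data Type  | Is Nullable |
| -------------- | ---------- | ----------- |
| additions      | Int64      | YES         |
| assignees      | List(Utf8) | YES         |
| body           | Utf8       | YES         |
| changed_files  | Int64      | YES         |
| closed_at      | Utf8       | YES         |
| comments_count | Int64      | YES         |
| commits_count  | Int64      | YES         |
| created_at     | Utf8       | YES         |
| deletions      | Int64      | YES         |
| hashes         | List(Utf8) | YES         |
| id             | Utf8       | YES         |
| labels         | List(Utf8) | YES         |
| login          | Utf8       | YES         |
| merged_at      | Utf8       | YES         |
| number         | Int64      | YES         |
| reviews_count  | Int64      | YES         |
| state          | Utf8       | YES         |
| title          | Utf8       | YES         |
| url            | Utf8       | YES         |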

Example

datasets:
  - from: github:github.com/spiceai/spiceai/pulls
    name: spiceai.pulls
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true

sql> select title, url, state from spiceai.pulls where title like '%GitHub connector%'
+---------------------------------------------------------------------+----------------------------------------------+--------+
| title                                                               | url                                          | state  |
+---------------------------------------------------------------------+----------------------------------------------+--------+
| GitHub connector: convert `labels` and `hashes` to primitive arrays | https://github.com/spiceai/spiceai/pull/2452 | MERGED |
+---------------------------------------------------------------------+----------------------------------------------+--------+

Time: 0.034996667 seconds. 1 rows.
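
The numeric columns in the pull request schema make it easy to rank changes by size. The query below is a sketch, assuming the spiceai.pulls dataset from the example above is loaded. The 'MERGED' value is taken from the sample output, and the limit of 10 is arbitrary.

```sql
-- Illustrative only: the ten merged pull requests with the most changed lines.
select number, title, additions, deletions, changed_files
from spiceai.pulls
where state = 'MERGED'
order by additions + deletions desc
limit 10
```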

Querying GitHub Commits

datasets:
  - from: github:github.com/<owner>/<repo>/commits
    name: spiceai.commits
    params:
      github_token: ${secrets:GITHUB_TOKEN}

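Schema

| Column Name       | Data Type | Is Nullable |
| ----------------- | --------- | ----------- |
| additions         | Int64     | YES         |
| author_email      | Utf8      | YES         |
| author_name       | Utf8      | YES         |
| committed_date    | Utf8      | YES         |
| deletions         | Int64     | YES         |
| id                | Utf8      | YES         |
| message           | Utf8      | YES         |
| message_body      | Utf8      | YES         |
| message_head_line | Utf8      | YES         |
| sha               | Utf8      | YES         |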

Example

datasets:
  - from: github:github.com/spiceai/spiceai/commits
    name: spiceai.commits
    params:
      github_token: ${secrets:GITHUB_TOKEN}
    acceleration:
      enabled: true

sql> select sha, message_head_line from spiceai.commits limit 10
+------------------------------------------+------------------------------------------------------------------------+
| sha                                      | message_head_line                                                     |
+------------------------------------------+------------------------------------------------------------------------+
| 2a9fab7905737e1af182e17f40aecc5c4b5dd236 | wait 2 seconds for the status to turn ready in refreshing status tes… |
| b9c210a818abeaf14d2493fde5227781f47faed8 | Update README.md - Remove bigquery from tablet of connectors (#1434)  |
| d61e1af61ebf826f83703b8dd939f19e8b2ba426 | Add databricks_use_ssl parameter (#1406)                              |
| f1ec55c5986e3e5d57eff94197182ffebbae1045 | wording and logs change reflected on readme (#1435)                   |
| bfc74185584d1e048ef66c72ce3572a0b652bfd9 | Update acknowledgements (#1433)                                       |
| 0d870f1791d456e7924b4ecbbda5f3b762db1e32 | Update helm version and use v0.13.0-alpha (#1436)                     |
| 12f930cbad69833077bd97ea43599a75cff985fc | Enable push-down federation by default (#1429)                        |
| 6e4521090aaf39664bd61d245581d34398ce77db | Add functional tests for federation push-down (#1428)                 |
| fa3279b7d9fcaa5e8baaa2425f69b556bb30e309 | Add LRU cache support for http-based sql queries (#1410)              |
| a3f93dde9d1312bfbf14f7ae3b75bdc468289212 | Add guides and examples about error handling (#1427)                  |
+------------------------------------------+------------------------------------------------------------------------+

Time: 0.0065395 seconds. 10 rows.
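
Commit data can be grouped by author in the same way. The query below is a sketch, assuming the spiceai.commits dataset from the example above is loaded. The aliases commit_count, lines_added, and lines_deleted are illustrative names, not part of the schema.

```sql
-- Illustrative only: the ten most active authors by number of commits.
select author_name,
       count(*) as commit_count,
       sum(additions) as lines_added,
       sum(deletions) as lines_deleted
from spiceai.commits
group by author_name
order by commit_count desc
limit 10
```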