Data Connectors

Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.

Currently supported Data Connectors include:

| Name | Description | Status | Protocol/Format | Refresh Modes | Supports Ingestion | Supports Documents |
|---|---|---|---|---|---|---|
| github | GitHub | Release Candidate | GraphQL, REST | append, full | | |
| mysql | MySQL | Release Candidate | | append, full | Roadmap | |
| postgres | PostgreSQL | Release Candidate | | append, full | Roadmap | |
| s3 | S3 | Release Candidate | Parquet, CSV | append, full | Roadmap | |
| databricks | Databricks | Beta | Spark Connect; S3 / Delta Lake | append, full | Roadmap | |
| delta_lake | Delta Lake | Beta | Delta Lake | append, full | Roadmap | |
| flightsql | FlightSQL | Beta | Arrow Flight SQL | append, full | | |
| odbc | ODBC | Beta | | append, full | | |
| spiceai | Spice.ai | Beta | Arrow Flight | append, full | | |
| abfs | Azure BlobFS | Alpha | Parquet, CSV | append, full | Roadmap | |
| clickhouse | Clickhouse | Alpha | | append, full | | |
| debezium | Debezium | Alpha | CDC, Kafka | append, full, changes | | |
| dremio | Dremio | Alpha | Arrow Flight SQL | append, full | | |
| duckdb | DuckDB | Alpha | | append, full | | |
| file | File | Alpha | Parquet, CSV | append, full | Roadmap | |
| ftp, sftp | FTP/SFTP | Alpha | Parquet, CSV | append, full | | |
| graphql | GraphQL | Alpha | GraphQL | append, full | | |
| http, https | HTTP(s) | Alpha | Parquet, CSV | append, full | | |
| localpod | Local dataset replication | Alpha | | append, full | | |
| mssql | MS SQL Server | Alpha | Tabular Data Stream (TDS) | append, full | | |
| sharepoint | SharePoint | Alpha | | append, full | | |
| snowflake | Snowflake | Alpha | Arrow | append, full | Roadmap | |
| spark | Spark | Alpha | Spark Connect | append, full | | |
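
As a rough illustration of how a refresh mode from the table might be selected, the sketch below configures a dataset for local replication. It assumes the `acceleration` block and its `enabled` and `refresh_mode` keys from the spicepod format, none of which are described on this page, so check them against the spicepod reference. The dataset name and path are hypothetical.

```yaml
datasets:
  - name: trips                  # hypothetical dataset name
    from: file:data/trips/       # hypothetical local folder
    params:
      file_format: parquet
    acceleration:                # assumed spicepod key, not shown on this page
      enabled: true
      refresh_mode: full         # one of the modes listed in the Refresh Modes column
```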

Object Store File Formats

For data connectors that are object store compatible, if a folder is provided, the file format must be specified with params.file_format.

If a file is provided, the file format will be inferred, and params.file_format is unnecessary.
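
The folder-versus-file rule above can be sketched in a spicepod as follows. This is a minimal illustration using the `file:` syntax that appears later on this page; the dataset names and paths are hypothetical.

```yaml
datasets:
  # Folder path: the format is not inferred, so file_format must be set.
  - name: trips_folder                     # hypothetical dataset name
    from: file:data/trips/                 # hypothetical folder
    params:
      file_format: parquet

  # Single file: the format is inferred, so file_format is omitted.
  - name: trips_single                     # hypothetical dataset name
    from: file:data/trips/2024-01.parquet  # hypothetical file
```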

File formats currently supported are:

| Name | Parameter | Supported | Is Document Format |
|---|---|---|---|
| Apache Parquet | file_format: parquet | | |
| CSV | file_format: csv | | |
| Apache Iceberg | file_format: iceberg | Roadmap | |
| JSON | file_format: json | Roadmap | |
| Microsoft Excel | file_format: xlsx | Roadmap | |
| Markdown | file_format: md | | |
| Text | file_format: txt | | |
| PDF | file_format: pdf | Alpha | |
| Microsoft Word | file_format: docx | Alpha | |

File formats accept additional parameters in params (such as csv_has_header), which are described in File Formats.
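
For instance, a format-specific parameter such as csv_has_header would sit alongside file_format in params. The sketch below is illustrative only: the dataset name and path are hypothetical, and the boolean value is an assumption about how the parameter is written.

```yaml
datasets:
  - name: reports                # hypothetical dataset name
    from: file:data/reports/     # hypothetical folder of CSV files
    params:
      file_format: csv
      csv_has_header: true       # assumed value form; see File Formats for details
```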

If a format is a document format, each file is treated as a document, as described under Document Support below.

Note

Document formats in Alpha (e.g. pdf, docx) may not parse all structure or text from the underlying documents correctly.

Document Support

If a Data Connector supports documents, when the appropriate file format is specified (see above), each file will be treated as a row in the table, with the contents of the file within the content column. Additional columns will exist, dependent on the data connector.

Example

Consider a local filesystem:

>>> ls -la
total 232
drwxr-sr-x@ 22 jeadie staff 704 30 Jul 13:12 .
drwxr-sr-x@ 18 jeadie staff 576 30 Jul 13:12 ..
-rw-r--r--@ 1 jeadie staff 1329 15 Jan 2024 DR-000-Template.md
-rw-r--r--@ 1 jeadie staff 4966 11 Aug 2023 DR-001-Dremio-Architecture.md
-rw-r--r--@ 1 jeadie staff 2307 28 Jul 2023 DR-002-Data-Completeness.md

And the spicepod:

datasets:
  - name: my_documents
    from: file:docs/decisions/
    params:
      file_format: md

A document table will be created:

>>> SELECT * FROM my_documents LIMIT 3
+----------------------------------------------------+--------------------------------------------------+
| location                                           | content                                          |
+----------------------------------------------------+--------------------------------------------------+
| Users/docs/decisions/DR-000-Template.md            | # DR-000: DR Template                            |
|                                                    | **Date:** <>                                     |
|                                                    | **Decision Makers:**                             |
|                                                    | - @<>                                            |
|                                                    | - @<>                                            |
|                                                    | ...                                              |
| Users/docs/decisions/DR-001-Dremio-Architecture.md | # DR-001: Add "Cached" Dremio Dataset            |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | We use [Dremio](https://www.dremio.com/) to p... |
| Users/docs/decisions/DR-002-Data-Completeness.md   | # DR-002: Append-Only Data Completeness          |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | Our Ethereum append-only dataset is incomple...  |
+----------------------------------------------------+--------------------------------------------------+

Data Connector Docs