Data Connectors

Data Connectors provide connections to databases, data warehouses, and data lakes for federated SQL queries and data replication.

Currently supported Data Connectors include:

| Name | Description | Status | Protocol/Format | Refresh Modes | Supports Ingestion | Supports Documents |
| ---- | ----------- | ------ | --------------- | ------------- | ------------------ | ------------------ |
| `abfs` | Azure BlobFS | Alpha | Parquet, CSV | append, full | Roadmap | |
| `clickhouse` | Clickhouse | Alpha | | append, full | | |
| `databricks` | Databricks | Beta | Spark Connect, S3 / Delta Lake | append, full | Roadmap | |
| `debezium` | Debezium | Alpha | CDC, Kafka | append, full, changes | | |
| `delta_lake` | Delta Lake | Beta | Delta Lake | append, full | Roadmap | |
| `dremio` | Dremio | Alpha | Arrow Flight SQL | append, full | | |
| `file` | File | Alpha | Parquet, CSV | append, full | Roadmap | |
| `flightsql` | FlightSQL | Beta | Arrow Flight SQL | append, full | | |
| `ftp`, `sftp` | FTP/SFTP | Alpha | Parquet, CSV | append, full | | |
| `github` | GitHub | Beta | GraphQL, REST | append, full | | |
| `graphql` | GraphQL | Alpha | GraphQL | append, full | | |
| `http`, `https` | HTTP(s) | Alpha | Parquet, CSV | append, full | | |
| `mssql` | MS SQL Server | Alpha | Tabular Data Stream (TDS) | append, full | | |
| `mysql` | MySQL | Beta | | append, full | Roadmap | |
| `odbc` | ODBC | Beta | | append, full | | |
| `postgres` | PostgreSQL | Beta | | append, full | Roadmap | |
| `s3` | S3 | Beta | Parquet, CSV | append, full | Roadmap | |
| `sharepoint` | SharePoint | Alpha | | append, full | | |
| `snowflake` | Snowflake | Alpha | Arrow | append, full | Roadmap | |
| `spiceai` | Spice.ai | Beta | Arrow Flight | append, full | | |
| `spark` | Spark | Alpha | Spark Connect | append, full | | |

Object Store File Formats

For data connectors that are object-store compatible, the file format must be specified with `params.file_format` when a folder is provided.

If a single file is provided, the file format is inferred and `params.file_format` is not required.
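As a sketch of the two cases, the dataset pointing at a folder declares its format explicitly, while the dataset pointing at a single file relies on inference (the dataset names and bucket paths below are illustrative):

```yaml
datasets:
  # Folder: file_format is required
  - name: reports_folder
    from: s3://my-bucket/reports/
    params:
      file_format: parquet

  # Single file: the format is inferred, so file_format may be omitted
  - name: single_report
    from: s3://my-bucket/reports/summary.parquet
```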

File formats currently supported are:

| Name | Parameter | Supported | Is Document Format |
| ---- | --------- | --------- | ------------------ |
| Apache Parquet | `file_format: parquet` | ✅ | |
| CSV | `file_format: csv` | ✅ | |
| Apache Iceberg | `file_format: iceberg` | Roadmap | |
| JSON | `file_format: json` | Roadmap | |
| Microsoft Excel | `file_format: xlsx` | Roadmap | |
| Markdown | `file_format: md` | ✅ | ✅ |
| Text | `file_format: txt` | ✅ | ✅ |
| PDF | `file_format: pdf` | Alpha | ✅ |
| Microsoft Word | `file_format: docx` | Alpha | ✅ |

File formats support additional parameters in `params` (like `csv_has_header`), described in File Formats.
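For example, a format-specific parameter such as `csv_has_header` sits alongside `file_format` in `params` (the dataset name and path below are illustrative):

```yaml
datasets:
  - name: sales
    from: file:data/sales/
    params:
      file_format: csv
      csv_has_header: true
```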

If a format is a document format, each file is treated as a document, as described in Document Support below.

Note

Document formats in Alpha (e.g. pdf, docx) may not parse all structure or text from the underlying documents correctly.

Document Support

If a Data Connector supports documents and a document file format is specified (see above), each file is treated as a row in the table, with the file's contents in the `content` column. Additional columns depend on the data connector.

Example

Consider a local filesystem:

```shell
>>> ls -la
total 232
drwxr-sr-x@ 22 jeadie staff  704 30 Jul 13:12 .
drwxr-sr-x@ 18 jeadie staff  576 30 Jul 13:12 ..
-rw-r--r--@  1 jeadie staff 1329 15 Jan  2024 DR-000-Template.md
-rw-r--r--@  1 jeadie staff 4966 11 Aug  2023 DR-001-Dremio-Architecture.md
-rw-r--r--@  1 jeadie staff 2307 28 Jul  2023 DR-002-Data-Completeness.md
```

And the spicepod:

```yaml
datasets:
  - name: my_documents
    from: file:docs/decisions/
    params:
      file_format: md
```

A document table is created:

```
>>> SELECT * FROM my_documents LIMIT 3
+----------------------------------------------------+--------------------------------------------------+
| location                                           | content                                          |
+----------------------------------------------------+--------------------------------------------------+
| Users/docs/decisions/DR-000-Template.md            | # DR-000: DR Template                            |
|                                                    | **Date:** <>                                     |
|                                                    | **Decision Makers:**                             |
|                                                    | - @<>                                            |
|                                                    | - @<>                                            |
|                                                    | ...                                              |
| Users/docs/decisions/DR-001-Dremio-Architecture.md | # DR-001: Add "Cached" Dremio Dataset            |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | We use [Dremio](https://www.dremio.com/) to p... |
| Users/docs/decisions/DR-002-Data-Completeness.md   | # DR-002: Append-Only Data Completeness          |
|                                                    |                                                  |
|                                                    | ## Context                                       |
|                                                    |                                                  |
|                                                    | Our Ethereum append-only dataset is incomple...  |
+----------------------------------------------------+--------------------------------------------------+
```
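Since each document is a row, it can be filtered and projected like any other table. A hypothetical query over the table above:

```sql
-- Find which decision records mention Dremio in their body text
SELECT location
FROM my_documents
WHERE content LIKE '%Dremio%';
```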

Data Connector Docs