Skip to main content

S3 Data Connector

The S3 Data Connector enables federated SQL query across Parquet files stored in S3, or S3-compatible storage solutions (e.g. MinIO, Cloudflare R2).

Support for Iceberg, CSV, and other file-formats are on the roadmap.

If a folder is provided, all child Parquet files will be loaded.

Dataset Schema Reference

from

The S3-compatible URI to a folder or object in form from: s3://<bucket>/<file>

Example: from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet

name

The dataset name.

Example: name: cool_dataset

params (optional)

  • endpoint: The S3 endpoint, or equivalent (e.g. MinIO endpoint), for the S3-compatible storage. E.g. endpoint: https://my.minio.server
  • region: Region of the S3 bucket, if region specific. E.g. region: us-east-1

Auth

Not required for public endpoints.

  • key: The access key (e.g. AWS_ACCESS_KEY_ID for AWS)
  • secretThe secret key (e.g. AWS_SECRET_ACCESS_KEY for AWS)

For endpoints protected by access keys, key and secret are required and must be passed using a Secrets Store or via spice s3 login. By default S3 connector will look for a secret named s3 with keys key and secret.

Support for dataset specific authentication is on the roadmap.

spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

Learn more about File Secret Store.

Examples

MinIO Example

Create a dataset named cool_dataset from a Parquet file stored in MinIO.

- from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
endpoint: https://my.minio.server
region: 'us-east-1' # Best practice for Minio

S3 Authenticated Example

Create a dataset named cool_dataset from a protected Parquet file stored in S3.

First, log in using spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY then use the dataset:

- from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
name: cool_dataset
params:
region: 'us-east-1'

S3 Public Example

Create a dataset named taxi_trips from a public S3 folder.

- from: s3://spiceai-demo-datasets/taxi_trips/2024/
name: taxi_trips

Limitations

  • Automatic discovery of nested paths are not yet supported. Only Parquet files in the root of the endpoint path will be loaded.

  • Only S3 authentication using aws_access_key_id and aws_secret_access_key is currently supported. Please submit an issue if you would like to see support for other authentication methods.