S3 Data Connector
The S3 Data Connector enables federated SQL queries across Parquet files stored in S3 or S3-compatible storage (e.g. MinIO, Cloudflare R2).
Support for Iceberg, CSV, and other file formats is on the roadmap.
If a folder is provided, all child Parquet files will be loaded.
Dataset Schema Reference
from
The S3-compatible URI of a folder or object, in the form from: s3://<bucket>/<file>
Example: from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
name
The dataset name.
Example: name: cool_dataset
params (optional)
- endpoint: The S3 endpoint, or equivalent (e.g. MinIO endpoint), for the S3-compatible storage. E.g. endpoint: https://my.minio.server
- region: The region of the S3 bucket, if region-specific. E.g. region: us-east-1
Auth
Not required for public endpoints.
- key: The access key (e.g. AWS_ACCESS_KEY_ID for AWS)
- secret: The secret key (e.g. AWS_SECRET_ACCESS_KEY for AWS)
For endpoints protected by access keys, key and secret are required and must be passed using a Secrets Store or via spice login s3.
By default, the S3 connector looks for a secret named s3 with keys key and secret.
Support for dataset-specific authentication is on the roadmap.
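Credentials can be supplied through any of the following secret stores; an example for each follows, in this order:
- Local
- Env
- Kubernetes
- Keyring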
spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Learn more about File Secret Store.
SPICE_SECRET_S3_KEY=AKIAIOSFODNN7EXAMPLE \
SPICE_SECRET_S3_SECRET=wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY \
spice run
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: env
# <...>
Learn more about Env Secret Store.
kubectl create secret generic s3 \
--from-literal=key='AKIAIOSFODNN7EXAMPLE' \
--from-literal=secret='wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: kubernetes
# <...>
Learn more about Kubernetes Secret Store.
Add a new keychain entry (macOS), with the key and secret as a JSON string:
security add-generic-password -l "S3 Secret" \
-a spiced -s spice_secret_s3 \
-w "$(echo -n '{"key": "AKIAIOSFODNN7EXAMPLE", "secret": "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"}')"
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: keyring
# <...>
Learn more about Keyring Secret Store.
Examples
MinIO Example
Create a dataset named cool_dataset from a Parquet file stored in MinIO.
- from: s3://s3-bucket-name/path/to/parquet/cool_dataset.parquet
  name: cool_dataset
  params:
    endpoint: https://my.minio.server
    region: 'us-east-1' # Best practice for MinIO
S3 Authenticated Example
Create a dataset named cool_dataset from a protected Parquet file stored in S3.
First, log in:
spice login s3 -k AKIAIOSFODNN7EXAMPLE -s wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Then use the dataset:
- from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
  name: cool_dataset
  params:
    region: 'us-east-1'
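To show how the pieces on this page fit together, here is a minimal sketch of a complete spicepod.yaml for the authenticated example. It assumes the env secret store and assumes that dataset entries are listed under a top-level datasets key (the snippets on this page show only the individual entries). The name spice-app and the bucket path are the placeholders used elsewhere on this page.

```yaml
# spicepod.yaml (illustrative sketch, not a definitive layout)
version: v1beta1
kind: Spicepod
name: spice-app

secrets:
  # Env store: expects SPICE_SECRET_S3_KEY and SPICE_SECRET_S3_SECRET to be set
  store: env

# Assumption: dataset entries live under a top-level `datasets` list
datasets:
  - from: s3://my-startups-data/path/to/parquet/cool_dataset.parquet
    name: cool_dataset
    params:
      region: 'us-east-1'
```

With a file like this, the two SPICE_SECRET_S3_* environment variables shown in the Env secret store example would be set when invoking spice run, in place of the spice login s3 step.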
S3 Public Example
Create a dataset named taxi_trips from a public S3 folder.
- from: s3://spiceai-demo-datasets/taxi_trips/2024/
  name: taxi_trips
Limitations
- Only S3 authentication using aws_access_key_id and aws_secret_access_key is currently supported. Please submit an issue if you would like to see support for other authentication methods.