Skip to main content

DuckDB Data Connector

Dataset Source​

To connect to a DuckDB persistent database as a data source, specify duckdb as the selector in the from value for the dataset.

datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb

Configuration​

The DuckDB data connector can be configured by providing the following params:

  • duckdb_open: The name for the file to back the DuckDB database.

Configuration params are provided either in the top level dataset for a dataset source, or in the acceleration section for a data store.

A generic example of DuckDB data connector configuration.

datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb

This example uses a DuckDB database file that is at location /my/path/

datasets:
- from: duckdb:sample_data.nyc.rideshare
name: nyc_rideshare
params:
duckdb_open: /my/path/my_database.db

DuckDB Functions​

Common data import DuckDB functions can also define datasets. Instead of a fixed table reference (e.g. database.schema.table), a DuckDB function is provided in the from: key. For example

datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb

- from: duckdb:read_csv('test.csv', header = false)
name: from_function

Datasets created from DuckDB functions are similar to a standard SELECT query. For example:

datasets:
- from: duckdb:read_csv('test.csv', header = false)

is equivalent to:

-- from_function
SELECT * FROM read_csv('test.csv', header = false)

Many DuckDB data imports can be rewritten as DuckDB functions, making them usable as Spice datasets. For example:

SELECT * FROM 'todos.json';

-- As a DuckDB function
SELECT * FROM read_json('todos.json');