DuckDB Data Connector
Dataset Source​
To connect to a DuckDB persistent database as a data source, specify duckdb
as the selector in the from
value for the dataset.
datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb
Configuration​
The DuckDB data connector can be configured by providing the following params
:
duckdb_open
: The name for the file to back the DuckDB database.
Configuration params
are provided either in the top level dataset
for a dataset source, or in the acceleration
section for a data store.
A generic example of DuckDB data connector configuration.
datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb
This example uses a DuckDB database file that is at location /my/path/
datasets:
- from: duckdb:sample_data.nyc.rideshare
name: nyc_rideshare
params:
duckdb_open: /my/path/my_database.db
DuckDB Functions​
Common data import DuckDB functions can also define datasets. Instead of a fixed table reference (e.g. database.schema.table
), a DuckDB function is provided in the from:
key. For example
datasets:
- from: duckdb:database.schema.table
name: my_dataset
params:
duckdb_open: path/to/duckdb_file.duckdb
- from: duckdb:read_csv('test.csv', header = false)
name: from_function
Datasets created from DuckDB functions are similar to a standard SELECT
query. For example:
datasets:
- from: duckdb:read_csv('test.csv', header = false)
is equivalent to:
-- from_function
SELECT * FROM read_csv('test.csv', header = false)
Many DuckDB data imports can be rewritten as DuckDB functions, making them usable as Spice datasets. For example:
SELECT * FROM 'todos.json';
-- As a DuckDB function
SELECT * FROM read_json('todos.json');