Data Accelerators
Data sourced by Data Connectors can be locally materialized and accelerated using a Data Accelerator.
A Data Accelerator queries data from a connected data source, then stores and updates it locally in an embedded acceleration engine, such as DuckDB or SQLite. To configure data refresh behavior, such as refreshing data on an interval, see Data Refresh.
Dataset acceleration is enabled by setting the acceleration configuration on a dataset. For example:
datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
For the complete reference specification, see datasets.
By default, datasets will be locally materialized using in-memory Arrow records.
Alternatively, the DuckDB, SQLite, or PostgreSQL engine can be selected to materialize data in memory, on disk, or in an attached database.
Supported Data Accelerators include:
Engine Name | Description | Status | Engine Modes |
---|---|---|---|
arrow | In-Memory Arrow Records | Alpha | memory |
duckdb | Embedded DuckDB | Alpha | memory, file |
sqlite | Embedded SQLite | Alpha | memory, file |
postgres | Attached PostgreSQL | Beta | |
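As a minimal sketch of selecting one of the engines listed above, the snippet below extends the earlier example. The engine key is an assumption not shown on this page (the table only gives the engine names), so confirm the exact field name against the datasets reference specification.

```yaml
datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
      # Assumed key: selects an engine by its "Engine Name" from the table.
      # Omitting it should fall back to the default, in-memory Arrow records.
      engine: sqlite
```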
Data Types
Data Accelerators may not support all Apache Arrow data types. For complete compatibility details, see specifications.
When accelerating a dataset using mode: memory (the default), some or all of the dataset is loaded into memory. Ensure sufficient memory is available, including overhead for queries and the runtime, especially with concurrent queries.
In-memory limitations can be mitigated by storing acceleration data on disk. The duckdb and sqlite accelerators support this when mode: file is specified.
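A disk-backed configuration might look like the following sketch. The mode: file setting comes from the text above, while the engine key and its placement under acceleration are assumptions to verify against the datasets reference specification.

```yaml
datasets:
  - name: accelerated_dataset
    acceleration:
      enabled: true
      # Assumed key: choose an engine whose Engine Modes include "file".
      engine: duckdb
      # Store accelerated data on disk instead of in memory.
      mode: file
```

With a file-backed engine, memory is still needed for query execution and the runtime, but the accelerated dataset itself does not have to fit in memory.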
Data Accelerator Docs
📄️ In-Memory Arrow Data Accelerator
In-Memory Arrow Data Accelerator Documentation
📄️ Data Refresh
Data refresh for accelerated datasets
📄️ DuckDB Data Accelerator
DuckDB Data Accelerator Documentation
📄️ SQLite Data Accelerator
SQLite Data Accelerator Documentation
📄️ PostgreSQL Data Accelerator
PostgreSQL Data Accelerator Documentation