
Apache Spark Connector

Use Apache Spark as a connector for federated SQL queries against a Spark cluster through Spark Connect.

Configuration

The Apache Spark Connector can be configured in one of two ways: by providing a plaintext connection string in the spark_remote parameter, or by storing that connection string in a spark_remote secret. The connector fails if both are set.
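The rule above (exactly one source for the connection string) can be sketched as follows. This is a minimal illustration, not Spice's implementation: the function name `resolve_spark_remote` and the dictionary shapes used for params and secrets are hypothetical, and treating a missing value as an error is an assumption.

```python
from typing import Mapping, Optional


def resolve_spark_remote(
    params: Mapping[str, str],
    secrets: Mapping[str, Mapping[str, str]],
) -> str:
    """Hypothetical sketch: pick the Spark Connect remote from exactly one source.

    params  -- inline dataset params, e.g. {"spark_remote": "sc://localhost"}
    secrets -- secrets keyed by secret name, e.g. {"spark": {"spark_remote": "..."}}
    """
    inline: Optional[str] = params.get("spark_remote")
    # The secret is looked up under the name "spark" and key "spark_remote".
    secret: Optional[str] = secrets.get("spark", {}).get("spark_remote")

    if inline is not None and secret is not None:
        # Both configured: fail, as the documentation describes.
        raise ValueError("set spark_remote either as a param or as a secret, not both")
    if inline is not None:
        return inline
    if secret is not None:
        return secret
    # Assumption: having neither source configured is also an error.
    raise ValueError("spark_remote is not configured")
```

Keeping the check in one place makes the "both set" failure explicit instead of silently preferring one source over the other.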

Parameters

spark_remote: the Spark Connect connection string for the cluster, for example sc://localhost.

Auth

For Spark clusters configured to accept authenticated requests, do not set spark_remote as an inline dataset parameter, because the connection string will contain sensitive data. Instead, store it in a secret named spark with the key spark_remote.

See Secret Stores for more details.

spice login spark --spark_remote <spark-remote>

Learn more about File Secret Store.
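For context on why the connection string is sensitive: a Spark Connect connection string is assumed here to follow the `sc://host[:port]/;key=value;...` form described in the Spark Connect client documentation, with 15002 as the default port, and it can carry credentials such as a `token` parameter. The hypothetical helpers below (`parse_spark_remote`, `redact`) split such a string and mask the token, for example before logging it. They are a sketch, not part of Spice or Spark.

```python
DEFAULT_PORT = 15002  # assumed Spark Connect default port


def parse_spark_remote(remote: str) -> tuple[str, int, dict[str, str]]:
    """Split 'sc://host[:port][/;key=value;...]' into host, port and params.

    Sketch only: IPv6 host literals are not handled.
    """
    if not remote.startswith("sc://"):
        raise ValueError("expected a connection string starting with sc://")
    # Separate the authority (host[:port]) from the ';'-delimited parameter list.
    authority, _, rest = remote[len("sc://"):].partition("/")
    host, _, port = authority.partition(":")
    params: dict[str, str] = {}
    for item in rest.split(";"):
        if item:  # skip the empty piece before the first ';'
            key, _, value = item.partition("=")
            params[key] = value
    return host, int(port) if port else DEFAULT_PORT, params


def redact(remote: str) -> str:
    """Return the connection string with any token value masked."""
    host, port, params = parse_spark_remote(remote)
    masked = {k: ("***" if k == "token" else v) for k, v in params.items()}
    suffix = "".join(f";{k}={v}" for k, v in masked.items())
    return f"sc://{host}:{port}/{suffix}"
```

For example, `redact("sc://example.com:443/;use_ssl=true;token=abc")` yields `sc://example.com:443/;use_ssl=true;token=***`, so the credential never appears in plain text.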

Example

datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: sc://localhost