Apache Spark Connector
The Apache Spark connector enables federated SQL queries against a Spark cluster using Spark Connect.
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: sc://localhost:15002
Configuration
spark_remote: A Spark remote connection URI. Refer to the Spark Connect client connection string documentation for the parameters supported in the URI.
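As an illustration of parameters carried in the URI, the sketch below assumes the Spark Connect client connection string form `sc://host:port/;key=value;key=value` with the `use_ssl` and `user_id` parameters. The hostname, port, and user ID are hypothetical, and whether a given parameter is honored depends on the cluster and client, so treat this as a sketch rather than a verified configuration.

```yaml
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      # Hypothetical endpoint; parameters follow the path as ;key=value pairs.
      spark_remote: sc://spark.example.com:443/;use_ssl=true;user_id=spice
```

A `token` parameter, if the cluster uses one, is sensitive and belongs in a secret store rather than inline in the dataset params.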
Auth Examples
When a Spark cluster requires authenticated requests, do not set spark_remote inline as a dataset param, because the URI will contain sensitive data. Instead, use the secret replacement syntax to load the value from a secret store, e.g. ${secrets:my_spark_remote}.
See Secret Stores for more details.
- Env
- Kubernetes
- Keyring
SPICE_SPARK_REMOTE=<spark-remote> \
spice run
# Or use the CLI to write the secret to an `.env` file
spice login spark --spark_remote <spark-remote>
.env
SPICE_SPARK_REMOTE=<spark-remote>
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  - from: env
    name: env
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: ${env:SPICE_SPARK_REMOTE}
Learn more about Env Secret Store.
kubectl create secret generic spark \
  --from-literal=spark_remote='<spark-remote>'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  - from: kubernetes:spark
    name: spark
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: ${spark:spark_remote}
Learn more about Kubernetes Secret Store.
Add a new keychain entry (macOS) containing the Spark remote:
security add-generic-password -l "Spark Remote" \
-a spiced -s spice_spark_remote \
-w <spark-remote>
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  - from: keyring
    name: keyring
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: ${keyring:spice_spark_remote}
Learn more about Keyring Secret Store.