Apache Spark Connector
Federated SQL queries against an Apache Spark cluster using Spark Connect.
Configuration
The Apache Spark Connector can be configured in two ways: by specifying a plaintext connection string via the spark_remote parameter, or by supplying a spark_remote secret. The connector will fail if both are set.
Parameters
spark_remote
: A Spark Connect remote connection URI.
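The value follows the Spark Connect connection string format: an sc://host:port endpoint, optionally followed by semicolon-separated parameters such as use_ssl or token. A sketch (the host, port, and token below are placeholders):

```
sc://spark.example.com:15002/;use_ssl=true;token=<access-token>
```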
Auth
Spark clusters configured to accept authenticated requests should not set spark_remote as an inline dataset param, as it contains sensitive data. In this case, use a secret named spark with the key spark_remote.
Check Secrets Stores for more details.
The secret can be provided by any of the supported secret stores:
- Local
- Env
- Kubernetes
- Keyring

Local

spice login spark --spark_remote <spark-remote>
Learn more about File Secret Store.
Env

SPICE_SECRET_SPARK_SPARK_REMOTE=<spark-remote> \
spice run
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: env
# <...>
Learn more about Env Secret Store.
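As an illustration of the naming convention above, the env store reads variables of the form SPICE_SECRET_&lt;NAME&gt;_&lt;KEY&gt;. A minimal Python sketch (the prefix split between the secret name and the key is an assumption for illustration, not the connector's implementation):

```python
import os

# Hypothetical illustration: SPICE_SECRET_SPARK_SPARK_REMOTE maps to
# secret name "spark" with key "spark_remote".
os.environ["SPICE_SECRET_SPARK_SPARK_REMOTE"] = "sc://localhost:15002"

prefix = "SPICE_SECRET_SPARK_"
spark_secret = {
    name[len(prefix):].lower(): value
    for name, value in os.environ.items()
    if name.startswith(prefix)
}
print(spark_secret)  # {'spark_remote': 'sc://localhost:15002'}
```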
Kubernetes

kubectl create secret generic spark \
  --from-literal=spark_remote='<spark-remote>'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: kubernetes
# <...>
Learn more about Kubernetes Secret Store.
Keyring

Add a new keychain entry (macOS) with the secret value as a JSON string:
security add-generic-password -l "Spark Remote" \
  -a spiced -s spice_secret_spark \
  -w '{"spark_remote": "<spark-remote>"}'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: keyring
# <...>
Learn more about Keyring Secret Store.
Example
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: sc://localhost
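Once registered, the dataset can be queried through Spice's SQL interface like any other table; a sketch using the Spice SQL REPL (the column selection and LIMIT are arbitrary):

```sql
-- Run inside `spice sql`; `my_table` is the dataset name configured above.
SELECT * FROM my_table LIMIT 10;
```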