Apache Spark Connector
Federated SQL queries against an Apache Spark cluster using Spark Connect.
Configuration
The Apache Spark Connector can be configured in two ways: by specifying a plaintext connection string via the spark_remote parameter, or by supplying a spark_remote secret. The connector will fail if both are set.
Parameters
spark_remote
: A Spark Connect remote connection URI.
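The value follows the Spark Connect connection string format: an sc://host:port endpoint, optionally followed by semicolon-separated parameters such as use_ssl or token. A sketch (the host, port, and token below are placeholders):

```
sc://spark.example.com:15002/;use_ssl=true;token=<access-token>
```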
Auth
Spark clusters configured to accept authenticated requests should not set spark_remote as an inline dataset param, as it contains sensitive data. In this case, use a secret named spark with the key spark_remote.
Check Secrets Stores for more details.
The secret can be provided by any of the supported secret stores:
- Local
- Env
- Kubernetes
- Keyring

Local

spice login spark --spark_remote <spark-remote>
Learn more about File Secret Store.
Env

SPICE_SECRET_SPARK_SPARK_REMOTE=<spark-remote> \
spice run
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: env
# <...>
Learn more about Env Secret Store.
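As an illustration of the naming convention above, the env store reads variables of the form SPICE_SECRET_&lt;NAME&gt;_&lt;KEY&gt;. A minimal Python sketch (the prefix split between the secret name and the key is an assumption for illustration, not the connector's implementation):

```python
import os

# Hypothetical illustration: SPICE_SECRET_SPARK_SPARK_REMOTE maps to
# secret name "spark" with key "spark_remote".
os.environ["SPICE_SECRET_SPARK_SPARK_REMOTE"] = "sc://localhost:15002"

prefix = "SPICE_SECRET_SPARK_"
spark_secret = {
    name[len(prefix):].lower(): value
    for name, value in os.environ.items()
    if name.startswith(prefix)
}
print(spark_secret)  # {'spark_remote': 'sc://localhost:15002'}
```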
Kubernetes

kubectl create secret generic spark \
  --from-literal=spark_remote='<spark-remote>'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: kubernetes
# <...>
Learn more about Kubernetes Secret Store.
Keyring

Add a new keychain entry (macOS) with the secret value as a JSON string:
security add-generic-password -l "Spark Remote" \
  -a spiced -s spice_secret_spark \
  -w '{"spark_remote": "<spark-remote>"}'
spicepod.yaml
version: v1beta1
kind: Spicepod
name: spice-app
secrets:
  store: keyring
# <...>
Learn more about Keyring Secret Store.
Example
datasets:
  - from: spark:spiceai.datasets.my_awesome_table
    name: my_table
    params:
      spark_remote: sc://localhost
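Once registered, the dataset can be queried through Spice's SQL interface like any other table; a sketch using the Spice SQL REPL (the column selection and LIMIT are arbitrary):

```sql
-- Run inside `spice sql`; `my_table` is the dataset name configured above.
SELECT * FROM my_table LIMIT 10;
```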