Semantic Model
Semantic data models in Spice are defined using the datasets[*].columns
configuration. These models provide structured and meaningful data representations, which are beneficial for both AI large language models (LLMs) and traditional data analysis.
Use-Cases​
Large Language Models (LLMs)​
The semantic model is automatically used by Spice Models as context to produce more accurate and context-aware AI responses.
Defining a Semantic Model​
Semantic data models are defined within the spicepod.yaml
file, specifically under the datasets
section. Each dataset supports description
, metadata
, and a columns
field where individual columns are described with metadata and features for utility and clarity.
Example Configuration​
Example spicepod.yaml
:
datasets:
- name: taxi_trips
description: NYC taxi trip rides
metadata:
instructions: Always provide citations with reference URLs.
reference_url_template: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_<YYYY-MM>.parquet
columns:
- name: tpep_pickup_time
description: 'The time the passenger was picked up by the taxi'
- name: notes
description: 'Optional notes about the trip'
embeddings:
- from: hf_minilm # A defined Spice Model
chunking:
enabled: true
target_chunk_size: 512
overlap_size: 128
trim_whitespace: true
Dataset Metadata​
Datasets can be defined with the following metadata:
instructions
: Optional. Instructions to provide to a language model when using this dataset.reference_url_template
: Optional. A URL template for citation links.
For detailed metadata
configuration, see the Dataset Reference
Column Definitions​
Each column in the dataset can be defined with the following attributes:
description
: Optional. A description of the column's contents and purpose.embeddings
: Optional. Vector embeddings configuration for this column.
For detailed columns
configuration, see the Dataset Reference