Semantic Model

Semantic data models in Spice are defined using the datasets[*].columns configuration. They give data a structured, meaningful representation that benefits both large language models (LLMs) and traditional data analysis.

Use-Cases

Large Language Models (LLMs)

The semantic model is automatically used by Spice Models as context to produce more accurate and context-aware AI responses.
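To make "used as context" concrete, here is a minimal sketch of how a dataset's descriptions and instructions could be flattened into text for a prompt. This is not Spice's internal implementation. The render_context function and its output layout are hypothetical, and the dictionary simply mirrors the fields of the spicepod.yaml example on this page.

```python
# Illustrative only: a hypothetical rendering of a semantic model into
# prompt context. Spice's real context format is not shown in this document.

def render_context(dataset: dict) -> str:
    """Flatten a dataset's description, instructions, and column descriptions."""
    lines = [f"Dataset: {dataset['name']}"]
    if dataset.get("description"):
        lines.append(f"Description: {dataset['description']}")
    instructions = dataset.get("metadata", {}).get("instructions")
    if instructions:
        lines.append(f"Instructions: {instructions}")
    for column in dataset.get("columns", []):
        # Columns without a description are still listed by name.
        description = column.get("description", "(no description)")
        lines.append(f"Column {column['name']}: {description}")
    return "\n".join(lines)


# Mirrors the fields of the taxi_trips dataset from the example configuration.
taxi_trips = {
    "name": "taxi_trips",
    "description": "NYC taxi trip rides",
    "metadata": {"instructions": "Always provide citations with reference URLs."},
    "columns": [
        {
            "name": "tpep_pickup_time",
            "description": "The time the passenger was picked up by the taxi",
        },
        {"name": "notes", "description": "Optional notes about the trip"},
    ],
}

context = render_context(taxi_trips)
print(context)
```

The sketch shows why descriptions matter: a column with no description contributes only its name, which gives a model far less to work with.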

Defining a Semantic Model

Semantic data models are defined within the spicepod.yaml file, under the datasets section. Each dataset supports a description, a metadata field, and a columns field in which each column can carry its own description and embeddings configuration.

Example Configuration

Example spicepod.yaml:

datasets:
  - name: taxi_trips
    description: NYC taxi trip rides
    metadata:
      instructions: Always provide citations with reference URLs.
      reference_url_template: https://d37ci6vzurychx.cloudfront.net/trip-data/yellow_tripdata_<YYYY-MM>.parquet
    columns:
      - name: tpep_pickup_time
        description: 'The time the passenger was picked up by the taxi'
      - name: notes
        description: 'Optional notes about the trip'
        embeddings:
          - from: hf_minilm # A defined Spice Model
            chunking:
              enabled: true
              target_chunk_size: 512
              overlap_size: 128
              trim_whitespace: true
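The chunking settings above (target_chunk_size: 512, overlap_size: 128) describe overlapping windows over the column's text. The following is a rough sketch of that idea, assuming sizes are counted in tokens and that each chunk repeats the last overlap_size tokens of the previous one. The chunk_tokens helper is hypothetical. Spice's actual chunker may split, count, or trim differently, and trim_whitespace is not modeled here.

```python
# Illustrative only: overlapping fixed-size chunking over a token list.
# This is an assumed reading of the settings, not Spice's chunker.

def chunk_tokens(tokens, target_chunk_size=512, overlap_size=128):
    """Split tokens into windows of up to target_chunk_size items.

    Each window after the first starts overlap_size items before the end
    of the previous window.
    """
    if not 0 <= overlap_size < target_chunk_size:
        raise ValueError("overlap_size must be >= 0 and < target_chunk_size")
    step = target_chunk_size - overlap_size
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + target_chunk_size])
        # Stop once a window has reached the end of the input.
        if start + target_chunk_size >= len(tokens):
            break
    return chunks


tokens = list(range(1000))  # stand-in for a tokenized "notes" value
chunks = chunk_tokens(tokens)
print([len(chunk) for chunk in chunks])
```

With these defaults each window advances by 384 tokens (512 minus 128), so neighbouring chunks share 128 tokens of context.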

Dataset Metadata

Datasets can be defined with the following metadata:

instructions: Optional. Instructions to provide to a language model when using this dataset.
reference_url_template: Optional. A URL template for citation links.

For detailed metadata configuration, see the Dataset Reference.
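As an illustration of reference_url_template, the sketch below fills the <YYYY-MM> placeholder from the example configuration with a year and month. This page does not specify how the placeholder is resolved, so the substitution rule and the fill_reference_url helper are assumptions made only for this example.

```python
# Illustrative only: assumes <YYYY-MM> stands for a four-digit year and a
# two-digit month. The substitution is not performed by any Spice API here.
from datetime import date

TEMPLATE = (
    "https://d37ci6vzurychx.cloudfront.net/trip-data/"
    "yellow_tripdata_<YYYY-MM>.parquet"
)


def fill_reference_url(template: str, month: date) -> str:
    """Replace the <YYYY-MM> placeholder with the year and month of `month`."""
    return template.replace("<YYYY-MM>", month.strftime("%Y-%m"))


url = fill_reference_url(TEMPLATE, date(2024, 1, 15))
print(url)
```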

Column Definitions

Each column in the dataset can be defined with the following attributes:

description: Optional. A description of the column's contents and purpose.
embeddings: Optional. Vector embeddings configuration for this column.

For detailed column configuration, see the Dataset Reference.
