The Spice.ai engine learns and provides recommendations to your application using a type of AI called deep reinforcement learning. To learn more, see Deep Learning AI.
A fundamental concept in deep reinforcement learning is to reward actions a learning agent takes during a training run. These rewards are numerical values and can be negative or positive.
In Spice.ai, developers define the rewards the AI engine uses in training runs through reward function definitions. Reward functions are Python functions (with more languages to be supported in the future) and can be authored either inline in the Spicepod manifest YAML or in a separate Python .py file.
To see how to define reward functions using an external file, click here.
To define the reward functions in the YAML directly, put the Python code fragment in the with node.
The reward function must assign a value to reward for it to be valid.
The following variables are available to be used in the reward function:
Variable | Type | Description |
---|---|---|
current_state | dict | The observation state when the action was taken |
next_state | dict | The observation state one granularity step after the action was taken |
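
The rules above (a fragment sees current_state and next_state, and must assign reward) can be checked locally before a training run. The following is a minimal sketch, not how the Spice.ai engine itself executes reward functions: the evaluate_reward helper and the sample moisture readings are hypothetical, and the fragment is a shortened form of a close_valve-style reward.

```python
def evaluate_reward(fragment, current_state, next_state):
    """Run a reward fragment and return the value it assigns to `reward`.

    Hypothetical local test helper; the engine's own execution model is
    not described here and may differ.
    """
    # Expose only the two documented variables to the fragment.
    scope = {"current_state": current_state, "next_state": next_state}
    exec(fragment, {}, scope)
    # A fragment that never assigns `reward` is not a valid reward function.
    if "reward" not in scope:
        raise ValueError("reward function must assign a value to `reward`")
    return scope["reward"]


fragment = """
if next_state["sensors_garden_moisture"] > 0.25:
    reward = 200
else:
    reward = -100 * (0.25 - next_state["sensors_garden_moisture"])
"""

# Moisture stays above 25%: the fixed positive reward applies.
wet = evaluate_reward(
    fragment,
    {"sensors_garden_moisture": 0.30},
    {"sensors_garden_moisture": 0.28},
)

# Moisture falls to 15%: the penalty scales with the shortfall below 25%.
dry = evaluate_reward(
    fragment,
    {"sensors_garden_moisture": 0.20},
    {"sensors_garden_moisture": 0.15},
)

print(wet, dry)
```

Running a fragment this way only confirms that it is valid Python and assigns reward for the sample states; it says nothing about how well the reward will guide training.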
See the full example manifest here.
training:
  rewards:
    - reward: close_valve
      # Reward keeping moisture content above 25%
      with: |
        if next_state["sensors_garden_moisture"] > 0.25:
            reward = 200
        # Penalize low moisture content depending on how far the garden has dried out
        else:
            reward = -100 * (0.25 - next_state["sensors_garden_moisture"])
            # Penalize especially heavily if the drying trend is continuing (next_state is drier than current_state)
            if next_state["sensors_garden_moisture"] < current_state["sensors_garden_moisture"]:
                reward = reward * 2
    - reward: open_valve_half
      # Reward watering when needed, more heavily if the garden is more dried out
      with: |
        if next_state["sensors_garden_moisture"] < 0.25:
            reward = 100 * (0.25 - next_state["sensors_garden_moisture"])
        # Penalize wasting water
        # Penalize overwatering depending on how overwatered the garden is
        else:
            reward = -50 * (next_state["sensors_garden_moisture"] - 0.25)
    - reward: open_valve_full
      # Reward watering when needed, more heavily if the garden is more dried out
      with: |
        if next_state["sensors_garden_moisture"] < 0.25:
            reward = 200 * (0.25 - next_state["sensors_garden_moisture"])
        # Penalize wasting water more heavily with valve fully open
        # Penalize overwatering depending on how overwatered the garden is
        else:
            reward = -100 * (next_state["sensors_garden_moisture"] - 0.25)
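
To see how the three reward functions compare, it can help to evaluate them side by side on the same state transition. The sketch below restates each with fragment as an ordinary Python function; the function wrappers, the rewards_for helper, and the sample readings are illustrative assumptions. In a real training run the next state depends on the action taken, so holding the transition fixed across actions is a simplification used only to compare the shape of the rewards.

```python
THRESHOLD = 0.25  # target moisture level used by all three reward functions


def close_valve(current_state, next_state):
    moisture = next_state["sensors_garden_moisture"]
    if moisture > THRESHOLD:
        return 200
    reward = -100 * (THRESHOLD - moisture)
    # Double the penalty when the garden is still drying out.
    if moisture < current_state["sensors_garden_moisture"]:
        reward = reward * 2
    return reward


def open_valve_half(current_state, next_state):
    moisture = next_state["sensors_garden_moisture"]
    if moisture < THRESHOLD:
        return 100 * (THRESHOLD - moisture)
    return -50 * (moisture - THRESHOLD)


def open_valve_full(current_state, next_state):
    moisture = next_state["sensors_garden_moisture"]
    if moisture < THRESHOLD:
        return 200 * (THRESHOLD - moisture)
    return -100 * (moisture - THRESHOLD)


def rewards_for(current_state, next_state):
    """Return the reward each action would receive for one transition."""
    actions = (close_valve, open_valve_half, open_valve_full)
    return {fn.__name__: fn(current_state, next_state) for fn in actions}


# Drying garden: moisture drops from 20% to 15%.
drying = rewards_for(
    {"sensors_garden_moisture": 0.20},
    {"sensors_garden_moisture": 0.15},
)

# Well-watered garden: moisture rises from 30% to 35%.
watered = rewards_for(
    {"sensors_garden_moisture": 0.30},
    {"sensors_garden_moisture": 0.35},
)

for name, rewards in (("drying", drying), ("watered", watered)):
    print(name, {action: round(value, 2) for action, value in rewards.items()})
```

For the drying transition, closing the valve is penalized while both watering actions are rewarded, with the fully open valve rewarded most. For the well-watered transition the ordering reverses: closing the valve earns the fixed reward of 200, and the watering actions are penalized in proportion to how far moisture exceeds 25%.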