Implementation of the Soft Actor-Critic (SAC) algorithm

SAC (Soft Actor-Critic) was introduced in 2018. It is an off-policy, model-free reinforcement learning algorithm that maximizes not only the expected reward but also the entropy of the policy (i.e., it acts as randomly as possible while still solving the task). The entropy term encourages exploration and keeps the agent from prematurely committing to one of several actions that seem equally rewarding.
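As a minimal sketch (not code from this repository), the maximum-entropy idea can be seen numerically: SAC optimizes an objective of the form E[reward + α·H(π(·|s))], so a policy earns a bonus for its own entropy, scaled by a temperature α. The function name and values below are purely illustrative.

```python
import numpy as np

def entropy(probs):
    """Shannon entropy H(pi) = -sum_a pi(a) * log pi(a) of a discrete policy."""
    probs = np.asarray(probs, dtype=float)
    return -np.sum(probs * np.log(probs + 1e-12))

# A near-deterministic policy has low entropy ...
peaked = entropy([0.97, 0.01, 0.01, 0.01])
# ... while a uniform policy has the maximum possible entropy, log(n).
uniform = entropy([0.25, 0.25, 0.25, 0.25])

# SAC's objective augments the reward with alpha * H(pi), so for the same
# environment reward the more stochastic policy scores higher.
alpha = 0.2
reward = 1.0
soft_uniform = reward + alpha * uniform
soft_peaked = reward + alpha * peaked
assert soft_uniform > soft_peaked
```

The temperature α trades off reward against randomness: α → 0 recovers the standard RL objective, while a large α pushes the policy toward uniform.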

This implementation of Soft Actor-Critic has been modified to work with discrete action sets.
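The key consequence of a discrete action set can be sketched as follows. With finitely many actions, the expectations over actions in SAC's value and policy objectives can be computed exactly from the policy's action probabilities, rather than approximated with sampling and the reparameterization trick as in continuous SAC. This is an illustrative sketch, not this repository's code, and the function names are made up:

```python
import numpy as np

def soft_state_value(q_values, probs, alpha):
    """Soft value over a discrete action set:
    V(s) = sum_a pi(a|s) * (Q(s,a) - alpha * log pi(a|s)).
    The expectation is a finite sum over the policy's probabilities."""
    probs = np.asarray(probs, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.sum(probs * (q_values - alpha * np.log(probs + 1e-12))))

def policy_loss(q_values, probs, alpha):
    """Actor objective in the discrete setting: minimize
    E_{a ~ pi} [ alpha * log pi(a|s) - Q(s,a) ], again an exact sum."""
    probs = np.asarray(probs, dtype=float)
    q_values = np.asarray(q_values, dtype=float)
    return float(np.sum(probs * (alpha * np.log(probs + 1e-12) - q_values)))
```

Computing these expectations in closed form lowers the variance of the gradient estimates compared with sampling a single action per state.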

Berkeley AI Research blog:
arXiv paper: