This project implements reinforcement learning algorithms in Unity environments. Twin Delayed DDPG (TD3) and Soft Actor-Critic (SAC) are implemented with the ml-agents Python API and TensorFlow 2.
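TD3's core update is the clipped double-Q target with target policy smoothing. Below is a minimal TensorFlow 2 sketch of that target computation, not this repo's actual code: `actor_target`, `q1_target`, and `q2_target` are assumed to be callables (e.g. Keras models), and the hyperparameter values are the usual illustrative defaults.

```python
import tensorflow as tf

def td3_target(reward, next_obs, done, actor_target, q1_target, q2_target,
               gamma=0.99, noise_std=0.2, noise_clip=0.5, act_limit=1.0):
    """Clipped double-Q target with target policy smoothing (TD3)."""
    next_act = actor_target(next_obs)
    # Smooth the target policy with clipped Gaussian noise
    noise = tf.clip_by_value(
        tf.random.normal(tf.shape(next_act)) * noise_std,
        -noise_clip, noise_clip)
    next_act = tf.clip_by_value(next_act + noise, -act_limit, act_limit)
    # Take the minimum of the two target critics to curb overestimation
    q_next = tf.minimum(q1_target(next_obs, next_act),
                        q2_target(next_obs, next_act))
    return reward + gamma * (1.0 - done) * q_next
```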
The objective of this agent is to learn to balance the ball on its head for as long as possible. The agent receives a positive reward at every step the ball remains on its head and a negative reward when the ball falls off. There are two continuous actions, one for X-rotation and one for Z-rotation.
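For context, interaction with the environment through the ml-agents Python API looks roughly like the sketch below. The `"3DBall"` path is a placeholder for a built environment, and `ActionTuple` assumes a reasonably recent ml-agents release (older releases pass a raw NumPy array to `set_actions`).

```python
import numpy as np
from mlagents_envs.environment import UnityEnvironment
from mlagents_envs.base_env import ActionTuple

# "3DBall" is a placeholder path to a built environment;
# file_name=None attaches to a running Unity Editor instead.
env = UnityEnvironment(file_name="3DBall")
env.reset()
behavior_name = list(env.behavior_specs)[0]

for _ in range(100):
    # terminal_steps carries the final rewards of agents whose episode just ended
    decision_steps, terminal_steps = env.get_steps(behavior_name)
    if len(decision_steps) > 0:
        # Two continuous actions per agent: X-rotation and Z-rotation
        actions = np.random.uniform(
            -1.0, 1.0, size=(len(decision_steps), 2)).astype(np.float32)
        env.set_actions(behavior_name, ActionTuple(continuous=actions))
    env.step()
env.close()
```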
At the beginning of training, the agents cannot balance the ball at all. SAC uses a stochastic policy whose network starts with random parameters; actions are sampled from the policy during training, and automatic tuning of the entropy coefficient is implemented.
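A minimal sketch of those two pieces, sampling actions from a tanh-squashed Gaussian and auto-tuning the entropy temperature toward the common target entropy of -|A|. This is the standard SAC formulation; the names and values here are illustrative rather than taken from this repo.

```python
import numpy as np
import tensorflow as tf

act_dim = 2                        # 3DBall: X- and Z-rotation
target_entropy = -float(act_dim)   # common heuristic: -|A|
log_alpha = tf.Variable(0.0, dtype=tf.float32)
alpha_opt = tf.keras.optimizers.Adam(learning_rate=3e-4)

def sample_action(mu, log_std):
    """Sample from a tanh-squashed Gaussian and return the log-probability."""
    std = tf.exp(log_std)
    eps = tf.random.normal(tf.shape(mu))
    pre_tanh = mu + std * eps
    action = tf.tanh(pre_tanh)
    # Diagonal Gaussian log-prob ...
    logp = tf.reduce_sum(
        -0.5 * (eps ** 2 + 2.0 * log_std + np.log(2.0 * np.pi)), axis=-1)
    # ... plus the tanh change-of-variables correction
    logp -= tf.reduce_sum(
        2.0 * (np.log(2.0) - pre_tanh - tf.nn.softplus(-2.0 * pre_tanh)),
        axis=-1)
    return action, logp

def update_alpha(logp):
    """One gradient step on the entropy temperature alpha."""
    with tf.GradientTape() as tape:
        alpha_loss = -tf.reduce_mean(
            log_alpha * tf.stop_gradient(logp + target_entropy))
    grads = tape.gradient(alpha_loss, [log_alpha])
    alpha_opt.apply_gradients(zip(grads, [log_alpha]))
    return tf.exp(log_alpha)
```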
After some training, the policy balances the ball fairly well but still drops it every so often.
After more training, no ball is dropped and the maximum reward is achieved during evaluation. However, the ball sits at a corner of the head instead of the center.
Finally, the ball settles at the center of the head. During training, the noise from the stochastic policy causes the ball to drop when it is near a corner, so the agent learns to keep the ball at the center.
The objective of this agent is to move in the goal direction without falling. The agent receives a positive reward when its body velocity matches the goal velocity and a negative reward when it falls. There are 20 continuous actions, corresponding to the target rotations and strengths of its joints.
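As an illustration of what a policy head for this action space might look like (a sketch, not this repo's architecture): a small network whose 20 outputs are squashed to [-1, 1] with tanh, as a TD3-style deterministic actor would use. The observation size here is hypothetical; the real one comes from the environment's behavior spec.

```python
import tensorflow as tf

obs_dim = 243  # hypothetical observation size; read the real one from the behavior spec
act_dim = 20   # target rotations and strengths for the joints

# Deterministic actor sketch: observations -> 20 actions in [-1, 1]
actor = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(obs_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(act_dim, activation="tanh"),
])
```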
More coming soon!