Results
Comparison of Different Algorithms
My goal was to win the game as quickly as possible, so I measured wall-clock time rather than the number of episodes. Episode counts vary from algorithm to algorithm and are therefore not a useful basis for comparison. For example, DQN needs many episodes even though it wins the game in roughly 10 minutes, because I wait a little longer to fill the replay buffer before learning starts.
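The buffer warm-up mentioned above can be sketched as follows. This is a minimal illustration and not the code used for these results: the `ReplayBuffer` class, its `warmup_size` parameter, and the numbers in the usage note are assumptions made for the example.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size replay buffer that reports when enough data has been collected.

    `capacity` and `warmup_size` are hypothetical parameters for this sketch.
    """

    def __init__(self, capacity, warmup_size):
        # Oldest transitions are dropped automatically once capacity is reached.
        self.storage = deque(maxlen=capacity)
        self.warmup_size = warmup_size

    def add(self, transition):
        self.storage.append(transition)

    def ready(self):
        # Learning should only start once the warm-up threshold is reached.
        return len(self.storage) >= self.warmup_size

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return random.sample(list(self.storage), batch_size)


def count_steps_before_learning(buffer, transitions):
    """Return how many transitions are consumed before the buffer is ready."""
    steps = 0
    for transition in transitions:
        if buffer.ready():
            break
        buffer.add(transition)
        steps += 1
    return steps
```

With, say, `ReplayBuffer(capacity=100_000, warmup_size=10_000)`, the first 10,000 environment steps only collect data. Episodes played during that phase still count toward the episode total, which is one reason episode counts are hard to compare across algorithms.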
Results LunarLander-v2
Won = mean return > 200 over at least 100 epochs.
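One way to check the "won" criterion above while measuring wall-clock time is sketched below. It is an assumption-laden illustration, not the measurement code behind the tables: the `WinTracker` name and its parameters are hypothetical, and it treats the 100 epochs as the 100 most recently recorded returns.

```python
import time
from collections import deque


class WinTracker:
    """Tracks a rolling mean return and the elapsed time until it crosses a threshold."""

    def __init__(self, threshold=200.0, window=100, clock=time.monotonic):
        self.threshold = threshold
        self.window = window
        self.clock = clock                    # injectable so the tracker can be tested
        self.returns = deque(maxlen=window)   # keeps only the most recent returns
        self.start = clock()
        self.best_mean = float("-inf")        # highest rolling mean seen so far
        self.won_after = None                 # seconds until the criterion was first met

    def record(self, episode_return):
        """Record one return; report True once the win criterion has been met."""
        self.returns.append(episode_return)
        if len(self.returns) < self.window:
            return False                      # not enough data for a full window yet
        mean = sum(self.returns) / self.window
        self.best_mean = max(self.best_mean, mean)
        if self.won_after is None and mean > self.threshold:
            self.won_after = self.clock() - self.start
        return self.won_after is not None
```

In a training loop, `tracker.record(episode_return)` would be called once per finished episode. Afterwards, `tracker.won_after` would correspond to the "Duration until Won" column and `tracker.best_mean` to the "Max. mean Score" column.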
Algorithm | Time until won (GPU) | Max. mean score |
---|---|---|
DQN | 10 min 41 s (GTX 1060) | 268 |
PPO2 | 22 min 15 s (Tesla T4) | 245 |
A2C | 4 min 40 s (GTX 1060) | 226 |
Results LunarLanderContinuous-v2
Algorithm | Time until won (GPU) | Max. mean score |
---|---|---|
DDPG | Not won yet! | 0 |