Deep RL Bootcamp - Hackathon

Results from Pascal Sager

Results

Comparison of Different Algorithms

My goal was to win the game as quickly as possible, so I measured wall-clock time rather than the number of episodes. Episode counts vary from one algorithm to the next and are therefore not a meaningful basis for comparison. For example, DQN needs many episodes even though it wins the game in about 10 minutes, because I wait a little longer to fill the replay buffer before learning starts.
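Measuring time instead of episodes can be sketched roughly as follows. This is only an illustration, not the code used for these results: `time_until_won`, `run_episode`, `is_won`, and `max_seconds` are hypothetical names, and the training step is assumed to be a callable that plays one episode and returns its return.

```python
import time
from typing import Callable, Tuple


def time_until_won(
    run_episode: Callable[[], float],
    is_won: Callable[[float], bool],
    max_seconds: float = 3600.0,
) -> Tuple[float, int, bool]:
    """Run episodes until `is_won` reports success or the time budget ends.

    Returns (elapsed_seconds, episodes_played, won).
    `run_episode` and `is_won` are placeholders for the real training
    step and the real win check.
    """
    start = time.perf_counter()
    episodes = 0
    while time.perf_counter() - start < max_seconds:
        episode_return = run_episode()  # assumed: trains on one episode
        episodes += 1
        if is_won(episode_return):
            return time.perf_counter() - start, episodes, True
    return time.perf_counter() - start, episodes, False
```

The episode count is still returned, but only as a side note: two algorithms can reach the same elapsed time with very different episode counts.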

Results LunarLander-v2

Won = mean return > 200 for at least 100 epochs.
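One possible reading of this criterion is a rolling mean over the most recent 100 episode returns. The sketch below assumes that reading (treating "epochs" as episodes is my assumption, as are the names `first_won_episode`, `threshold`, and `window`); it is not taken from the project code.

```python
from collections import deque
from typing import Iterable, Optional


def first_won_episode(
    returns: Iterable[float],
    threshold: float = 200.0,
    window: int = 100,
) -> Optional[int]:
    """Return the 0-based index of the first episode at which the mean of
    the last `window` returns exceeds `threshold`, or None if that never
    happens. Assumes the window must be completely filled first."""
    recent = deque(maxlen=window)  # drops the oldest return automatically
    for index, episode_return in enumerate(returns):
        recent.append(episode_return)
        if len(recent) == window and sum(recent) / window > threshold:
            return index
    return None
```

With a window of 100, the game cannot count as won before episode 100, no matter how high the early returns are.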

Algorithm	Duration until won (GPU)	Max. mean score
DQN	10min 41s (GTX 1060)	268
PPO2	22min 15s (Tesla T4)	245
A2C	4min 40s (GTX 1060)	226

Results LunarLanderContinuous-v2

Algorithm	Duration until won	Max. mean score
DDPG	Not won yet!	0

Results of the Different Algorithms:

Results DQN
Results PPO
Results A2C
Results DDPG
