3 Predicting the Best States and Actions: Deep Q-Networks

 

This chapter covers:

  • Implementing the Q-function as a neural network
  • Building a Deep Q-network using PyTorch to play Gridworld
  • Counteracting “catastrophic forgetting” with experience replay
  • Improving learning stability with target networks

3.1   The Q-function

In this chapter we start where the deep reinforcement learning revolution began: DeepMind’s Deep Q-networks, which learned to play Atari games. Although we won’t be using Atari games as our testbed quite yet, we will be building virtually the same system DeepMind did. We’ll use a simple console-based game called Gridworld as our game environment. Gridworld is actually a family of similar games, but they all generally involve a grid board with a player (or agent), an objective tile (the “goal”), and possibly one or more special tiles that act as barriers or grant negative or positive rewards. The player can move up, down, left, or right, and the objective is to reach the goal tile, where the player receives a positive reward. The player must not only reach the goal tile but do so along the shortest path, possibly navigating around obstacles.
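To make that setup concrete, here is a minimal sketch of a Gridworld-style environment in Python. This is not the engine from the book’s repository; the class and method names (SimpleGridworld, reset, step) and the reward values are illustrative assumptions, chosen only to show the state encoding, the four moves, and the reward structure described above.

import numpy as np

class SimpleGridworld:
    """Illustrative Gridworld-style environment (not the book's engine).

    The agent starts in one corner of an N x N grid and must reach the
    goal tile in the opposite corner. Each step costs a small negative
    reward, so the shortest path yields the highest total return.
    """

    # action index -> (row delta, column delta): up, down, left, right
    ACTIONS = {0: (-1, 0), 1: (1, 0), 2: (0, -1), 3: (0, 1)}

    def __init__(self, size=4):
        self.size = size
        self.goal = (size - 1, size - 1)
        self.reset()

    def reset(self):
        self.player = (0, 0)
        return self.state()

    def state(self):
        # Encode the board as a flat vector: 1.0 at the player's tile, 0.0 elsewhere
        board = np.zeros((self.size, self.size), dtype=np.float32)
        board[self.player] = 1.0
        return board.flatten()

    def step(self, action):
        dr, dc = self.ACTIONS[action]
        # Clip the move so the player stays on the board
        row = min(max(self.player[0] + dr, 0), self.size - 1)
        col = min(max(self.player[1] + dc, 0), self.size - 1)
        self.player = (row, col)
        done = self.player == self.goal
        reward = 10.0 if done else -1.0  # positive reward at the goal, step penalty otherwise
        return self.state(), reward, done

The step penalty is what makes the shortest path matter: a longer route to the goal accumulates more negative reward, so an agent that maximizes return is pushed toward the most direct path.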

We will be using a very simple Gridworld engine included in the GitHub repository for this book, which you can download at http://github.com/DeepReinforcementLearning/ (the engine lives in the Chapter 3 folder).
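As for the Q-function itself: Q(s, a) estimates the expected cumulative (discounted) reward of taking action a in state s and acting well from then on. A Deep Q-network approximates this function with a neural network that takes the board state as input and outputs one Q-value per action. The sketch below shows what such a network might look like in PyTorch, assuming a flattened 4 × 4 board (16 inputs) and four actions as in the environment sketch above; the layer widths are illustrative assumptions, not the exact architecture built later in the chapter.

import torch
import torch.nn as nn

# Illustrative Q-network: maps a flattened 4x4 board state to one
# Q-value per action (up, down, left, right). Layer widths are
# assumptions, not the architecture used later in the chapter.
q_network = nn.Sequential(
    nn.Linear(16, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, 4),  # one output per action
)

state = torch.rand(1, 16)             # a dummy board state (batch of 1)
q_values = q_network(state)           # shape (1, 4): one predicted Q-value per action
best_action = q_values.argmax(dim=1)  # greedy choice: the action with the highest Q-value

Acting greedily with respect to these predicted Q-values is the core idea; the rest of the chapter is about how to train the network so those predictions become accurate.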

3.2   Navigating with Q-learning

3.3   Preventing Catastrophic Forgetting: Experience Replay

3.4   Improving Stability with a Target Network

3.5   Summary

3.6   What’s next?
