In this work, I seek to develop a reinforcement learning algorithm that learns solely from the screen image. The algorithm must be able to stay alive for 60 seconds in the chosen environment. This environment is a game available on the Steam platform called Neon Drive.
Neon Drive is a slick, retro-futuristic, '80s-inspired arcade game. The goal is very simple: avoid fixed obstacles over time using 3 discrete actions (left, right, and straight). Here, Neon Drive serves as the environment for our reinforcement learning algorithm. To use the algorithm, open the game and enter 'endurance' mode. Let the car hit an obstacle and, on the restart screen, run the following command:
sudo python3 dqn.py --resolution 1920x1080 --train policy_net.pth
Then go back to the game screen and let the algorithm work. You need to run the command with sudo because the keyboard module requires root privileges.
Note: If you have problems with terminal environment variables, add -E after sudo so that your environment is preserved.
All of the requirements are shown in the badges above. To install all of them, go to the repository directory and run the following command:
pip3 install -r requirements.txt
Neural networks can usually solve tasks like this just by looking at the screen, so let's use a piece of the screen centered on the car as the input. Using only the image makes our task much more difficult. Since we cannot render multiple environments at the same time, we need a lot of training time. Strictly speaking, we will present the state as the difference between the current screen patch and the previous one. This allows the agent to take the velocity of the obstacles into account from a single input image.
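As a minimal sketch of the frame-difference idea, assuming the two screen patches are already preprocessed into 8-bit NumPy arrays of the same shape, the state could be computed as follows. The function name `state_from_frames` and the example frames are hypothetical and are not taken from dqn.py.

```python
import numpy as np


def state_from_frames(current, previous):
    """Return the signed difference between two screen patches as float32."""
    # Cast before subtracting so that uint8 values do not wrap around.
    return current.astype(np.float32) - previous.astype(np.float32)


# Two hypothetical 90x160 frames in which an obstacle moves down by 5 rows.
prev_frame = np.zeros((90, 160), dtype=np.uint8)
curr_frame = np.zeros((90, 160), dtype=np.uint8)
prev_frame[10:20, 70:90] = 255
curr_frame[15:25, 70:90] = 255

state = state_from_frames(curr_frame, prev_frame)
```

Pixels the obstacle has left become negative, pixels it has just entered become positive, and everything that did not change is zero, so the direction and speed of motion are visible in a single array.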
Our model will be a convolutional neural network that takes in the difference between the current and previous screen patches. It has three outputs, representing $Q(s, \mathrm{left})$, $Q(s, \mathrm{right})$, and $Q(s, \mathrm{straight})$, where $s$ is the input to the network. In effect, the network is trying to predict the expected return of taking each action given the current input.
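The sketch below shows one way such a network could look in PyTorch. The layer sizes, the class name `DQN`, and the single-channel 90x160 input are assumptions chosen for illustration. The actual architecture in dqn.py may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DQN(nn.Module):
    """Small convolutional Q-network with one output per action (assumed layout)."""

    def __init__(self, height=90, width=160, n_actions=3):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 32, kernel_size=5, stride=2)
        self.bn3 = nn.BatchNorm2d(32)

        def conv_out(size, kernel_size=5, stride=2):
            # Output length of an unpadded convolution along one dimension.
            return (size - kernel_size) // stride + 1

        conv_w = conv_out(conv_out(conv_out(width)))
        conv_h = conv_out(conv_out(conv_out(height)))
        self.head = nn.Linear(32 * conv_w * conv_h, n_actions)

    def forward(self, x):
        x = F.relu(self.bn1(self.conv1(x)))
        x = F.relu(self.bn2(self.conv2(x)))
        x = F.relu(self.bn3(self.conv3(x)))
        # One Q-value per action: left, right, straight.
        return self.head(x.flatten(start_dim=1))


net = DQN()
net.eval()  # use running batch-norm statistics for a single input
with torch.no_grad():
    q_values = net(torch.zeros(1, 1, 90, 160))
action = int(q_values.argmax(dim=1).item())
```

Picking the index of the largest output, as in the last line, corresponds to choosing the action with the highest predicted return.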
The image processing performed in this work is quite simple, but it is very important for the overall functioning of the algorithm. The screen is captured with the mss module and converted into a NumPy array. With the BGR screen saved, we apply a color filter available in the OpenCV module to convert everything to grayscale. We crop away 53.84% of the pixels from the top, 20% from the bottom, and 20% from the left and right sides. After that, we apply the triangle threshold function to turn the image into black and white. Finally, we resize the image to 160x90 pixels using area interpolation and invert all of the binary pixels. You can follow the steps of this process in the following images:
In order, the images are the normal input image, the image converted to grayscale, the cropped image with the triangle threshold applied, and lastly the cropped image with the triangle threshold applied and all of the binary pixels inverted.
As the agent observes the current state of the environment and chooses an action, the environment transitions to a new state and also returns a reward that indicates the consequences of the action. In this task, the reward is +1 for every timestep survived, and the episode terminates if the car hits an obstacle. This means better-performing episodes will run for a longer duration, accumulating a larger return.
Our aim will be to train a policy that tries to maximize the discounted, cumulative reward $R_{t_0} = \sum_{t=t_0}^{\infty} \gamma^{t - t_0} r_t$, where $R_{t_0}$ is also known as the return. The discount, $\gamma$, should be a constant between 0 and 1 that ensures the sum converges. It makes rewards from the uncertain far future less important for our agent than the ones in the near future that it can be fairly confident about.
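To make the effect of the discount concrete, here is a small sketch that computes the discounted return of an episode with a reward of +1 per timestep. The function name `discounted_return` and the discount values used are illustrative assumptions, since the value used in training is not stated here.

```python
def discounted_return(rewards, gamma):
    """Sum rewards weighted by gamma**k, where k is the number of steps ahead."""
    total = 0.0
    # Work backwards so each step is r_t + gamma * (return from t + 1).
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Three surviving timesteps with an assumed discount of 0.5:
# 1 + 0.5 * 1 + 0.25 * 1 = 1.75
short_episode = discounted_return([1.0, 1.0, 1.0], gamma=0.5)

# With +1 rewards the return can never exceed 1 / (1 - gamma),
# which is why a discount below 1 keeps the sum finite.
long_episode = discounted_return([1.0] * 1000, gamma=0.99)
```

With a discount of 0.99, even a very long episode has a return below 100, while longer survival still always yields a larger return than shorter survival.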
After the training is done, a file called data.csv is generated. From this file, we can show some important information such as the reward history, the number of steps per epoch, and noise interference in the data. To view your graphs, run the following command:
python3 data_visualization.py --file data.csv
See more about the results in the YouTube video.
The following images show the graphs generated from the data.csv file already present in the repository.
If you liked this repository, please don't forget to star it!