Introduction
My model acts like a compulsive masochist. Great beginning, innit? I attach my parameters further down, but don't rely on them too strictly, because I keep changing them for the following reason:
Describe the bug
I have a very simple ping-pong env (a custom one, not Gym) and I set up an agent without any issues, except perhaps one. The problem may be in the reward system, but even so the agent shouldn't behave the way it does. My reward system is based on a simple rule:
```python
if not done:
    reward = 1
else:
    reward = 0
```
So the agent should try to collect as many reward points as possible, and it does, but only during the first 10k steps. None of the parameters affect this. Of course, changing the hyperparameters changes its performance, but nothing more. After 10k steps it starts to dodge the ball: occasionally it scores 5-10 points, but then it keeps dodging for about 100 episodes afterwards.
Code example
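The reward rule above gives +1 for every step the episode continues and 0 on the terminal step, so under discounting a longer episode is always worth more to the agent, up to a ceiling of 1 / (1 - gamma). A minimal sketch of that arithmetic, assuming `done` is true exactly once, on the last step of an episode, and assuming `gamma = 0.99` (SB3's DQN default; the config doesn't override it). Both helper functions are hypothetical, not part of my env or of SB3.

```python
def step_reward(done: bool) -> float:
    # Reward rule from the issue: +1 on every non-terminal step, 0 on the terminal one.
    return 0.0 if done else 1.0


def discounted_return(episode_length: int, gamma: float = 0.99) -> float:
    """Hypothetical helper: discounted return of an episode lasting
    `episode_length` steps, where only the final step is terminal."""
    total = 0.0
    for t in range(episode_length):
        done = t == episode_length - 1
        total += (gamma ** t) * step_reward(done)
    return total


# Longer episodes earn strictly more, approaching 1 / (1 - gamma) = 100.
for length in (1, 10, 100, 1000):
    print(length, round(discounted_return(length), 3))
```

Under this rule, ending an episode early can only lower the return the agent is trained to maximize.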
I'll lay out everything important (in my opinion) in a single logical sequence, but I can invite you to the repo if needed. rew_mean looks like this. As you can see, it crashes after 10k steps. By the way, once the learning_starts threshold is reached it drops even lower, and I don't know how that's even possible. Here's one more graph.
```python
from stable_baselines3 import DQN
from stable_baselines3.common.vec_env import DummyVecEnv, VecFrameStack, VecTransposeImage
# PingPongEnv (my custom environment) and LOG_DIR are defined elsewhere in my code.
framebuffer = 5
learning_rate = 0.0001
total_timesteps = 10000000  # something like infinity; I have a callback every 5k steps
env = PingPongEnv()
env = DummyVecEnv([lambda: env])
env = VecTransposeImage(env)
env = VecFrameStack(env, n_stack=framebuffer)
model = DQN('CnnPolicy', env, verbose=1, tau=0.001, tensorboard_log=LOG_DIR,
            learning_rate=learning_rate, buffer_size=10000, learning_starts=100000,
            train_freq=1000, target_update_interval=20000, exploration_initial_eps=1,
            exploration_final_eps=0.00001, exploration_fraction=0.001)
```
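For reference, the exploration rate implied by these settings can be worked out directly. The sketch below is a standalone re-implementation, assuming SB3's linear schedule (`get_linear_fn`) drives epsilon from `exploration_initial_eps` to `exploration_final_eps` over the first `exploration_fraction * total_timesteps` steps and then holds it flat, and assuming `total_timesteps` is the value passed to `model.learn()`. The helper `epsilon_at` is hypothetical, not an SB3 function.

```python
def epsilon_at(step: int,
               total_timesteps: int = 10_000_000,
               initial_eps: float = 1.0,
               final_eps: float = 0.00001,
               fraction: float = 0.001) -> float:
    """Hypothetical helper: linear epsilon decay, assumed to mirror
    SB3's get_linear_fn(initial_eps, final_eps, fraction)."""
    progress = step / total_timesteps  # fraction of training already done
    if progress > fraction:
        return final_eps  # decay finished, epsilon stays at its floor
    return initial_eps + progress * (final_eps - initial_eps) / fraction


learning_starts = 100_000  # value from the config above
decay_end = int(0.001 * 10_000_000)  # step at which epsilon reaches its floor

for step in (0, 5_000, decay_end, learning_starts):
    print(f"step {step:>7}: epsilon = {epsilon_at(step):.5f}")
```

With these values the decay window is 0.001 × 10,000,000 = 10,000 steps, so epsilon sits at its floor well before `learning_starts` (100,000) is reached.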
System Info
Describe the characteristics of your environment:
Since I'm using a lot of libraries for this single purpose, I can't list every one of them, but in general I install with conda when a package is available there and with pip when conda can't find it.
I have a single GPU, a GTX 1060 6GB, but its utilization is only about 10-15% and memory usage is around 3-4 GB. RAM, CPU, and disks aren't maxed out either (just in case).
Python 3.10.10, conda 23.1.0 (latest at the moment).
I'm not using TensorFlow, so it isn't even installed. PyTorch is 2.0.0 (latest stable at the moment).
Additional context
The ping-pong game was written with arcade by my brother, but I'm not sure that's useful info, because I don't touch his code; I use direct input instead.
I use win32gui to grab frames, and it returns about 150-200 images per second, so capture speed is definitely not the problem.