Major Performance Decrease in Tianshou 1.2 Compared to 0.5 on Windows and Linux #1225
Comments
Since you are running on Windows: do you have an NVIDIA GPU that you expect to be used? If so, please check whether the GPU is indeed being used. Default CUDA support differs across torch versions (especially on Windows), so this is important to check. Also, are you using parallel environments? If so, which type of vectorization did you enable?
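Not part of the original comment, but a minimal sketch of such a check, assuming an already constructed network named `actor` that was moved to `device` (GPU utilization can also be monitored externally with `nvidia-smi`):

```python
import torch

# Confirm that CUDA is visible to PyTorch at all.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device name:", torch.cuda.get_device_name(0))

# Confirm that the model parameters actually live on the GPU.
# `actor` is assumed to be a network that was already built and moved to `device`.
print("Actor parameters on:", next(actor.parameters()).device)
```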
Yes, I have an NVIDIA graphics card available. It is recognized by PyTorch both on Windows and Linux:

```python
print(f"Device: {device}")
print(f"Tianshou version: {tianshou.__version__}")
print(f"Torch version: {torch.__version__} and Cuda available: {torch.cuda.is_available()}")
```

Windows output:

```
Device: cuda
Tianshou version: 1.2.0-dev
Torch version: 2.5.0 and Cuda available: True
```

Linux output:

```
Device: cuda
Tianshou version: 1.2.0-dev
Torch version: 2.1.1+cu121 and Cuda available: True
```

The GPU is effectively used in Tianshou version 0.5, and this should also apply to Tianshou version 1.2. My training setup is similar to the API examples:

```python
# Models
net = Net(
    state_shape,
    hidden_sizes=NETWORK_ARCHITECTURE,
    activation=nn.Tanh,
    device=device,
)
actor = ActorProb(
    net,
    action_shape,
    max_action=max_action,
    unbounded=True,
    device=device,
).to(device)
net_c = Net(
    state_shape,
    hidden_sizes=NETWORK_ARCHITECTURE,
    activation=nn.Tanh,
    device=device,
)
critic = Critic(net_c, device=device).to(device)
actor_critic = ActorCritic(actor, critic)
```

I tested both DummyVectorEnv and SubprocVectorEnv. The training setup takes significantly longer when using SubprocVectorEnv, similar to my experience with Tianshou version 0.5. However, in Tianshou version 0.5 the execution speed is very fast with either DummyVectorEnv or SubprocVectorEnv.

```python
train_envs = DummyVectorEnv([make_train_env() for _ in range(NUM_TRAIN_ENVS)])
test_envs = DummyVectorEnv([make_test_env() for _ in range(NUM_TEST_ENVS)])
```
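Not from the thread itself, but one way to check whether a slowdown comes from the environments/vectorization or from the library's training loop would be to time raw stepping of the vectorized envs with random actions. A rough sketch, reusing `make_train_env` and `NUM_TRAIN_ENVS` from the snippet above:

```python
import time

import numpy as np

from tianshou.env import DummyVectorEnv, SubprocVectorEnv


def raw_env_throughput(venv_cls, n_steps=1000):
    """Step the vectorized env with random actions (no policy, no training loop)."""
    venv = venv_cls([make_train_env() for _ in range(NUM_TRAIN_ENVS)])
    action_spaces = venv.get_env_attr("action_space")  # one space per sub-env
    venv.reset()
    start = time.perf_counter()
    for _ in range(n_steps):
        actions = np.stack([space.sample() for space in action_spaces])
        venv.step(actions)
    elapsed = time.perf_counter() - start
    venv.close()
    return n_steps * NUM_TRAIN_ENVS / elapsed


print("DummyVectorEnv env-steps/s:  ", raw_env_throughput(DummyVectorEnv))
print("SubprocVectorEnv env-steps/s:", raw_env_throughput(SubprocVectorEnv))
```

If both versions of Tianshou give similar numbers here, the regression is more likely in the training/collection code than in the environments themselves.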
We will look into that asap, thanks for reporting!
I did a quick speed test, comparing 0.5.0 to 1.0.0 and the current development version (1.2.0-dev). I tested with the atari_ppo example, using CPU, the Pong environment and a single env. While I did notice a slowdown, it is nowhere near the 12x slowdown you are describing; it is around 1.7x, which is still bad enough. We will look into the reasons for the slowdown by profiling the current implementation, but that may not explain why your task is affected so much more strongly. Perhaps your environment causes the slower functions to be called more frequently, but it's hard to say. We will try to restore the speed of the old implementation for the Atari case, and then you can check whether it helps for your use case as well, @ULudo.
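For anyone who wants to reproduce such a comparison locally, a minimal profiling sketch using the standard library's cProfile; `run_training` is a placeholder for, e.g., a short run of the atari_ppo example restricted to a small number of epochs:

```python
import cProfile
import pstats


def run_training():
    # Placeholder: call the training entry point here, e.g. a short run
    # of examples/atari/atari_ppo.py with a small step/epoch budget.
    ...


profiler = cProfile.Profile()
profiler.enable()
run_training()
profiler.disable()

# Show the 30 functions with the largest cumulative time.
pstats.Stats(profiler).sort_stats("cumulative").print_stats(30)
```

Running the same short job under both Tianshou versions and diffing the top entries should point at the functions responsible for the extra time.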
I'm in a similar situation. I used Tianshou 0.5 a year ago (though not for very long) and saved some logs showing that most Atari games trained at >= 500 it/s. Recently I upgraded my video card to a 3070 Ti and upgraded to 1.2, but the training speed of PongNoFrameskip-v4 dropped all the way from 80 it/s to around 5 it/s. Interestingly, atari_ppo holds a steady speed of about 200 it/s, but atari_dqn, atari_sac and atari_rainbow are all extremely slow. I haven't spent too much time on this yet; if I have more findings, I'll let you know.
Hello,
I used Tianshou 0.5 on a custom environment running on a Windows PC and was impressed by the training speed of the PPO agent, which exceeded 2000 iterations per second.
```
0.5.0 0.26.3 2.5.1 1.26.4 3.11.10 | packaged by conda-forge | (main, Oct 16 2024, 01:17:14) [MSC v.1941 64 bit (AMD64)] win32
```
Training using the Tianshou library, version 0.5:
Recently, I upgraded to Tianshou 1.2, keeping the agent configuration the same. However, I observed a significant performance drop, with the new version running approximately 12 times slower, as shown below. I also tested that on Linux and observed the same results:
```
1.2.0-dev 0.28.1 2.1.1+cu121 1.24.4 3.11.10 (main, Sep 7 2024, 18:35:41) [GCC 11.4.0] linux
```
Training using the Tianshou library, version 1.2:
Have there been changes to the library that impact execution performance, and can I restore previous performance levels through configuration adjustments?
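For reference, the two environment lines quoted above look like the output of a version one-liner along these lines (a sketch; the exact imports in the original report may differ, e.g. gym vs. gymnasium):

```python
import sys

import gymnasium
import numpy as np
import torch

import tianshou

# Prints the Tianshou, Gymnasium, Torch and NumPy versions, the Python build
# string and the platform, matching the space-separated lines quoted above.
print(
    tianshou.__version__,
    gymnasium.__version__,
    torch.__version__,
    np.__version__,
    sys.version,
    sys.platform,
)
```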