
Fix performance regression in EspNet E2E with csj recipe #295

Open · take-cheeze opened this issue May 30, 2019 · 3 comments

take-cheeze (Contributor) commented:
relates to: #289

take-cheeze (Contributor, Author) commented:

GetXXXXX() could cause some slowdown.
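
As a rough illustration of the kind of overhead meant here, the following is a minimal C++ sketch with hypothetical names, not the actual ChxVMVar API: a checked Get*() accessor that copies a shared_ptr on every call adds a tag check plus a refcount bump to every access in the interpreter's hot loop.

// Hypothetical sketch; the types and methods below are illustrative only
// and do not come from the chainer-compiler codebase.
#include <cassert>
#include <memory>

struct Array {};  // stand-in for a device array type

class Var {
 public:
  enum class Kind { kArray, kScalar };

  explicit Var(std::shared_ptr<Array> a)
      : kind_(Kind::kArray), array_(std::move(a)) {}

  // Checked accessor: tag check plus shared_ptr copy (refcount bump) per call.
  std::shared_ptr<Array> GetArray() const {
    assert(kind_ == Kind::kArray);
    return array_;
  }

  // Cheaper alternative: hand out a reference, no refcount traffic.
  const Array& array() const {
    assert(kind_ == Kind::kArray);
    return *array_;
  }

 private:
  Kind kind_;
  std::shared_ptr<Array> array_;
};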

shinh changed the title from "Fix performance regression in new internal type of ChxVMVar" to "Fix performance regression in EspNet E2E with csj recipe" on May 31, 2019
shinh (Member) commented May 31, 2019:

As I mentioned when we chatted locally, this regression was introduced within the last three months or so, and #289 is not the culprit. So let me hijack this issue to track the perf regression.

How to run the test (copy and paste):

$ PYTHONPATH=ch2o python3 ch2o/tests/model/EspNet_E2E.py --recipe csj_medium --gen csj_medium --gpu
$ ./build/tools/run_onnx --test csj_medium_backprop --backprop -d cuda -I 10 --fuse_operations --use_nvrtc
Average elapsed: 192.045 msec

The following command runs the same network with Chainer, for reference:

$ PYTHONPATH=ch2o python3 ch2o/tests/model/EspNet_E2E.py --recipe csj_medium --run --gpu
Elapsed: 4776.919364929199 msec
Elapsed: 203.45425605773926 msec
Elapsed: 209.4278335571289 msec
Elapsed: 212.3851776123047 msec
Elapsed: 209.69533920288086 msec
Elapsed: 223.52290153503418 msec
Elapsed: 209.55896377563477 msec
Elapsed: 211.23909950256348 msec
Elapsed: 208.7113857269287 msec
Elapsed: 209.62882041931152 msec
Average elapsed: 210.84708637661404 msec

shinh (Member) commented May 31, 2019:

A screenshot of nvprof for a single iteration is attached in the original issue.
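
For reference, a similar single-iteration profile could presumably be captured with something like the following (the nvprof invocation is illustrative, and `-I 1` is assumed to run a single iteration):

$ nvprof ./build/tools/run_onnx --test csj_medium_backprop --backprop -d cuda -I 1 --fuse_operations --use_nvrtc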

It looks like OneHot is the bottleneck, but it might not be; it may just be forcing the GPU to sync due to AsScalar or something. Maybe returning a Scalar type from IntScalarConstantOp and FloatScalarConstantOp, and passing Scalar as an argument to OneHot, would remove the bottleneck. I'm not sure that removing the GPU sync alone helps, but it would be a good change anyway.
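
A minimal sketch of that idea, assuming hypothetical types and signatures rather than the actual chainer-compiler API: if the scalar constant ops produced a host-side Scalar, OneHot could read its depth argument without a device-to-host copy.

// Illustrative sketch only; names are hypothetical and not taken from the
// chainer-compiler codebase.
#include <cstdint>
#include <variant>

// Host-side scalar constant; reading it never touches the device.
struct Scalar {
  std::variant<int64_t, double> value;
  int64_t AsInt() const { return std::get<int64_t>(value); }
};

struct Array {};  // stand-in for a device-resident tensor

// Before (hypothetical): depth arrives as a 0-d device array, and turning it
// into a host integer (AsScalar) forces a device-to-host copy, i.e. a sync.
// Array OneHot(const Array& indices, const Array& depth_on_device);

// After (hypothetical): depth is already a host Scalar, so no sync is needed
// and the OneHot kernel can be launched asynchronously.
Array OneHot(const Array& indices, const Scalar& depth) {
  const int64_t depth_value = depth.AsInt();  // plain host read, no GPU sync
  (void)indices;
  (void)depth_value;
  return Array{};
}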
