Training is slower without stress? #28

Open
VvVLzy opened this issue Sep 13, 2022 · 5 comments

@VvVLzy
Contributor

VvVLzy commented Sep 13, 2022

I have been using two datasets to train the model based on the pre-trained one. They are pretty similar in size, one without stress and the other with stress.

I notice that, with the same device configuration, the model trains much more slowly on the dataset without stress. It even runs out of memory after 2 epochs with batch_size=32, so I have to decrease the batch size to 16 to continue training.

The training speed for the dataset with stress is ~130 ms/step with a batch size of 32.
The training speed for the dataset without stress is ~270 ms/step with a batch size of 16.

I wonder what might be causing this roughly 4x slowdown per training example?
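The factor of 4 only appears once the step times are normalized by batch size. A minimal sketch of that arithmetic, assuming the ~270 ms/step figure belongs to the dataset without stress (the one that had to run at batch size 16); the helper name `ms_per_example` is made up for illustration:

```python
def ms_per_example(ms_per_step: float, batch_size: int) -> float:
    """Average time spent on one training example, in milliseconds."""
    return ms_per_step / batch_size


# Step times quoted in the issue; which dataset each belongs to is an assumption
# based on the batch sizes mentioned above.
with_stress = ms_per_example(130.0, 32)     # dataset with stress, batch size 32
without_stress = ms_per_example(270.0, 16)  # dataset without stress, batch size 16
slowdown = without_stress / with_stress

print(f"with stress:    {with_stress:.2f} ms/example")
print(f"without stress: {without_stress:.2f} ms/example")
print(f"slowdown:       {slowdown:.2f}x")
```

Per step the gap is only about 2x, but each slow step also processes half as many examples, which is where the overall factor of roughly 4 comes from.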

@chc273
Contributor

chc273 commented Sep 15, 2022

Could you share a minimal reproducible script with some dummy data? @VvVLzy

@VvVLzy
Contributor Author

VvVLzy commented Sep 16, 2022

Just to clarify: are you asking for the script I used for training as well as the two sets of training data?

@chc273
Contributor

chc273 commented Sep 17, 2022

Yes, that would help in checking where the problem is. It does not happen on my machines.

@VvVLzy
Contributor Author

VvVLzy commented Sep 20, 2022

Here are the scripts and dummy data for the slow and fast training (each in the corresponding folder). The slower training (without stress) uses monty to load/parse the data file (so the data file format is a bit different). However, the parsed data fed into the trainer is of the same format, so it should not affect training speed in that regard...

Both data files consist of 1000 training examples and 100 validation.

Thanks.

slow.zip
fast.zip
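One way to test the assumption that both parsed datasets reach the trainer in the same form is to summarize them just before training. The sketch below is only illustrative: it assumes the forces are available as one nested list per structure (n_atoms rows of 3 components), and the function name, variable names, and toy values are all hypothetical rather than taken from the attached scripts.

```python
def summarize_forces(forces):
    """Summarize a list of per-structure force arrays (each n_atoms x 3)."""
    n_atoms = [len(f) for f in forces]
    # Collect the Python types of the individual force components.
    leaf_types = sorted({type(x).__name__ for f in forces for row in f for x in row})
    return {
        "n_structures": len(forces),
        "min_atoms": min(n_atoms),
        "max_atoms": max(n_atoms),
        "mean_atoms": sum(n_atoms) / len(n_atoms),
        "leaf_types": leaf_types,
    }


# Toy stand-ins for the parsed "fast" and "slow" datasets (hypothetical values).
fast_forces = [[[0.0, 0.0, 0.1], [0.0, 0.0, -0.1]]]
slow_forces = [[[0.0, 0.0, 0.1], [0.0, 0.0, -0.1], [0.0, 0.0, 0.0]]]

fast_summary = summarize_forces(fast_forces)
slow_summary = summarize_forces(slow_forces)

# Report every field on which the two datasets differ.
for key in fast_summary:
    if fast_summary[key] != slow_summary[key]:
        print(f"{key}: fast={fast_summary[key]!r} slow={slow_summary[key]!r}")
```

If the real datasets differ in structure size or element types, that difference would be a natural first place to look, although this check alone cannot confirm the cause of the slowdown.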

@kartiksau89

VvVLzy

Dear Ziyao Luo,
Would it be possible to share the code you used to convert the OUTCAR data into the JSON format used for training? Thanks.


3 participants