Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Save batches script saves examples rather than number of batches for sites #174

Open
Sukh-P opened this issue Apr 11, 2024 · 2 comments
Open
Labels
bug Something isn't working

Comments

@Sukh-P
Copy link
Member

Sukh-P commented Apr 11, 2024

Describe the bug

Currently the batch_size is not taken into account when creating batches for sites in the save_batches script, resulting in one example in each netcdf file so this has to be accounted for when creating number of batches (it will actually be number of examples so if batch size is 8 need to create x8 number of examples)

To note this also affects batch creation for WindNet.

To Reproduce

Run the save_batches script but with the renewable variable in the config set to a value such as "pv _india" that uses the pvnet_site_datapipe function

Expected behavior

For each netcdf file ouptutted to train and val folders to have batch_size number of examples in it, rather than one example per netcdf filfe.

Additional context

This can be changed in this repo or by editing the pvnet_site.py file in the pvnet_site_datapipe function in the ocf_datapipes repo to take into batch_size as a parameter. However there would need to be testing that loading batches for model training through this function does not break given multiple examples now in each batch.

@Sukh-P Sukh-P added the bug Something isn't working label Apr 11, 2024
@peterdudfield
Copy link
Contributor

Was the conclusiong of this, that its easier to save examples, rather than batches, becasue th eexamples are .netcdf files and batches are numpy file.
Do we need to just renmae things from batches to examples to make this clear?

@Sukh-P
Copy link
Member Author

Sukh-P commented May 9, 2024

So I don't think this is urgent but I wanted there to be some visibility of this deviation from previous behaviour, it only impacts batch creation/number of examples rather than since batches are recreated from samples using batch size correctly and if you're aware that it's number of samples rather than number of batches it's fine. Renaming would be a little tricky because this is just the site/India PVNet side but we could just add some comments somewhere to make this clearer.

But IMO to keep things consistent with UK PVNet we should make the changes so that batches created are saved with the number of examples specified in the batch size, just will take a bit of work to make sure that doesn't break the conversion into numpy batches that happens when training.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants