
Site torch dataset update #82

Draft · wants to merge 3 commits into main
Conversation

@Sukh-P (Member) commented Nov 19, 2024

Pull Request

Description

WIP PR to update the Site Torch Dataset to return samples as xarray Datasets, for easier conversion into netcdf files, which is currently the preferred format for saving samples.

This PR includes:

  • Adding a new function to process the sample dict (a dict of xarray DataArrays) into one Dataset (see the sketch after this list)
  • Reordering when .compute() is called: now that we combine multiple DataArrays into a Dataset, compute can be called once after the combination
  • Removing unused site-specific parts from the original process-and-combine function
  • Updating unit tests now that the data type of the sample has changed in the Torch Dataset
  • Updating some time-interval syntax to stop a deprecation warning (unrelated to the above changes)
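For illustration, here is a minimal sketch of the merge-then-compute flow described above. The function name, the file path, and the assumption that the DataArrays share compatible dimensions are mine, not the implementation in this PR:

```python
import xarray as xr

def sample_dict_to_dataset(sample: dict[str, xr.DataArray]) -> xr.Dataset:
    """Combine a sample dict of DataArrays into a single Dataset.

    Each DataArray becomes a data variable, so the whole sample can be
    written out as one netcdf file.
    """
    dataset = xr.Dataset({key: da for key, da in sample.items()})
    # compute() is called once on the combined Dataset rather than per DataArray
    return dataset.compute()

# Saving a sample in the preferred netcdf format (path is illustrative):
# sample_dict_to_dataset(sample).to_netcdf("sample_0000.nc")
```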

TODO

  • Removed saving solar coordinates data in samples for now; the current idea is to use the numpy batch functions in here within PVNet to create this data when converting to a numpy batch (if this seems messy, some logic may be added here to add the solar position coordinates to the Dataset)
  • Add new functions to go from a Dataset to a NumpyBatch/TensorBatch (see the sketch after this list)
  • Check this works by creating some samples, adding logic into PVNet to read the netcdfs and convert them to NumpyBatch/TensorBatches, and then training a model; will link the PR here once that is done
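As a rough sketch of what the Dataset → NumpyBatch/TensorBatch conversion could look like (function names and key handling are hypothetical, pending the actual implementation):

```python
import numpy as np
import torch
import xarray as xr

def dataset_to_numpy_sample(ds: xr.Dataset) -> dict[str, np.ndarray]:
    """Turn each data variable of the Dataset into a numpy array keyed by name."""
    return {name: ds[name].values for name in ds.data_vars}

def numpy_sample_to_tensors(sample: dict[str, np.ndarray]) -> dict[str, torch.Tensor]:
    """Convert a numpy sample into torch tensors ready for the model."""
    return {
        key: torch.from_numpy(np.ascontiguousarray(arr))
        for key, arr in sample.items()
    }
```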

@peterdudfield (Contributor) commented:
Thanks @Sukh-P, great to push this forward.

A few quick thoughts, and sorry if these seem obvious or have already been answered:

  1. Is the ideal still to convert the site batch dataset to a dict of tensors ready for the model (PVNet)? If so, will this code sit in here or in PVNet?
  2. For the different torch dataloaders, do we have an idea of how to handle the three different processes: 1. making batches, 2. loading batches and training a model, 3. running inference? It would be a shame to have separate torch datasets for each loader, but perhaps there is a simple way to do this. This is very much in your TODO section, so perhaps you have already thought about this / are going to next.
  3. This might be related to 2., but do you know where the combining-samples-into-a-batch process fits in?

@Sukh-P (Member, Author) commented Nov 20, 2024

@peterdudfield thanks, I have tried to answer these:

  1. Is the ideal still to convert the site batch dataset to a dict of tensors ready for the model (PVNet)? If so, will this code sit in here or in PVNet?

Yes, that's still the plan, perhaps still making a NumpyBatch if we want a more generic intermediary format. And yes, the code will probably be added here but called from PVNet; I can make that clearer in the TODO list above.

  2. For the different torch dataloaders, do we have an idea of how to handle the three different processes: 1. making batches, 2. loading batches and training a model, 3. running inference? It would be a shame to have separate torch datasets for each loader, but perhaps there is a simple way to do this. This is very much in your TODO section, so perhaps you have already thought about this / are going to next.

I think the longer-term plan is to move towards one batch format (netcdf) and a common interface to batches through a batch object. That way things will be more generalised and we will have fewer different ways of doing the same thing. I imagine this will need a bit more thought and can be improved after a working pipeline for sites has been added; I can create an issue/discussion around this once we have that.

  3. This might be related to 2., but do you know where the combining-samples-into-a-batch process fits in?

So I think this is handled by a function that stacks samples into a batch, like here, together with the Torch DataLoader, where you specify how many samples go into a batch.
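To make that concrete, here is a minimal sketch of a stacking/collate function handed to the Torch DataLoader. The function name and per-key stacking are assumptions; the linked code above is the actual reference:

```python
import torch
from torch.utils.data import DataLoader

def stack_samples_into_batch(samples: list[dict[str, torch.Tensor]]) -> dict[str, torch.Tensor]:
    """Stack a list of per-sample tensor dicts along a new leading batch dimension."""
    keys = samples[0].keys()
    return {key: torch.stack([s[key] for s in samples], dim=0) for key in keys}

# The DataLoader decides how many samples make up a batch and calls the
# stacking function for us (dataset and batch_size are illustrative):
# dataloader = DataLoader(dataset, batch_size=8, collate_fn=stack_samples_into_batch)
```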
