From 054a1b0d77240e5687408fd2de2cec877f86057d Mon Sep 17 00:00:00 2001 From: Alexandra Udaltsova <43303448+AUdaltsova@users.noreply.github.com> Date: Mon, 25 Nov 2024 17:08:30 +0000 Subject: [PATCH] Update README.md --- README.md | 63 ++++++++----------------------------------------------- 1 file changed, 9 insertions(+), 54 deletions(-) diff --git a/README.md b/README.md index 7303032..924416e 100644 --- a/README.md +++ b/README.md @@ -7,73 +7,29 @@ [![tags badge](https://img.shields.io/github/v/tag/openclimatefix/ocf-data-sampler?include_prereleases&sort=semver&color=FFAC5F)](https://github.com/openclimatefix/ocf-data-sampler/tags) [![ease of contribution: easy](https://img.shields.io/badge/ease%20of%20contribution:%20easy-32bd50)](https://github.com/openclimatefix#how-easy-is-it-to-get-involved) -**ocf-data-sampler** contains all the infrastructure needed to -create batches and feed them to our models, such as -[PVNet](https://github.com/openclimatefix/PVNet/). The data we work -with is usually too heavy to do this on the fly, so that's where this repo -comes in: handling steps like opening the data, selecting the right -samples, normalising and reshaping, and saving to and reading -from disk. - -We are currently migrating to this repo from [ocf_datapipes](https://github.com/openclimatefix/ocf_datapipes/), which -performs the same functions but is centered around `PyTorch DataPipes`, -which are quite cumbersome to work with and are no longer maintained by -PyTorch. **ocf-data-sampler** uses `PyTorch Datasets`, and we've -taken the opportunity to make the code much cleaner and more manageable. +**ocf-data-sampler** contains all the infrastructure needed to create batches and feed them to our models, such as [PVNet](https://github.com/openclimatefix/PVNet/). 
The data we work with is usually too heavy to do this on the fly, so that's where this repo comes in: handling steps like opening the data, selecting the right samples, normalising and reshaping, and saving to and reading from disk. + +We are currently migrating to this repo from [ocf_datapipes](https://github.com/openclimatefix/ocf_datapipes/), which performs the same functions but is built around `PyTorch DataPipes`, which are quite cumbersome to work with and are no longer maintained by PyTorch. **ocf-data-sampler** uses `PyTorch Datasets`, and we've taken the opportunity to make the code much cleaner and more manageable. > [!Note] > This repository is still in development and does not yet have the full > functionality of its predecessor, [ocf_datapipes](https://github.com/openclimatefix/ocf_datapipes/). -> It might not be ready for use out-of-the-box! We would really appreciate any help to let us make the transition faster. +> It might not be ready for use out of the box! We would really appreciate any help to let us make the transition faster. ## Documentation -**ocf-data-sampler** doesn't have expeternal documentation; -you can read a bit about how our torch datasets work in the -Readme [here](https://github.com/openclimatefix/ocf-data-sampler/tree/readme-update/ocf_data_sampler/torch_datasets). +**ocf-data-sampler** doesn't have external documentation; you can read a bit about how our torch datasets work in the Readme [here](https://github.com/openclimatefix/ocf-data-sampler/tree/readme-update/ocf_data_sampler/torch_datasets). ## FAQ -If you have any questions about this or any other of our repos, -don't hesitate to hop to our [Discussions Page](https://github.com/orgs/openclimatefix/discussions)! +If you have any questions about this or any other of our repos, don't hesitate to hop to our [Discussions Page](https://github.com/orgs/openclimatefix/discussions)! ### How does ocf-data-sampler deal with data sources that use different projections (e.g. 
some are in latitude-longitude, and some in OSGB)? -When creating samples, we make an areal crop of a -preset size centered around a -point-of-interest (POI, usually a solar or -wind farm). The size of the crop is set not in -miles or kilometres, but in 'pixels', which would -be different for different data sources, -depending on their spatial resolution, projections -they use, and where the POI is. For example, a -latitude-longitude source with a 1° -resolution will have pixel sizes corresponding to -very different 'surface' distances (that you might -measure in, e.g., kilometres) from a source with 0.1° -resolution. The pixel size will even be -different for the same source depending on how close -the POI is to the equator! - -Instead of trying to accommodate for all these -differences and make all the source use the same -spatial grid, we translate the POI's position -into the corresponding coordinate system and -select the crop using the source's original grid. -This 'snapshot' is then passed to the model with -no additional information on what specific -coordinates it represents; instead, since the -size is always the same and the POI is always -in the centre, the model gets consistent -information on the measurements at a location -near the POI and how it affects the target, -without any explicit knowledge on where -that location is in coordinate system terms. - - - +When creating samples, we make an areal crop of a preset size centred around a point of interest (POI, usually a solar or wind farm). The size of the crop is set not in miles or kilometres, but in 'pixels', which would be different for different data sources, depending on their spatial resolution, projections they use, and where the POI is. For example, a latitude-longitude source with a 1° resolution will have pixel sizes corresponding to very different 'surface' distances (that you might measure in, e.g., kilometres) from a source with 0.1° resolution. 
The pixel size will even be different for the same source depending on how close the POI is to the equator! +Instead of trying to accommodate all these differences and make all the sources use the same spatial grid, we translate the POI's position into the corresponding coordinate system and select the crop using the source's original grid. This 'snapshot' is then passed to the model with no additional information on what specific coordinates it represents; instead, since the size is always the same and the POI is always in the centre, the model gets consistent information on the measurements at a location near the POI and how they affect the target, without any explicit knowledge of where that location is in coordinate system terms. ## Development @@ -85,8 +41,7 @@ pip install git+https://github.com/openclimatefix/ocf-data-sampler.git ### Running the test suite -The tests in this project use `pytest`. Once you have it installed, -you can run it from the project's directory: +The tests in this project use `pytest`. Once you have it installed, you can run it from the project's directory: ``` cd ocf-data-sampler