Challenge: new model #30

Open
peterdudfield opened this issue Dec 20, 2023 · 17 comments · Fixed by #76
Labels
good first issue Good for newcomers help wanted Extra attention is needed

Comments

@peterdudfield
Contributor

peterdudfield commented Dec 20, 2023

Can you make a new model and beat the current evaluation metrics?

You need to build a forecast model to predict PV generation. The PV dataset is all here, and we also want the model to run like the current model, i.e. pulling NWP data from open-meteo.

We need a model that can forecast 48 hours ahead, in 15-minute intervals. We want it to run live without PV live data, but a good optional extra would be to include PV data.

This is fairly open-ended in order not to restrict anyone.
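
As a rough illustration of the horizon described above, the sketch below builds the 15-minute timestamps covering the 48 hours after a forecast start time. The function name `forecast_timestamps` and the choice to begin at the first step after the start time (rather than at the start time itself) are assumptions for illustration; they are not taken from the existing code.

```python
from datetime import datetime, timedelta, timezone

HORIZON = timedelta(hours=48)   # forecast length requested in the issue
STEP = timedelta(minutes=15)    # forecast resolution requested in the issue


def forecast_timestamps(init_time):
    """Return the target times for one forecast run.

    Assumption: the first target is one step after `init_time`,
    and the last target is exactly `init_time + HORIZON`.
    """
    n_steps = HORIZON // STEP  # integer number of 15-minute steps in 48 hours
    return [init_time + STEP * i for i in range(1, n_steps + 1)]


if __name__ == "__main__":
    start = datetime(2023, 12, 20, 12, 0, tzinfo=timezone.utc)
    targets = forecast_timestamps(start)
    print(len(targets), targets[0], targets[-1])
```

Whatever the model looks like internally, it would need to produce one value per timestamp on a grid like this.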

@peterdudfield peterdudfield added the help wanted Extra attention is needed label Feb 26, 2024
@peterdudfield
Contributor Author

A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data.
It might be a case of writing this fresh, or using ocf_datapipes.
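
To make the idea of such a pipeline concrete, here is a minimal sketch of the "write it fresh" option in plain Python. It assumes hourly weather values and 15-minute PV readings, and linearly interpolates the weather onto the PV timestamps before pairing them. The names `interpolate_hourly` and `join_weather_and_pv`, the dict-based inputs, and the single `irradiance` variable are all hypothetical; this does not show how ocf_datapipes works, and a real pipeline would more likely use a dataframe library.

```python
from datetime import datetime, timedelta

ONE_HOUR = timedelta(hours=1)


def interpolate_hourly(nwp, when):
    """Linearly interpolate an hourly {datetime: value} series at `when`.

    Raises KeyError if the surrounding hourly values are missing.
    """
    start = when.replace(minute=0, second=0, microsecond=0)
    if when == start:
        return nwp[start]
    end = start + ONE_HOUR
    frac = (when - start) / ONE_HOUR  # position within the hour, 0 to 1
    return nwp[start] + frac * (nwp[end] - nwp[start])


def join_weather_and_pv(nwp, pv):
    """Pair each PV reading with the weather value at the same time."""
    rows = []
    for when in sorted(pv):
        rows.append(
            {
                "time": when,
                "irradiance": interpolate_hourly(nwp, when),
                "power_kw": pv[when],
            }
        )
    return rows


if __name__ == "__main__":
    noon = datetime(2023, 6, 1, 12, 0)
    nwp = {noon: 400.0, noon + ONE_HOUR: 600.0}
    pv = {noon + timedelta(minutes=15 * i): 1.0 + 0.1 * i for i in range(4)}
    for row in join_weather_and_pv(nwp, pv):
        print(row)
```

The resulting rows are the kind of joined training examples a model could be fitted on.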

@shreyasudaya
Contributor

Sorry to comment here, but I would like to ask about whether this paper is relevant to the related issue.
https://www.sciencedirect.com/science/article/pii/S0960148123009035#tbl1

@roshnaeem
Contributor

Hello @peterdudfield, I would like to work on this issue. Can you please assign me?

@peterdudfield
Contributor Author

Hi @roshnaeem
If it's OK, I'll keep the assignees empty so that it encourages lots of people to tackle this issue. Is that OK?
Thank you so much for working on this. Please write here if you have any questions.

@peterdudfield
Contributor Author

Some general questions:

  1. Where can I read about the psp library that is used in the project?

https://github.com/openclimatefix/pv-site-prediction, but I wouldn't get too stuck into this code. I think it would be better to write something fresh.

  2. From the README and the code, I understand that we are using a gradient boosted trees model, which is called to make predictions through the run_forecast function. Can I see the code for the model?

See above.

  3. For the next models that we add to the project, should the prediction parameters be the same as for the current model, i.e. PVSite(latitude, longitude, capacity_kwp) and a timestamp?

Yes, but the NWPs are also going to be very important.

  4. How can I check the accuracy and other benchmarks for the current model?

Use the evaluation script: https://github.com/openclimatefix/Open-Source-Quartz-Solar-Forecast/blob/main/scripts/run_evaluation.py
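
For a quick sanity check while developing, before running the evaluation script, two forecasts can be compared against the same measurements with a simple error metric. The sketch below uses mean absolute error; that choice is an assumption for illustration and does not reproduce what run_evaluation.py computes. The function name and the toy numbers are hypothetical.

```python
def mean_absolute_error(actual_kw, forecast_kw):
    """Average absolute difference between measured and forecast power."""
    if not actual_kw or len(actual_kw) != len(forecast_kw):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) for a, f in zip(actual_kw, forecast_kw))
    return total / len(actual_kw)


if __name__ == "__main__":
    # Toy numbers only, not real measurements or model output.
    actual = [0.0, 1.0, 2.0, 1.0]
    baseline = [0.0, 1.5, 1.5, 1.0]
    candidate = [0.0, 1.0, 2.2, 1.0]
    print("baseline MAE:", mean_absolute_error(actual, baseline))
    print("candidate MAE:", mean_absolute_error(actual, candidate))
```

A lower value for the candidate on held-out data would be a first hint that it is worth running through the full evaluation.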

@roshnaeem
Contributor

Sure @peterdudfield, thank you. I am checking the code and will open a PR soon.

@roshnaeem
Contributor

Thank you @peterdudfield for your guidance. I went through pv-site-prediction and ocf-datapipes repositories to understand the basics. I have a couple of questions.

  1. The current model is using the combination of NWP data and PV site data to train the model, right? For the next model, what does "We want it to run live without PV live data" mean?
  2. ocf-datapipes is integrating both types of data, right? Are we using it to provide training data?
  3. Regarding your comment for the first good step, can you explain to me which part of the data preprocessing I should work on, that can be a good PR for the GSOC proposal? I see there are two approaches mentioned. Can you please tell me the step-by-step approach to handle this subtask?

@peterdudfield
Contributor Author

peterdudfield commented Mar 5, 2024

  1. No live PV data means the model can run inference with only NWP data. This is what we have found lots of people want.

  2. If you want to use it, yes. Currently it's not being used in the repo.

  3. I'm not sure what you mean by two approaches. Could you clarify? I'm not sure I can give you a step-by-step approach, but I can try to outline things.

@roshnaeem
Contributor

A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data. It might be a case of writing this fresh, or using ocf_datapipes.

@peterdudfield I was talking about these two approaches you mentioned in this comment.

@peterdudfield
Contributor Author

A good thing to do first would be to build a general pipeline that takes weather data and joins it with PV data. It might be a case of writing this fresh, or using ocf_datapipes.

@peterdudfield I was talking about these two approaches you mentioned in this comment.

I'd probably try ocf_datapipes first, and if it doesn't suit, then try to write something fresh.

@roshnaeem
Contributor

@peterdudfield, I have a few questions regarding the GSOC proposal.

  1. Would we be using ocf_datapipes as well as building new datapipes for the new model?
  2. Should the current model also work on these data_pipes?
  3. If we run the inference only with NWP data, would we be using standard capacity for PV systems?

@peterdudfield
Contributor Author

I would leave the current model how it is, but aim to use ocf_datapipes for the new model.

  1. Capacity is a useful feature, because under the same NWP conditions you can have different PV power depending on the capacity.
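
The point about capacity can be shown with a toy calculation: if a model predicts the fraction of capacity being produced, identical weather gives the same fraction but different power for sites of different size. The linear `toy_yield_fraction` function below is a deliberately crude, hypothetical stand-in for a real model, and the 1000 W/m² reference value is an assumption chosen only to keep the arithmetic simple.

```python
def toy_yield_fraction(irradiance_w_m2):
    """Toy stand-in for a model: fraction of capacity produced, clipped to 0..1."""
    return max(0.0, min(irradiance_w_m2 / 1000.0, 1.0))


def forecast_power_kw(irradiance_w_m2, capacity_kwp):
    """Scale the predicted fraction by the site's capacity to get power in kW."""
    return toy_yield_fraction(irradiance_w_m2) * capacity_kwp


if __name__ == "__main__":
    # Same weather, two hypothetical sites of different capacity.
    for capacity_kwp in (4.0, 10.0):
        print(capacity_kwp, "kWp ->", forecast_power_kw(500.0, capacity_kwp), "kW")
```

Training on power normalised by capacity, or passing capacity in as a feature, are two possible ways of giving a model this information.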

@peterdudfield
Contributor Author

Question:
How can we predict using only NWP data? We would need capacity and PV site data to get the NWP data. Does live PV data mean that we would be getting PV data in real time and predicting the generation in real time?

Yes, it would be good to use PV metadata, like capacity, together with NWP data in the model. The live PV data would also increase the accuracy of the model, but in this repo we've tried to keep that optional. So first of all, the model should work with NWP data and PV metadata.
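
One possible way to keep live PV data optional is to build the model inputs from NWP data and site metadata, and add the live reading only when one is supplied. The sketch below is an assumption about how that could look, not the approach used in this repo; `build_features`, its arguments, and the feature names are hypothetical.

```python
from typing import Dict, Optional


def build_features(
    irradiance_w_m2: float,
    capacity_kwp: float,
    recent_pv_kw: Optional[float] = None,
) -> Dict[str, float]:
    """Assemble model inputs; the live PV reading is optional.

    Without a live reading, the live-PV features fall back to zeros and a
    flag tells the model that no live data was available.
    """
    features = {
        "irradiance_w_m2": irradiance_w_m2,
        "capacity_kwp": capacity_kwp,
        "has_live_pv": 0.0,
        "recent_pv_fraction": 0.0,
    }
    if recent_pv_kw is not None:
        features["has_live_pv"] = 1.0
        # Express the live reading as a fraction of the site's capacity.
        features["recent_pv_fraction"] = recent_pv_kw / capacity_kwp
    return features


if __name__ == "__main__":
    print(build_features(500.0, 4.0))                    # NWP and metadata only
    print(build_features(500.0, 4.0, recent_pv_kw=1.0))  # with a live PV reading
```

With a layout like this, the same model can run from NWP data and metadata alone and still benefit from live PV data when it exists.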

@felipewhitaker

felipewhitaker commented Mar 13, 2024

Hi,

I am working on #27 and this discussion helped a lot, thanks!

I explored the project and ended up in psp, since it contains the code to train the models. I ran its train and eval model after setting up the environment, but wasn't able to use its result (.pkl) directly as a model (substituting the current default model in forecast_v1 by psp's test_config1 model .pkl). Should using psp's model directly be possible?

@BayoHabib

Hi @peterdudfield, I would like to work on this issue. I'll be available from March 28.

@Fofoabdo

Can I work on this issue?

@peterdudfield peterdudfield added the good first issue Good for newcomers label Jun 20, 2024
6 participants