Initial results #26
-
Thanks so much for sharing! Some quick thoughts:
hehe, yeah, I know your pain! If it's any help: we decided to pre-prepare a set of training batches ahead of time, so that during training literally all the code has to do is load pre-prepared batches off disk. This is nice because hopefully you can fit your pre-prepared batches onto an SSD. Here's our code for pre-preparing batches, although I'd guess you'd be better off writing your own code from scratch, because our code is quite specialised to nowcasting PV power generation.
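If it's useful, here's a rough sketch of the pattern (not our actual code; the .npz layout, the `prepare_batch` helper, and all names are made up for illustration):

```python
import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset

# Offline step (run once): save each prepared batch to disk, ideally on an SSD.
# `prepare_batch` stands in for whatever expensive preprocessing you currently do per batch.
def save_prepared_batches(prepare_batch, n_batches, out_dir="prepared_batches"):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i in range(n_batches):
        x, y = prepare_batch(i)                              # numpy arrays, e.g. (B, T, C, H, W)
        np.savez(str(out / f"batch_{i:06d}.npz"), x=x, y=y)

# Training step: the only work left at train time is reading files back off disk.
class PreparedBatchDataset(Dataset):
    def __init__(self, batch_dir="prepared_batches"):
        self.files = sorted(Path(batch_dir).glob("batch_*.npz"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        data = np.load(self.files[idx])
        return torch.from_numpy(data["x"]), torch.from_numpy(data["y"])
```

The point is just that the DataLoader workers end up I/O-bound on a fast SSD rather than CPU-bound on preprocessing.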
Yes, I definitely think it would be beneficial to implement a random lead time on a per-sample basis! For example, in Aribandi et al. 2021, the authors show that training an NLP model on multiple tasks works best if each batch contains a random sample of the tasks 🙂
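Something like this could work when assembling a batch (just a sketch; the tensor layout, `max_lead`, and how the lead time conditions the model are all assumptions):

```python
import torch

def pick_random_leadtimes(x, y_all, max_lead):
    """x: (B, T_in, C, H, W) input frames; y_all: (B, max_lead, C, H, W) all future frames.
    Picks one lead time per sample, so a single batch mixes short and long forecasts."""
    batch_size = y_all.shape[0]
    leads = torch.randint(0, max_lead, (batch_size,))    # one random lead time per sample
    y = y_all[torch.arange(batch_size), leads]           # (B, C, H, W) target at that lead time
    return x, y, leads                                    # `leads` can then condition the model, e.g. via ConditionTime
```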
If you're using a relatively small number of bins, it might be worth spacing the bins so that, on average, there's a uniform probability of landing in any given bin.
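For example (a sketch, assuming the training rain rates are available as a flat numpy array; names are hypothetical):

```python
import numpy as np

def uniform_probability_bin_edges(rain_rates, n_bins):
    """Place bin edges at empirical quantiles of the training data, so that on
    average each bin receives the same fraction of pixels."""
    quantiles = np.linspace(0.0, 1.0, n_bins + 1)
    edges = np.quantile(rain_rates, quantiles)
    return np.unique(edges)  # duplicate edges collapse where many pixels share a value (e.g. zero rain)

# usage (hypothetical names):
# edges = uniform_probability_bin_edges(train_pixels.ravel(), n_bins=64)
# classes = np.digitize(pixels, edges[1:-1])
```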
-
I have now trained on 8 GPUs for 48 hours. The training is still running, but check out some of the results and my questions: w&b report
Questions:
-
I don't know if this is anything I should worry about, but I think it's worth pointing out. If we look at the fourth plot in the w&b report, it shows the number of pixels in each respective class. We can see that the decimal precision of the dBZ data is not fine enough to fill every bin; that is, some bins are completely empty because of the way dBZ is transformed into mm/h (proportional to 10**dBZ). Should I worry about this? My reasoning is that even if it's a dumb way to partition the data, it shouldn't affect the final result, since all of those classes will just be mapped to 0 (see the small binning sketch after the list of changes below).
I have read the DGMR report; I assume you meant "Skilful precipitation nowcasting using deep generative models of radar". The random sampling technique is a bit complicated, so I will start by trying class-balancing weights first. I will run the model again with the following changes:
Instead, I now sort with respect to the number of pixels in Y that fall in a rainfall (non-zero) bin. This way the model will not be biased to guess that rainfall decreases with time, since it counts pixels across all 60 lead times.
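To illustrate the empty-bin effect mentioned above, here is a small sketch (the 0.5 dBZ step and the Marshall-Palmer relation Z = 200 * R**1.6 are just illustrative assumptions, not necessarily what my data uses):

```python
import numpy as np

# dBZ is stored with a finite step (assume 0.5 dBZ here), so only discrete values occur.
dbz_values = np.arange(0.0, 60.0, 0.5)

# Reflectivity to rain rate via Marshall-Palmer: Z = 200 * R**1.6, with Z = 10**(dBZ / 10).
rain_mm_h = (10.0 ** (dbz_values / 10.0) / 200.0) ** (1.0 / 1.6)

# Partition the mm/h range into 128 fixed-width classes, as a naive uniform binning would.
edges = np.linspace(0.0, rain_mm_h.max(), 129)
classes = np.digitize(rain_mm_h, edges[1:-1])

# Classes that fall between consecutive discrete dBZ steps can never be hit.
print(f"{np.unique(classes).size} of 128 classes are reachable")
```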
-
Hello,
I have deployed the model on a larger chunk of my dataset now. Here is my model:
| Name | Type | Params
0 | image_encoder | TimeDistributed | 1.7 M
1 | ct | ConditionTime | 0
2 | temporal_enc | TemporalEncoder | 3.5 M
3 | position_embedding | AxialPositionalEmbedding | 14.3 K
4 | temporal_agg | Sequential | 4.2 M
5 | head | Conv2d | 7.7 K
9.4 M Trainable params
0 Non-trainable params
9.4 M Total params
37.676 Total estimated model params size (MB)
I am using the default image encoder, a ConvGRU with a hidden size of 256, and 8 attention layers with 16 attention heads each.
I wanted to discuss a little bit about what to expect. First, let me show you some overfitting experiments I did with only two training samples:
With only 1 lead time, the result is easily overfitted after 50 epochs:
With 5 lead times it's harder to overfit, but it gets something done (300 epochs):
Now let's look at a run with the full network; you can find the validation and training loss at w&b. This is a run of 400 epochs, over 4 hours on 8 GPUs in parallel. As you can see, the network has not yet overfit, since the validation loss is not increasing. This run was done with only 280 training samples, but I have a lot more data available; I am struggling to implement an efficient way to load all the data since it's so big (work in progress).
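One possible approach for the loading problem (just a sketch, assuming one .npy file per sample; names and the frame split are hypothetical) is to memory-map the arrays so only the frames a batch actually needs are read into RAM:

```python
import numpy as np
import torch
from pathlib import Path
from torch.utils.data import Dataset, DataLoader

class LazyRadarDataset(Dataset):
    """Reads samples lazily from per-sample .npy files; nothing is held in memory
    until __getitem__ touches it, so dataset size is bounded by disk, not RAM."""
    def __init__(self, sample_dir):
        self.files = sorted(Path(sample_dir).glob("*.npy"))

    def __len__(self):
        return len(self.files)

    def __getitem__(self, idx):
        arr = np.load(self.files[idx], mmap_mode="r")  # memory-mapped; only the slices below are read
        x = np.array(arr[:4])    # e.g. first 4 frames as the input sequence (copied into RAM)
        y = np.array(arr[4:])    # remaining frames as the target sequence
        return torch.from_numpy(x), torch.from_numpy(y)

# loader = DataLoader(LazyRadarDataset("samples/"), batch_size=8, num_workers=8, pin_memory=True)
```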
I wanted to discuss the following:
Here are some results of the network:
input:
y:
y_hat: