Add Marktstammdatenregister (MaStR) #165

lkstrp · 2024-06-10T14:28:07Z

Closes #16

Change proposed in this Pull Request

Adds Marktstammdatenregister via open-MaStR.

There are a few issues:

- open-mastr provides a bulk download of all the cleaned datasets on zenodo, but only as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token.
- These datasets are huge and contain many small power plants. I have now filtered out all plants with a capacity of less than 1 MW; otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included.
  - Performance can be improved, I think, but the main bottleneck is probably Duke and not on our side.
- Validation is not done yet. I am waiting for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot.

File name	Number of entries	Entries with less than 1 MW capacity
_biomass.csv	22284	21240 (95.32%)
_combustion.csv	85424	81776 (95.73%)
_nuclear.csv	6	0 (0.00%)
_hydro.csv	8657	7859 (90.78%)
_wind.csv	34798	6729 (19.34%)
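
The shares in the table above come from a simple capacity threshold. The following is a minimal sketch of such a filter, not the code in this PR: it uses only the Python standard library, and the column name `capacity_mw`, the helper `split_by_capacity`, and the sample rows are assumptions made for illustration (the real open-MaStR column names may differ).

```python
import csv
import io

THRESHOLD_MW = 1.0  # plants below this capacity are dropped, as described above


def split_by_capacity(rows, column="capacity_mw", threshold=THRESHOLD_MW):
    """Split rows into (kept, dropped) by capacity; `column` is a hypothetical name."""
    kept, dropped = [], []
    for row in rows:
        (kept if float(row[column]) >= threshold else dropped).append(row)
    return kept, dropped


# Made-up sample data standing in for one of the unit CSV files.
sample_csv = io.StringIO(
    "name,capacity_mw\n"
    "plant_a,0.25\n"
    "plant_b,12.0\n"
    "plant_c,0.9\n"
    "plant_d,1.0\n"
)
rows = list(csv.DictReader(sample_csv))
kept, dropped = split_by_capacity(rows)
share_dropped = 100 * len(dropped) / len(rows)
print(f"{len(rows)} entries, {len(dropped)} below {THRESHOLD_MW} MW ({share_dropped:.2f}%)")
```

Filtering before aggregation keeps the number of candidate units small, which is the point of the 1 MW cut-off mentioned above.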

Type of change

New feature (non-breaking change which adds functionality)

Checklist

I have added a note to release notes doc/release_notes.rst.
I have used pre-commit run --all to lint/format/check my contribution
I have documented the effects of my code changes in the documentation doc/.
I have adjusted the docstrings in the code appropriately.

FlorianK13 · 2024-07-31T09:28:08Z

Hi @lkstrp and other devs from powerplantmatching, I'm one of the developers of open-mastr. I like your work on harmonizing different sources into one European dataset. If there are issues on your side that are relevant to the open-mastr development, I'm happy to discuss them.

One remark on your comment above:
"We could use the API instead, but then the user has to pass a token."
This is not really a good idea. With the API you are limited to a small number of requests per day, so using it to get large amounts of data takes a long time. You could, however, run the bulk download to get an SQLite or PostgreSQL database and extract the relevant information from there.

from open_mastr import Mastr

db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()

lkstrp · 2024-08-06T08:03:20Z

Hey @FlorianK13,
Thanks for reaching out!

So far the idea was basically to just use the zenodo download you provide, which takes quite a long time to download.

from open_mastr import Mastr

db = Mastr()
db.download()
# if you want csv files then also run
db.to_csv()

Does this approach have any advantages over the zenodo download? For example, does it run faster or allow downloading only selected data? The API reference reads like it downloads the same zip in bulk, but allows data selection. Which means it downloads everything and just strips away unselected data?

FlorianK13 · 2024-08-06T08:30:38Z

When using the Python download method, you will get the most recent data (from the day before). On zenodo you will get the data from our last update, which is a few months old. However, with zenodo your code is reproducible, whereas the result of the Python download changes daily because the dataset from BNetzA changes every day. To achieve reproducibility with Python, you would need to specify date="existing" (Reference) after you have downloaded the dataset once, so that you use your existing local dataset from then on.

Both approaches take rather long, as you need to download the whole dataset. Afterwards you can specify which data you are interested in parsing. So you are right with your last sentence 'Which means it downloads everything and just strips away unselected data.'
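
The trade-off described above could look roughly like this in code. This is a hedged sketch rather than a tested recipe: `date="existing"` is taken from the comment above, while the `data` values, the helper name `fetch_mastr`, and its `first_run` flag are assumptions for illustration and should be checked against the open-mastr API reference.

```python
def fetch_mastr(first_run, technologies=("biomass", "hydro")):
    """Download MaStR data via open-mastr and export it as CSV (hypothetical helper).

    The import is kept inside the function so the sketch can be loaded
    without open-mastr being installed.
    """
    from open_mastr import Mastr

    db = Mastr()
    if first_run:
        # Fetches the most recent bulk export; the whole archive is downloaded
        # even though only the selected technologies are parsed.
        db.download(data=list(technologies))
    else:
        # Reuses the archive that is already on disk, so results stay reproducible.
        db.download(data=list(technologies), date="existing")
    # Write the parsed tables to CSV files.
    db.to_csv()
    return db
```

Calling `fetch_mastr(first_run=True)` once and `fetch_mastr(first_run=False)` afterwards would pin later runs to the same local snapshot.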

fneum · 2024-08-23T15:11:36Z

"open-mastr provides a bulk download of all the cleaned datasets on zenodo. But as a .zip, so we have to download everything. We could use the API instead, but then the user has to pass a token."

Based on the discussion above, let's take the zenodo releases. If that's updated at least on an annual basis, that's fine. I am also not too worried about the large download size, as updating is not a frequent action and the data is cached locally as well. @FlorianK13, it could be an option for upcoming releases to upload the individual CSV files unzipped into the zenodo repository, which would allow selective downloads (even though you lose the ZIP compression). This could be in addition to the ZIP.
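
Until individual files are available, a locally cached ZIP can at least be unpacked selectively. Below is a minimal sketch using only the Python standard library; the helper `extract_selected`, the archive contents, and the file names are made up for illustration, and the layout of the real zenodo archive is an assumption that would need checking.

```python
import io
import zipfile


def extract_selected(archive, suffixes):
    """Return {member name: bytes} for members whose name ends with one of `suffixes`."""
    selected = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if name.endswith(tuple(suffixes)):
                selected[name] = zf.read(name)
    return selected


# Build a tiny in-memory archive as a stand-in for the cached download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("example_biomass.csv", "name,capacity_mw\nplant_a,2.5\n")
    zf.writestr("example_wind.csv", "name,capacity_mw\nplant_b,3.0\n")
    zf.writestr("example_hydro.csv", "name,capacity_mw\nplant_c,0.4\n")
buffer.seek(0)

files = extract_selected(buffer, ["_biomass.csv", "_hydro.csv"])
print(sorted(files))
```

The full archive still has to be downloaded once; only the unpacking is selective.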

"These datasets are huge with many small power plants. I have now filtered out all plants with a capacity of less than 1 MW. Otherwise powerplant.aggregate_units() takes too long. Solar and wind are also currently not included."

Yes, that's also what Global Energy Monitor does. Perhaps they will also integrate open-MaStR, then we wouldn't have to.

"Validation is not done yet, I wait for the ENTSOE token to run compare-with-entsoe-stats.py, but below is a first plot"

I requested one today and received it the same day.

FlorianK13 · 2024-08-30T11:53:08Z

@fneum I created OpenEnergyPlatform/open-MaStR#558 to discuss whether we can upload single files to zenodo.

lkstrp added 2 commits June 10, 2024 09:46

fix: resolve auto downcasting warning

5284022

feat: add MaStR data

826f444

lkstrp requested a review from FabianHofmann July 10, 2024 15:04

fneum mentioned this pull request Jul 25, 2024

update BNetzA Kraftwerksliste to 2022 version #75

Closed

8 tasks

FlorianK13 mentioned this pull request Aug 30, 2024

Upload single csv files to zenodo OpenEnergyPlatform/open-MaStR#558

Open
