This repository also provides tools to fetch, process, and insert various datasets into a PostgreSQL database. These tools are particularly useful for managing German administrative and energy-related data, such as municipality keys, energy unit metadata, and administrative areas.
This tool fetches and inserts official German municipality keys into a PostgreSQL database.
- Database Setup
  - Ensure PostgreSQL is installed and running on `localhost` (default port: `5432`).
  - Create a database named `oklab`, owned by a user with the same name.
  - Make sure the database accepts connections from `localhost`.
- Environment Variables
  - Create a `.env` file in the root directory of this repository and add the following environment variables with your specific values:

    ```
    DB_PASS=YOUR_PASSWORD_HERE
    DB_HOST=localhost
    DB_USER=oklab
    DB_NAME=oklab
    DB_PORT=5432
    ```
- Python
  - Python 3 installed with `venv` and `pip` available.
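The scripts below read these variables from the `.env` file and use them to connect to PostgreSQL. As a minimal sketch of what that involves, the helper below parses `KEY=VALUE` lines and builds a libpq-style connection string; the function names are illustrative, not the repository's actual code:

```python
def parse_env(text):
    """Parse KEY=VALUE lines (as in the .env file above) into a dict."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

def build_dsn(env):
    """Build a libpq connection string from the parsed variables."""
    return (f"dbname={env['DB_NAME']} user={env['DB_USER']} "
            f"password={env['DB_PASS']} host={env['DB_HOST']} port={env['DB_PORT']}")
```

The resulting string can be passed directly to `psycopg2.connect`.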
- Set up the database schema:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_municipality_keys_schema.sql
  ```
- Activate a Python virtual environment and install dependencies:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Run the script to insert municipality keys:

  ```sh
  python3 insert_municipality_key.py \
      --env ../.env \
      --target ../data \
      --url https://www.xrepository.de/api/xrepository/urn:de:bund:destatis:bevoelkerungsstatistik:schluessel:ags_2024-10-31/download/AGS_2024-10-31.json \
      --verbose
  ```
- Deactivate the virtual environment:

  ```sh
  deactivate
  ```
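The official municipality key (Amtlicher Gemeindeschlüssel, AGS) imported above is an eight-digit code composed of state, government district, county, and municipality parts. A small illustrative helper (not part of the repository) that splits a key into these components:

```python
def split_ags(ags):
    """Split an 8-digit Amtlicher Gemeindeschlüssel into its parts:
    2 digits state, 1 digit government district, 2 digits county,
    3 digits municipality."""
    if len(ags) != 8 or not ags.isdigit():
        raise ValueError(f"not a valid AGS: {ags!r}")
    return {
        "state": ags[:2],
        "district": ags[2:3],
        "county": ags[3:5],
        "municipality": ags[5:],
    }
```

For example, the key `05315000` decodes to state `05` (Nordrhein-Westfalen), district `3`, county `15`, municipality `000`.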
This tool fetches and inserts wind turbine and solar energy metadata into your PostgreSQL database.
Ensure the database and Python environment are set up as described in the Insert Municipality Keys section.
- Set up the database schema:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_energy_meta_schema.sql
  ```
- Activate a Python virtual environment and install dependencies:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Run the script to insert energy metadata:

  ```sh
  python3 insert_energy_meta.py \
      --env ../.env \
      --target ~ \
      --url https://www.marktstammdatenregister.de/MaStR/Einheit/EinheitJson/GetFilterColumnsErweiterteOeffentlicheEinheitStromerzeugung \
      --verbose
  ```
- Deactivate the virtual environment:

  ```sh
  deactivate
  ```
This tool processes data from the German energy market data register to insert various energy unit types (e.g., wind, solar, nuclear) into your PostgreSQL database.
- Download the complete public data extract from the German Energy Market Data Register and unpack the zip archive.
- Set up the database schema for each energy unit type:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_biomass_units_schema.sql
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_combustion_units_schema.sql
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_nuclear_units_schema.sql
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_solar_units_schema.sql
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_water_units_schema.sql
  psql -U oklab -h localhost -d oklab -p 5432 < data/de_wind_units_schema.sql
  ```
For each energy unit type, follow these steps:
- Activate a Python virtual environment and install dependencies:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Run the script for the respective energy unit type. For example:

  ```sh
  python3 insert_biomass_units.py --env ../.env --src ~/EinheitenBiomasse.xml --verbose
  python3 insert_combustion_units.py --env ../.env --src ~/EinheitenVerbrennung.xml --verbose
  python3 insert_nuclear_units.py --env ../.env --src ~/EinheitenKernkraft.xml --verbose
  python3 insert_solar_units.py --env ../.env --src ~/EinheitenSolar_1.xml --verbose
  python3 insert_water_units.py --env ../.env --src ~/EinheitenWasser.xml --verbose
  python3 insert_wind_units.py --env ../.env --src ~/EinheitenWind.xml --verbose
  ```
- Deactivate the virtual environment:

  ```sh
  deactivate
  ```
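The six invocations above differ only in the script name and the source XML file. If you prefer to drive them from one place, the mapping can be expressed like this; the file names are the ones used above, but the loop itself is only an illustrative sketch:

```python
# unit type -> (insert script, XML export from the Marktstammdatenregister dump)
UNIT_SOURCES = {
    "biomass":    ("insert_biomass_units.py",    "EinheitenBiomasse.xml"),
    "combustion": ("insert_combustion_units.py", "EinheitenVerbrennung.xml"),
    "nuclear":    ("insert_nuclear_units.py",    "EinheitenKernkraft.xml"),
    "solar":      ("insert_solar_units.py",      "EinheitenSolar_1.xml"),
    "water":      ("insert_water_units.py",      "EinheitenWasser.xml"),
    "wind":       ("insert_wind_units.py",       "EinheitenWind.xml"),
}

def unit_command(unit, src_dir="~"):
    """Build the argument list for one unit type, mirroring the calls above."""
    script, xml = UNIT_SOURCES[unit]
    return ["python3", script, "--env", "../.env",
            "--src", f"{src_dir}/{xml}", "--verbose"]
```

Each list can be handed to `subprocess.run` from the `tools` directory; note that large extracts (e.g. solar) may be split into several numbered XML files, so check the unpacked archive.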
Download administrative area data with population figures from the Federal Agency for Cartography and Geodesy. Use the `ogr2ogr` tool to insert this data into your PostgreSQL database.
- Download and unpack the administrative area dataset:

  ```sh
  wget https://daten.gdz.bkg.bund.de/produkte/vg/vg250-ew_ebenen_1231/aktuell/vg250-ew_12-31.utm32s.gpkg.ebenen.zip
  unzip vg250-ew_12-31.utm32s.gpkg.ebenen.zip
  cd vg250-ew_12-31.utm32s.gpkg.ebenen/vg250-ew_ebenen_1231
  ```
- Insert data into your database using `ogr2ogr`:
  ```sh
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_gem
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_krs
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_lan
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_rbz
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_sta
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt MULTIPOLYGON -overwrite -update DE_VG250.gpkg vg250_vwg
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt LINESTRING -overwrite -update DE_VG250.gpkg vg250_li
  ogr2ogr -f "PostgreSQL" PG:"dbname=oklab user=oklab port=5432 host=localhost" \
    -lco GEOMETRY_NAME=geom -lco SPATIAL_INDEX=GIST -lco PRECISION=NO \
    -t_srs EPSG:4326 -nlt POINT -overwrite -update DE_VG250.gpkg vg250_pk
  ```
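The eight `ogr2ogr` calls differ only in the layer name and the target geometry type (polygons for the area levels, lines for `vg250_li`, points for `vg250_pk`). A sketch that generates the same invocations; the layer-to-geometry mapping is taken from the commands above, while the helper itself is illustrative:

```python
# geometry type per vg250 layer, as used in the commands above
LAYER_GEOMETRY = {
    "vg250_gem": "MULTIPOLYGON", "vg250_krs": "MULTIPOLYGON",
    "vg250_lan": "MULTIPOLYGON", "vg250_rbz": "MULTIPOLYGON",
    "vg250_sta": "MULTIPOLYGON", "vg250_vwg": "MULTIPOLYGON",
    "vg250_li": "LINESTRING", "vg250_pk": "POINT",
}

def ogr2ogr_command(layer, gpkg="DE_VG250.gpkg"):
    """Build the ogr2ogr argument list for one vg250 layer."""
    return [
        "ogr2ogr", "-f", "PostgreSQL",
        "PG:dbname=oklab user=oklab port=5432 host=localhost",
        "-lco", "GEOMETRY_NAME=geom", "-lco", "SPATIAL_INDEX=GIST",
        "-lco", "PRECISION=NO",
        "-t_srs", "EPSG:4326", "-nlt", LAYER_GEOMETRY[layer],
        "-overwrite", "-update", gpkg, layer,
    ]
```

Looping over `LAYER_GEOMETRY` and passing each list to `subprocess.run` reproduces the manual commands above.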
- Next, it is recommended to create indices using `psql`:

  ```sql
  -- index state, county and municipality ids
  CREATE INDEX IF NOT EXISTS idx_lan_sn_l ON vg250_lan (sn_l);
  CREATE INDEX IF NOT EXISTS idx_krs_sn_l ON vg250_krs (sn_l);
  CREATE INDEX IF NOT EXISTS idx_krs_sn_r ON vg250_krs (sn_r);
  CREATE INDEX IF NOT EXISTS idx_krs_sn_k ON vg250_krs (sn_k);
  CREATE INDEX IF NOT EXISTS idx_gem_sn_k ON vg250_gem (sn_k);
  CREATE INDEX IF NOT EXISTS idx_gem_sn_r ON vg250_gem (sn_r);
  CREATE INDEX IF NOT EXISTS idx_gem_sn_l ON vg250_gem (sn_l);
  -- index for the municipality key
  CREATE INDEX IF NOT EXISTS idx_vg250_gem_ags ON vg250_gem (ags);
  -- index state names
  CREATE INDEX IF NOT EXISTS idx_vg250_lan_gen ON vg250_lan (gen);
  -- index on geofactor
  CREATE INDEX IF NOT EXISTS idx_vg250_gem_gf ON vg250_gem (gf);
  CREATE INDEX IF NOT EXISTS idx_vg250_krs_gf ON vg250_krs (gf);
  CREATE INDEX IF NOT EXISTS idx_vg250_lan_gf ON vg250_lan (gf);
  -- index place and admin level
  CREATE INDEX IF NOT EXISTS idx_osm_point_place ON planet_osm_point (place);
  CREATE INDEX IF NOT EXISTS idx_osm_polygon_admin_level ON planet_osm_polygon (admin_level);
  ```
- Create a materialized view for faster search.
  Note that materialized views need to be refreshed to update their content if the underlying data changes. After restoring, you may need to run:

  ```sql
  REFRESH MATERIALIZED VIEW your_materialized_view_name;
  ```

  In case you want to drop an existing materialized view, run:

  ```sql
  DROP MATERIALIZED VIEW IF EXISTS mv_de_geographical_regions CASCADE;
  ```

  To create the materialized view, run:

  ```sql
  CREATE MATERIALIZED VIEW IF NOT EXISTS mv_de_geographical_regions AS
  SELECT DISTINCT ON (point.name, gem.ags)
      REPLACE(point.name, 'Sankt ', 'St. ') AS geographical_name,
      CASE
          WHEN point.name != gem.gen AND point.place IN ('suburb', 'village', 'isolated_dwelling', 'hamlet', 'island', 'neighbourhood', 'quarter') THEN gem.bez || ' ' || REPLACE(gem.gen, 'Sankt ', 'St. ')
          WHEN point.name = gem.gen AND point.place IN ('suburb', 'village', 'isolated_dwelling', 'hamlet', 'island', 'neighbourhood', 'quarter') THEN krs.bez || ' ' || krs.gen
          WHEN point.place IN ('municipality', 'city', 'town') AND gem.ibz != 60 AND gem.ibz != 61 THEN krs.bez || ' ' || krs.gen
          ELSE lan.gen
      END AS region_name,
      gem.ags AS municipality_key
  FROM planet_osm_point AS point
  JOIN planet_osm_polygon AS poly
      ON ST_Contains(poly.way, point.way)
  JOIN vg250_gem AS gem
      ON ST_Contains(gem.geom, ST_Transform(point.way, 4326)) AND gem.gf = 4
  JOIN vg250_krs AS krs
      ON gem.sn_l = krs.sn_l AND gem.sn_r = krs.sn_r AND gem.sn_k = krs.sn_k AND krs.gf = 4
  JOIN vg250_lan AS lan
      ON krs.sn_l = lan.sn_l
  WHERE point.place IN ('municipality', 'suburb', 'city', 'town', 'village', 'isolated_dwelling', 'hamlet', 'island', 'neighbourhood', 'quarter')
      AND point.name IS NOT NULL
      AND poly.admin_level = '6';
  ```
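The `CASE` expression is the heart of the view: it picks a human-readable region label for each named OSM point. The same rule, restated in Python for readability (illustrative only; the `ibz` codes 60/61 excluded in the third branch appear to mark district-free cities, where the county label would just repeat the city name):

```python
SMALL_PLACES = {"suburb", "village", "isolated_dwelling", "hamlet",
                "island", "neighbourhood", "quarter"}

def region_name(point_name, place, gem, krs, lan):
    """Mirror of the CASE expression above. gem/krs/lan stand for rows
    from vg250_gem, vg250_krs and vg250_lan with the columns used there
    (gen = name, bez = designation, ibz = identifier code)."""
    if place in SMALL_PLACES and point_name != gem["gen"]:
        # a district named differently from its municipality:
        # label it with the municipality it belongs to
        return gem["bez"] + " " + gem["gen"].replace("Sankt ", "St. ")
    if place in SMALL_PLACES and point_name == gem["gen"]:
        # same name as the municipality: step up to the county
        return krs["bez"] + " " + krs["gen"]
    if place in {"municipality", "city", "town"} and gem["ibz"] not in (60, 61):
        return krs["bez"] + " " + krs["gen"]
    # fallback: the federal state name
    return lan["gen"]
```

This makes it easier to check the view's output against expectations before indexing it.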
- Create indices for this view.
  The indexing only needs to be done if you want to work on the materialized view. Make sure to create the extension before creating the indices:

  ```sql
  CREATE EXTENSION IF NOT EXISTS pg_trgm;
  -- index for comparison based on column
  CREATE INDEX IF NOT EXISTS idx_mv_geographical_name ON mv_de_geographical_regions (geographical_name);
  CREATE INDEX IF NOT EXISTS idx_mv_municipality_key ON mv_de_geographical_regions (municipality_key);
  CREATE INDEX IF NOT EXISTS idx_mv_region_name ON mv_de_geographical_regions (region_name);
  -- index for search based on ngram
  CREATE INDEX IF NOT EXISTS idx_gin_mv_gr_geographical_name_lower ON mv_de_geographical_regions USING gin (LOWER(geographical_name) gin_trgm_ops);
  CREATE INDEX IF NOT EXISTS idx_gin_mv_gr_region_name_lower ON mv_de_geographical_regions USING gin (LOWER(region_name) gin_trgm_ops);
  ```
- To dump the materialized view:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 -c "DROP TABLE IF EXISTS de_geographical_regions"
  psql -U oklab -h localhost -d oklab -p 5432 -c "CREATE TABLE IF NOT EXISTS de_geographical_regions AS TABLE mv_de_geographical_regions"
  pg_dump -U oklab -d oklab -t de_geographical_regions --inserts > ~/de_geographical_regions.sql
  psql -U oklab -h localhost -d oklab -p 5432 -c "DROP TABLE IF EXISTS de_geographical_regions"
  ```
- To restore the dumped materialized view:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 -c "DROP TABLE IF EXISTS de_geographical_regions"
  psql -U oklab -h localhost -d oklab -p 5432 -f ~/de_geographical_regions.sql
  psql -U oklab -h localhost -d oklab -p 5432 -c "CREATE INDEX IF NOT EXISTS idx_gin_geographical_name_lower ON de_geographical_regions USING gin (LOWER(geographical_name) gin_trgm_ops);"
  psql -U oklab -h localhost -d oklab -p 5432 -c "CREATE INDEX IF NOT EXISTS idx_gin_region_name_lower ON de_geographical_regions USING gin (LOWER(region_name) gin_trgm_ops);"
  ```
This tool downloads and processes data about German weather stations and inserts it into a PostgreSQL database.
- Environment Variables: Ensure that you have created the `.env` file as described in previous sections.
- Python: Install Python 3 with `venv` and `pip`, if not already done.
- Wget: Install `wget` to download the source file.
- Download Weather Station Data:
  Use the following command to download the list of German weather stations:

  ```sh
  wget https://opendata.dwd.de/climate_environment/CDC/observations_germany/climate/daily/kl/historical/KL_Tageswerte_Beschreibung_Stationen.txt
  ```

  This will download a file named `KL_Tageswerte_Beschreibung_Stationen.txt` to your current working directory.
- Activate Virtual Environment and Install Dependencies:
  If you haven't done so already, activate your Python virtual environment and install the required dependencies:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Insert Data into PostgreSQL Database:
  Run the following command to insert the weather station data into the database:

  ```sh
  python3 insert_weather_stations.py --env ../.env --src ~/KL_Tageswerte_Beschreibung_Stationen.txt --verbose
  ```

  Parameters:
  - `--env ../.env`: Path to the environment variable file.
  - `--src ~/KL_Tageswerte_Beschreibung_Stationen.txt`: Path to the downloaded file.
  - `--verbose`: Optional flag to enable detailed logging output.
- Deactivate Virtual Environment:
  Deactivate the Python environment when you're finished:

  ```sh
  deactivate
  ```
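The station description file is plain text with one station per line. A sketch of how one data line can be split into fields; the column order assumed here (station id, from/to dates, altitude, latitude, longitude, name, federal state) matches the historical KL description files, but verify it against the header of the file you downloaded, since newer files may carry additional columns:

```python
def parse_station_line(line):
    """Split one data line of the DWD station description file.
    The first six fields are numeric; everything between them and
    the trailing federal state is the (possibly multi-word) name."""
    parts = line.split()
    if len(parts) < 8:
        raise ValueError(f"unexpected line: {line!r}")
    station_id, von, bis, altitude, lat, lon = parts[:6]
    return {
        "station_id": station_id,
        "from": von,
        "to": bis,
        "altitude_m": int(altitude),
        "lat": float(lat),
        "lon": float(lon),
        "name": " ".join(parts[6:-1]),
        "state": parts[-1],
    }
```

Remember to skip the two header lines (column names and dashes) at the top of the file before applying a parser like this.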
This tool fetches and inserts EU country code data into your PostgreSQL database.
- Database Setup
  - Ensure PostgreSQL is installed and running on `localhost` (default port: `5432`).
  - Create a database named `oklab`, owned by a user with the same name.
  - Ensure the database is accessible from `localhost`.
- Environment Variables
  - Create a `.env` file in the root directory of this repository and add the following environment variables with your specific values:

    ```
    DB_PASS=YOUR_PASSWORD_HERE
    DB_HOST=localhost
    DB_USER=oklab
    DB_NAME=oklab
    DB_PORT=5432
    ```
- Python
  - Ensure Python 3 is installed with `venv` and `pip` available.
- Set up the database schema:
  Use the following command to create the necessary schema for EU country codes in the database:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 < data/eu_country_codes_schema.sql
  ```

  This will execute the `eu_country_codes_schema.sql` script to create the necessary tables and schema in the PostgreSQL database.
- Set up a Python virtual environment and install dependencies:
  Navigate to the `tools` directory and create a virtual environment. Then, activate it and install the required Python dependencies:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Insert EU country codes into the database:
  Run the script to insert the country codes from the `eu_country_codes.csv` file into the PostgreSQL database:

  ```sh
  python3 insert_country_codes.py --env ../.env --src ../data/eu_country_codes.csv --verbose
  ```

  Parameters:
  - `--env ../.env`: Path to the environment variable file.
  - `--src ../data/eu_country_codes.csv`: Path to the CSV file containing the country codes.
  - `--verbose`: Optional flag to enable detailed logging output.
- Deactivate the virtual environment:
  Once the script has run successfully, deactivate the Python virtual environment:

  ```sh
  deactivate
  ```
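Before insertion, a script like this typically just reads the CSV into dicts keyed by the header row. A minimal sketch of that reading step using only the standard library; the column names `country` and `alpha_2` in the sample are hypothetical, so check the header of `data/eu_country_codes.csv` for the real ones:

```python
import csv
import io

def read_country_codes(text):
    """Read a CSV of country codes into a list of dicts,
    one per row, keyed by the header columns."""
    return list(csv.DictReader(io.StringIO(text)))

# hypothetical sample; the real file's columns may differ
sample = "country,alpha_2\nGermany,DE\nFrance,FR\n"
rows = read_country_codes(sample)
```

Each resulting dict maps header names to cell values and can be bound directly to a parameterized `INSERT` statement.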
This section describes how to set up and insert weather station reference data into your PostgreSQL database.
- Database Schema Setup:
  Ensure that the PostgreSQL database schema for the weather station reference is set up. Use the following command:

  ```sh
  psql -U oklab -h localhost -d oklab -p 5432 < data/dwd_station_reference_schema.sql
  ```
- Environment Variables:
  Create a `.env` file in the root directory with the following content:

  ```
  DB_PASS=YOUR_PASSWORD_HERE
  DB_HOST=localhost
  DB_USER=oklab
  DB_NAME=oklab
  DB_PORT=5432
  ```
- Python Setup:
  Ensure Python 3 is installed with `venv` and `pip`.
- Set Up the Python Environment:
  Navigate to the `tools` directory and set up a Python virtual environment:

  ```sh
  cd tools
  python3 -m venv venv
  source venv/bin/activate
  pip3 install -r requirements.txt
  ```
- Insert Weather Station Reference:
  Run the following command to insert weather station reference data from the provided URL into your database.
  Note: you may find the latest URL at the DWD Stationslexikon under the section "Stationslexikon im HTML-Format".

  ```sh
  python3 insert_weather_reference.py --env ../.env --url https://www.dwd.de/DE/leistungen/klimadatendeutschland/statliste/statlex_html.html\?view\=nasPublication\&nn\=16102 --verbose
  ```

  Parameters:
  - `--env ../.env`: Path to the `.env` file with database credentials.
  - `--url`: URL to the data source.
  - `--verbose`: Optional flag for detailed logging output.
- Deactivate the Python Environment:
  Once the script has completed, deactivate the Python environment:

  ```sh
  deactivate
  ```
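Unlike the other sources, the Stationslexikon is an HTML page, so the import has to pull station rows out of HTML tables. A self-contained sketch of that extraction step using only the standard library (an illustration, not the repository's actual parser):

```python
from html.parser import HTMLParser

class TableRows(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []                # start collecting a new row
        elif tag == "td" and self._row is not None:
            self._cell = []               # start collecting cell text

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)   # keep only non-empty rows
            self._row = None

def table_rows(html):
    parser = TableRows()
    parser.feed(html)
    return parser.rows
```

The cell values per row can then be mapped onto the reference table's columns before insertion.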