Add weekly hash check to NUTS file and XLSX to YAML utility function #40

Merged: 4 commits, Dec 11, 2024
Changes from 3 commits
32 changes: 32 additions & 0 deletions .github/workflows/check_hash.yml
@@ -0,0 +1,32 @@
name: Compare File Hash Weekly

on:
  schedule:
    - cron: '0 0 * * 0' # Runs weekly on Sunday at midnight UTC

jobs:
  compare-hash:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Repository
        uses: actions/checkout@v3

      - name: Calculate Local File Hash
        id: local-hash
        run: echo "LOCAL_HASH=$(sha256sum pysquirrel/data/NUTS2021-NUTS2024.xlsx | awk '{print $1}')" >> $GITHUB_ENV

      - name: Download File
        run: curl -o most-recent-version.xlsx "https://ec.europa.eu/eurostat/documents/345175/629341/NUTS2021-NUTS2024.xlsx"

      - name: Calculate Downloaded File Hash
        id: downloaded-hash
        run: echo "DOWNLOADED_HASH=$(sha256sum most-recent-version.xlsx | awk '{print $1}')" >> $GITHUB_ENV

      - name: Compare Hashes
        run: |
          if [ "$LOCAL_HASH" != "$DOWNLOADED_HASH" ]; then
            echo "Hashes do not match!"
            exit 1
          else
            echo "Hashes match!"
          fi
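
The comparison this workflow performs with `sha256sum` can also be reproduced locally. The following is a minimal Python sketch and not part of this PR: the helper names `sha256_of` and `hashes_match` are hypothetical, and the downloaded copy is assumed to already exist on disk.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns an empty bytes object
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashes_match(local_path, downloaded_path):
    """Return True if both files have identical SHA-256 digests."""
    return sha256_of(local_path) == sha256_of(downloaded_path)
```

Calling `hashes_match("pysquirrel/data/NUTS2021-NUTS2024.xlsx", "most-recent-version.xlsx")` would mirror the "Compare Hashes" step, assuming both files are present.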
26 changes: 24 additions & 2 deletions pysquirrel/core.py
@@ -5,16 +5,18 @@

import os
import yaml
from openpyxl import load_workbook
from pydantic.dataclasses import dataclass

# Base path for package code
BASE_PATH = Path(__file__).absolute().parent
DATA_PATH = BASE_PATH / "data"
COL_NAME_ROW = 1
MIN_DATA_ROW = 2
MAX_DATA_COL = 4


# Utility functions
def flatten(lst):
    for i in lst:
        if isinstance(i, list):
@@ -23,6 +25,26 @@ def flatten(lst):
            yield i


def nuts_to_yaml(path: str, output_dir: str):
    """Converts a NUTS .xlsx source file to YAML files."""

    workbook = load_workbook(path, read_only=True, data_only=True)

    # Map each worksheet name to the YAML file it is written to
    for sheet, file in {
        "NUTS2024": "NUTS2021-2024.yaml",
        "Statistical Regions": "SR2021-2024.yaml",
    }.items():
        regions = []
        worksheet = workbook[sheet]
        # Column names come from the header row
        cols = [cell.value for cell in worksheet[COL_NAME_ROW]]
        for row in worksheet.iter_rows(min_row=MIN_DATA_ROW, max_col=MAX_DATA_COL):
            # Keep only rows in which every cell has a value
            if all(cell.value for cell in row):
                regions.append({col: cell.value for (col, cell) in zip(cols, row)})

        # Write UTF-8 explicitly, matching the encoding used when the data is loaded
        with open(Path(output_dir) / file, "w", encoding="utf8") as f:
            yaml.dump(regions, f, allow_unicode=True)


class Level(IntEnum):
    LEVEL_1 = 1
    LEVEL_2 = 2
@@ -116,7 +138,7 @@ def _load(self) -> None:

        for data_file in os.listdir(DATA_PATH):
            for region_type, cls in region_class.items():
                if data_file.startswith(region_type) and data_file.endswith("yaml"):
                    with open(DATA_PATH / data_file, "r", encoding="utf8") as f:
                        data = yaml.safe_load(f)
                    for region in data:
Binary file added pysquirrel/data/NUTS2021-NUTS2024.xlsx
Binary file not shown.
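
The row filter in `nuts_to_yaml` keeps a row only when every one of its first `MAX_DATA_COL` cells has a value. Below is a minimal sketch of that rule on plain Python tuples, without openpyxl. The function name `rows_to_records`, the column names, and the sample rows are made up for illustration and are not taken from the NUTS workbook.

```python
def rows_to_records(header, rows, max_col=4):
    """Build one dict per row, skipping rows with any empty cell.

    Only the first `max_col` values of each row are considered,
    similar to `iter_rows(max_col=...)` in `nuts_to_yaml`.
    """
    records = []
    for row in rows:
        values = row[:max_col]
        # all() is False if any value is None, "", or 0
        if all(values):
            records.append(dict(zip(header, values)))
    return records


# Hypothetical sample data (not from the real NUTS workbook)
header = ["code", "label"]
rows = [("X1", "Region one"), ("X2", None), (None, None)]
records = rows_to_records(header, rows, max_col=2)
```

Because `all()` tests truthiness, a cell holding `0` or an empty string is treated the same as an empty cell, so a row containing one would be skipped as well.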