Truth Data Format

Nicolò Gozzi edited this page Feb 20, 2024 · 15 revisions

The ground truth data for the forecasting targets is stored in the target-data folder. To access the latest data file, use this link for ERVISS data and this link for FluID data. Historical data files are stored in the snapshots folders and are named YYYY-MM-DD-ILI_incidence.csv, where YYYY-MM-DD is the date of the last data update (updates occur every Friday). Note that the latest file contains the entire available history, not only the new data points.
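As a rough illustration of the snapshot naming scheme, the sketch below builds the filename for the most recent Friday on or before a given day. The helper name `latest_snapshot_name` is hypothetical, and the sketch assumes that each snapshot is dated on the Friday of the update itself; the actual files in the repository may be dated differently, so check the snapshots folder before relying on this.

```python
from datetime import date, timedelta


def latest_snapshot_name(today: date) -> str:
    """Return the assumed snapshot filename for the last Friday on or before `today`.

    Hypothetical helper: assumes snapshots are dated on the update Friday.
    """
    # date.weekday() numbers Monday as 0, so Friday is 4.
    days_since_friday = (today.weekday() - 4) % 7
    friday = today - timedelta(days=days_since_friday)
    return f"{friday.isoformat()}-ILI_incidence.csv"


if __name__ == "__main__":
    print(latest_snapshot_name(date.today()))
```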

Each ground truth CSV file contains the following columns:

| column | column type | description |
| --- | --- | --- |
| location | string | ISO-2 code identifying the country |
| truth_date | date | Date in format YYYY-MM-DD: the last day of the truth week (Sunday) |
| year_week | string | A string denoting the year and week to which the truth data corresponds |
| value | decimal | ILI incidence per $100,000$ |

A few illustrative rows:

location,truth_date,year_week,value
AT,2023-11-26,2023-W47,2778.61155182025
AT,2023-11-19,2023-W46,2184.92925304376
AT,2023-11-12,2023-W45,2248.43487555352

From the first row, for instance, we can read that in Austria (AT), during week $47$ of the year $2023$, ending on Sunday, November 26, 2023, the reported ILI incidence per $100,000$ was approximately $2778.61$.
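The reading above can be reproduced with a short parsing sketch that uses only the Python standard library. It loads the sample rows, converts each column to the type listed in the column table, and checks that `truth_date` is a Sunday whose week matches `year_week`. The function name `parse_truth` is hypothetical, and the check assumes that `year_week` follows ISO 8601 week numbering; this agrees with the sample rows but is not stated explicitly on this page.

```python
import csv
import io
from datetime import date

# Sample rows copied from the example above.
SAMPLE = """location,truth_date,year_week,value
AT,2023-11-26,2023-W47,2778.61155182025
AT,2023-11-19,2023-W46,2184.92925304376
AT,2023-11-12,2023-W45,2248.43487555352
"""


def parse_truth(text: str) -> list[dict]:
    """Parse ground truth CSV text into typed rows, validating the week fields."""
    rows = []
    for rec in csv.DictReader(io.StringIO(text)):
        truth_date = date.fromisoformat(rec["truth_date"])
        # The truth week is documented as ending on a Sunday (ISO weekday 7).
        if truth_date.isoweekday() != 7:
            raise ValueError(f"{truth_date} is not a Sunday")
        # Assumption: year_week uses ISO 8601 week numbering.
        iso = truth_date.isocalendar()
        expected = f"{iso[0]}-W{iso[1]:02d}"
        if rec["year_week"] != expected:
            raise ValueError(f"{rec['year_week']} does not match {truth_date}")
        rows.append(
            {
                "location": rec["location"],
                "truth_date": truth_date,
                "year_week": rec["year_week"],
                "value": float(rec["value"]),
            }
        )
    return rows


if __name__ == "__main__":
    for row in parse_truth(SAMPLE):
        print(row["location"], row["year_week"], round(row["value"], 2))
```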

The countries are split between the two ground truth data sources as follows:

| Data Source | Countries (ISO-2 code) |
| --- | --- |
| ERVISS | AT, BE, HR, CZ, DK, EE, FI, FR, GR, HU, IS, IE, IT, LV, LT, LU, MT, NL, NO, PL, PT, RO, SK, SI |
| FluID | CH, GB-ENG, GB-WLS, GB-NIR, GB-SCT |
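When working with both files, it can help to look up which source a given `location` code belongs to. The sketch below encodes the table above as two sets; the names `ERVISS_LOCATIONS`, `FLUID_LOCATIONS`, and `data_source` are hypothetical and only illustrate one way to do the lookup.

```python
# Location codes copied from the data source table above.
ERVISS_LOCATIONS = {
    "AT", "BE", "HR", "CZ", "DK", "EE", "FI", "FR", "GR", "HU", "IS", "IE",
    "IT", "LV", "LT", "LU", "MT", "NL", "NO", "PL", "PT", "RO", "SK", "SI",
}
FLUID_LOCATIONS = {"CH", "GB-ENG", "GB-WLS", "GB-NIR", "GB-SCT"}


def data_source(location: str) -> str:
    """Return the name of the data source that covers `location`."""
    if location in ERVISS_LOCATIONS:
        return "ERVISS"
    if location in FLUID_LOCATIONS:
        return "FluID"
    raise KeyError(f"unknown location code: {location}")


if __name__ == "__main__":
    print(data_source("AT"))
    print(data_source("GB-SCT"))
```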