From 455662e92095cbfaecfda37d724b5c3701ba27a6 Mon Sep 17 00:00:00 2001 From: Edward Byers Date: Fri, 13 Jan 2023 17:01:12 +0100 Subject: [PATCH 1/2] Add simple tutorial to demonstrate -- read codelist, export to excel, edit, then read and write back to yaml. --- ...rial_edit_existing_variable_codelist.ipynb | 210 ++++++++++++++++++ 1 file changed, 210 insertions(+) create mode 100644 doc/source/user_guide/tutorial_edit_existing_variable_codelist.ipynb diff --git a/doc/source/user_guide/tutorial_edit_existing_variable_codelist.ipynb b/doc/source/user_guide/tutorial_edit_existing_variable_codelist.ipynb new file mode 100644 index 00000000..d0eb6c64 --- /dev/null +++ b/doc/source/user_guide/tutorial_edit_existing_variable_codelist.ipynb @@ -0,0 +1,210 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "id": "15ad0472", + "metadata": {}, + "source": [ + "# Tutorial: read a variable definition codelist, edit it in Excel and write it back to yaml" + ] + }, + { + "cell_type": "markdown", + "id": "0fc9ddc0", + "metadata": {}, + "source": [ + "In this example, we update a `.yaml` variable codelist file, but make the edits in Excel.\n", + "\n", + "Basic steps:\n", + "1. Export/load the existing `variable` definition codelist from the `.yaml` files in the project directory\n", + "2. Write out this codelist to Excel\n", + "3. Apply edits manually in Excel\n", + "4. Read in the Excel and write out to `.yaml` again.\n", + "\n", + "N.B. You will need the latest version of the workflow repository, e.g. github.com/iiasa/xxxx-workflow. \n", + "Navigate to the `definitions` folder, which typically contains the subfolders `variable`, `region` and `scenario`, and launch the Jupyter notebook from there. 
\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e6a0e25f", + "metadata": {}, + "outputs": [], + "source": [ + "import nomenclature\n" + ] + }, + { + "cell_type": "markdown", + "id": "748ac323", + "metadata": {}, + "source": [ + "## 1. Export/load the existing `variable` definition codelist \n", + "\n", + "Load the definitions from the current directory (or pass the path as an argument), \n", + "e.g. 'C:\\\\Github\\\\engage-internal-workflow\\\\definitions'\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "47b62fcf", + "metadata": {}, + "outputs": [], + "source": [ + "DSD = nomenclature.DataStructureDefinition('.')" + ] + }, + { + "cell_type": "markdown", + "id": "fbb0a185", + "metadata": {}, + "source": [ + "## 2. Write out this codelist to Excel" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "6fc1988f", + "metadata": {}, + "outputs": [], + "source": [ + "# Save the variable CodeList to Excel (run this only once: re-running it overwrites your Excel edits)\n", + "temp_excel_out = 'temp_variables_excel.xlsx'\n", + "DSD.variable.to_excel(temp_excel_out, sheet_name='variable')" + ] + }, + { + "cell_type": "markdown", + "id": "2f66438a", + "metadata": {}, + "source": [ + "## 3. Apply edits manually in Excel\n", + "Make your edits in Excel, for example:\n", + "\n", + "Add/remove variables, improve definitions, specify weights and region aggregations, etc." + ] + }, + { + "cell_type": "markdown", + "id": "c28dc29f", + "metadata": {}, + "source": [ + "## ...." + ] + }, + { + "cell_type": "markdown", + "id": "e715c46c", + "metadata": {}, + "source": [ + "## 4. Read in the Excel and write out to `.yaml` again.\n", + "In `attrs`, list the names of the additional columns (attributes) present in the Excel file. You do not need to include the `Variable` column, as it is passed as the `col` argument of `create_yaml_from_xlsx`."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "639b35c4", + "metadata": {}, + "outputs": [], + "source": [ + "# Attribute columns in the Excel file to carry over to the yaml file (besides 'Variable')\n", + "temp_excel_out = 'temp_variables_excel.xlsx'\n", + "attrs = ['Unit', 'Skip_region_aggregation', 'Check_aggregate',\n", + "         'Description', 'Required', 'Note', 'Region_aggregation', 'Weight']\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e9431c3b", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7601d725", + "metadata": {}, + "outputs": [], + "source": [ + "yaml_file_out = 'variable/variables_new.yaml'  # a new file name, so the existing file is not overwritten\n", + "nomenclature.create_yaml_from_xlsx(temp_excel_out, yaml_file_out, 'variable', 'Variable', attrs)\n" + ] + }, + { + "cell_type": "markdown", + "id": "fb95a8a5", + "metadata": {}, + "source": [ + "## Notes\n", + "- The new `.yaml` codelist has now been written. To replace the existing file instead, set `yaml_file_out` to its path. \n", + "- Reading the `DataStructureDefinition` (step 1) parses all `.yaml` files in the folder, so if the old and the new `.yaml` files are both present when you repeat the process, you will likely get a duplication error. \n", + "- The new `.yaml` file may contain extra attributes and/or default values (e.g.
`skip-aggregation=False`) as new functions and defaults are added to `nomenclature`.\n", + "\n", + " " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "89835034", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "3a8438d9", + "metadata": {}, + "outputs": [], + "source": [ + "# Check that the definitions load and pass validation again (make sure the old file has been removed first)\n", + "DSD1 = nomenclature.DataStructureDefinition('.')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "f0160797", + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "913675ae", + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.8.15" + } + }, + "nbformat": 4, + "nbformat_minor": 5 +} From abf4cfe4679268ca1e0e6b7fcd543df58f4e095a Mon Sep 17 00:00:00 2001 From: Edward Byers Date: Fri, 13 Jan 2023 17:14:40 +0100 Subject: [PATCH 2/2] Edit variable page to add ref to tutorial --- doc/source/user_guide/variable.rst | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/doc/source/user_guide/variable.rst b/doc/source/user_guide/variable.rst index 8bf056d9..3738bd15 100644 --- a/doc/source/user_guide/variable.rst +++ b/doc/source/user_guide/variable.rst @@ -155,3 +155,9 @@ sum up to the value of the category. The feature uses the **pyam** method * The method :meth:`DataStructureDefinition.check_aggregate` returns a :class:`pandas.DataFrame` with a comparison of the original value and the computed aggregate for all variables that fail the validation.
+ + Editing a CodeList + ------------------ + A codelist can be edited directly in its ``.yaml`` file, although this is not always convenient. + + Alternatively, you can export the codelist to Excel, make the edits there, and then convert the result back into a correctly formatted ``.yaml`` file. For a step-by-step example, see the tutorial :ref:`tutorial_edit_existing_variable_codelist`.
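In the tutorial, the Excel-to-yaml step is done by ``nomenclature.create_yaml_from_xlsx``. To illustrate what that step has to produce, here is a minimal, stdlib-only sketch. It is *not* nomenclature's implementation and rests on several assumptions: the edited sheet has been saved as CSV with a ``Variable`` column, empty cells are dropped, attribute names are lower-cased, and the target format is a yaml list of single-key mappings. The helper names ``rows_to_codelist`` and ``to_yaml_text`` are hypothetical.

```python
# Hypothetical sketch (not nomenclature's implementation): convert rows of an
# edited spreadsheet, here exported as CSV, into codelist-style yaml text.
import csv
import io
import json


def rows_to_codelist(rows, col="Variable", attrs=()):
    """Return {code: {attribute: value}} built from spreadsheet rows.

    Assumptions: blank rows and empty cells are skipped, attribute names are
    lower-cased, and a repeated code raises ValueError (similar in spirit to
    the duplication error mentioned in the tutorial notes).
    """
    codelist = {}
    for row in rows:
        code = (row.get(col) or "").strip()
        if not code:
            continue  # skip blank spreadsheet rows
        if code in codelist:
            raise ValueError(f"duplicate code: {code!r}")
        codelist[code] = {
            attr.lower(): row[attr].strip()
            for attr in attrs
            if (row.get(attr) or "").strip()
        }
    return codelist


def to_yaml_text(codelist):
    """Write a yaml list of single-key mappings.

    json.dumps produces double-quoted strings, which yaml also accepts, so
    values containing ':' or '#' do not break the output.
    """
    lines = []
    for code, attributes in codelist.items():
        lines.append(f"- {json.dumps(code, ensure_ascii=False)}:")
        for key, value in attributes.items():
            lines.append(f"    {key}: {json.dumps(value, ensure_ascii=False)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Example sheet with a 'Variable' column and two attribute columns
    sheet = (
        "Variable,Unit,Description\n"
        "Primary Energy,EJ/yr,Total primary energy\n"
        "Final Energy,EJ/yr,\n"
    )
    rows = list(csv.DictReader(io.StringIO(sheet)))
    print(to_yaml_text(rows_to_codelist(rows, attrs=["Unit", "Description"])))
```

Quoting values with ``json.dumps`` is a shortcut that keeps the sketch dependency-free; in real code, a yaml library such as PyYAML is the more robust choice for writing the file.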