Allow wildcard-items in Code #432

danielhuppmann · 2024-11-26T11:20:53Z

To allow more flexibility when reporting technical parameters, we want to allow "wildcard-codes", echoing the wildcard implementation in pyam using *.

Concept: a VariableCode can be defined as

- Capital Cost|Electricity|Coal|*:
    description: Technology-specific capital cost of a newly installed plant to generate
      electricity from coal
    unit: USD_2010/kW

The validation-method should then accept any variable that matches the code-name, with any string substituted for the wildcard.

This can follow the implementation by @phackstock here

nomenclature/nomenclature/config.py

Line 34 in f210213

pattern = re.compile(escape_regexp(filter_value) + "$")

To be explicit, any of the following variables should pass validation:

Capital Cost|Electricity|Coal|1
Capital Cost|Electricity|Coal|Lignite
Capital Cost|Electricity|Coal|Super-Critical
Capital Cost|Electricity|Coal|really dirty stuff
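
To make the intended matching concrete, here is a minimal sketch using only the standard library. The helper name `wildcard_to_regex` is hypothetical and not part of nomenclature; it only mirrors the idea of the line quoted above (escape the name, then anchor the pattern at the end).

```python
import re


def wildcard_to_regex(code_name: str) -> re.Pattern:
    """Compile a code name containing `*` wildcards into an anchored regex.

    Hypothetical helper for illustration, not the nomenclature implementation.
    """
    # Escape all regex metacharacters (including `|`), then turn each
    # escaped `*` back into "match any string".
    pattern = re.escape(code_name).replace(r"\*", ".*")
    return re.compile(pattern + "$")


pattern = wildcard_to_regex("Capital Cost|Electricity|Coal|*")

variables = [
    "Capital Cost|Electricity|Coal|1",
    "Capital Cost|Electricity|Coal|Lignite",
    "Capital Cost|Electricity|Coal|Super-Critical",
    "Capital Cost|Electricity|Coal|really dirty stuff",
]

# `match` anchors at the start of the string, the trailing `$` at the end.
matches = {v: bool(pattern.match(v)) for v in variables}
```

All four variables listed above match this pattern, while a name outside the prefix such as "Capital Cost|Electricity|Gas|CCGT" does not.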

danielhuppmann · 2024-11-27T21:33:24Z

Maybe this was already implemented in #397, please double-check.

dc-almeida · 2024-11-27T21:35:40Z

Indeed I was checking that today with some tests, will confirm tomorrow

danielhuppmann · 2024-11-28T21:09:44Z

Follow-up because I did some tests myself: the wildcard * in variable names works, but the units are not checked. Plus, there may be some difficulties here because multiple VariableCode items may match a single variable, e.g.,

- Capital Cost|Hydrogen|*:
    description: ...
    unit: USD_2010/kW
- Capital Cost|Hydrogen|Fossil*:
    description: ...
    unit: EUR_2020/kW

Not saying that this makes sense, but if there is now a variable "Capital Cost|Hydrogen|Fossil|Coal" in an IamDataFrame, it's not clear which unit should apply...
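
A small sketch of the ambiguity, assuming wildcard matching is done by translating `*` into a regular expression. The function `matching_units` and the plain-dict representation of the code list are illustrative assumptions, not the nomenclature API.

```python
import re

# The two overlapping codes from the example above, reduced to name -> unit.
codes = {
    "Capital Cost|Hydrogen|*": "USD_2010/kW",
    "Capital Cost|Hydrogen|Fossil*": "EUR_2020/kW",
}


def matching_units(variable: str, codes: dict) -> dict:
    """Return every code name that matches `variable`, with its unit."""
    found = {}
    for name, unit in codes.items():
        # Escape the name, then let each `*` match any string.
        regex = re.escape(name).replace(r"\*", ".*") + "$"
        if re.match(regex, variable):
            found[name] = unit
    return found


candidates = matching_units("Capital Cost|Hydrogen|Fossil|Coal", codes)

# More than one distinct unit means the expected unit is ambiguous.
ambiguous = len(set(candidates.values())) > 1
```

Here both codes match the variable, so `candidates` holds two different units and there is no single unit to check against.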

phackstock · 2024-12-02T12:58:28Z

@danielhuppmann, thanks for checking. I also looked at the code in detail now and I think there are a couple of different ways we could go about the issue of unit ambiguity that you mentioned.

1. Skip the unit check for any wildcard variable (simple but potentially dangerous and confusing later down the line)
2. Add additional units to cover all options
3. As an additional check for a VariableCodeList itself (without any input data), make sure that no variable pattern matches anything inside the code list. This way the matches are unambiguous and we could enforce specific units after all.

In the interest of keeping patterns as simple as possible and avoiding ambiguity as much as possible, I'd suggest option 3.
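
As a rough sketch of what the check in option 3 could look like, the snippet below tests every wildcard code name against all other names in the list. `find_overlaps` and the list-of-strings input are hypothetical simplifications; a real VariableCodeList would carry more than names.

```python
import re


def find_overlaps(code_names: list) -> list:
    """Return (wildcard_name, other_name) pairs where the wildcard code
    matches another code name in the same list.

    Hypothetical helper for illustration only.
    """
    overlaps = []
    for pattern_name in code_names:
        if "*" not in pattern_name:
            continue  # only wildcard codes can match other entries
        # Escape the name, then let each `*` match any string.
        regex = re.compile(re.escape(pattern_name).replace(r"\*", ".*") + "$")
        for other in code_names:
            if other != pattern_name and regex.match(other):
                overlaps.append((pattern_name, other))
    return overlaps


code_names = [
    "Capital Cost|Hydrogen|*",
    "Capital Cost|Hydrogen|Fossil*",
    "Capital Cost|Electricity|Coal|*",
]

overlaps = find_overlaps(code_names)
```

With these names, the only pair reported is "Capital Cost|Hydrogen|*" matching "Capital Cost|Hydrogen|Fossil*". Note that this name-against-name test is a simplification: two patterns such as "A|*|X" and "A|B|*" can both match the variable "A|B|X" without either matching the other's name, so a complete check would need more than this.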

danielhuppmann · 2024-12-02T13:15:11Z

3 is a nice idea, but probably takes a bit more time to implement.

So I suggest implementing a simple rule: "if the variable to be validated matches a wildcard code in the codelist, the unit must match" (this might cause issues in corner cases, but those are probably not that relevant in practice anyway).

Then add a sanity-check to be called during validate-project that wildcard-codes do not have well-defined duplicates.
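
The simple unit rule could be sketched roughly as follows. The names `invalid_units`, `codes` and `rows` are made up for illustration, and the choice to accept a unit if it equals the unit of any matching code is an assumption; the corner case of several matching codes with different units is deliberately left open here.

```python
import re

# Hypothetical code list reduced to name -> unit.
codes = {
    "Capital Cost|Electricity|Coal|*": "USD_2010/kW",
}


def invalid_units(rows: list, codes: dict) -> list:
    """Return the (variable, unit) rows whose unit disagrees with every
    matching code. Variables matching no code are skipped, since checking
    variable names is treated as a separate step in this sketch."""
    errors = []
    for variable, unit in rows:
        allowed = {
            code_unit
            for name, code_unit in codes.items()
            # Escape the name, then let each `*` match any string.
            if re.match(re.escape(name).replace(r"\*", ".*") + "$", variable)
        }
        # Assumption: a unit is fine if any matching code defines it.
        if allowed and unit not in allowed:
            errors.append((variable, unit))
    return errors


rows = [
    ("Capital Cost|Electricity|Coal|Lignite", "USD_2010/kW"),
    ("Capital Cost|Electricity|Coal|Super-Critical", "EUR_2020/kW"),
    ("Capital Cost|Electricity|Gas|CCGT", "EUR_2020/kW"),  # matches no code
]

errors = invalid_units(rows, codes)
```

Only the "Super-Critical" row is flagged, because it matches the wildcard code but reports a different unit.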

phackstock · 2024-12-02T13:46:10Z

> Then add a sanity-check to be called during validate-project that wildcard-codes do not have well-defined duplicates.

I have read your suggestion a couple of times now but I fail to understand how that is different to my point 3. What I was describing as this additional check is what I believe you are calling "sanity-check". I'd implement it as a (surprise, surprise) pydantic validator for VariableCodeList. You'd check if any wildcard variable matches any other wildcard variable.

danielhuppmann · 2024-12-02T13:58:21Z

Sorry for not being clear. Parsing a DataStructureDefinition for large projects is already taking quite some time, so adding yet another pydantic-validator (executed every time) might not be the smartest move.

Hence my suggestion to implement that as a validation-method that is not executed when initializing the DataStructureDefinition but only as part of the validate-project CLI (so for example as part of GitHub Actions in a workflow repository).

phackstock · 2024-12-02T14:14:33Z

> Parsing a DataStructureDefinition for large projects is already taking quite some time, so adding yet another pydantic-validator (executed every time) might not be the smartest move.

Without having run any benchmarks on that, doesn't reading in data, which we usually do when using nomenclature, typically take order(s) of magnitude longer?
Where is the performance of the validators an issue currently? Do you mean in the scenario processing, in the testing of PRs, running locally, ...?

danielhuppmann added the "enhancement" (New feature or request) label Nov 26, 2024

danielhuppmann assigned dc-almeida Nov 26, 2024

danielhuppmann mentioned this issue Nov 28, 2024

Add technology-specific variables for techno-economic parameters IAMconsortium/common-definitions#223

Merged
