Allow wildcard-items in Code #432

Open
danielhuppmann opened this issue Nov 26, 2024 · 8 comments

@danielhuppmann
Member

To allow more flexibility for reporting of technical parameters, we want to allow "wildcard-codes" echoing the wildcard-implementation in pyam using *.

Concept: a VariableCode can be defined as

- Capital Cost|Electricity|Coal|*:
    description: Technology-specific capital cost of a newly installed plant to generate
      electricity from coal
    unit: USD_2010/kW

The validation-method should then accept any variable that matches the code-name, with any string substituted for the wildcard.

This can follow the implementation by @phackstock here

pattern = re.compile(escape_regexp(filter_value) + "$")

To be explicit, any of the following variables should pass validation:

  • Capital Cost|Electricity|Coal|1
  • Capital Cost|Electricity|Coal|Lignite
  • Capital Cost|Electricity|Coal|Super-Critical
  • Capital Cost|Electricity|Coal|really dirty stuff
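
A minimal sketch of how such a wildcard match could behave, assuming the code name is escaped and `*` is then translated to `.*`. The helper `wildcard_pattern` is hypothetical and is not part of nomenclature or pyam; it only illustrates the intended matching rule.

```python
import re


def wildcard_pattern(code_name: str) -> re.Pattern:
    """Compile a code name so that "*" matches any string (hypothetical helper)."""
    # Escape regex metacharacters such as "|", then re-enable the wildcard.
    return re.compile(re.escape(code_name).replace(r"\*", ".*") + "$")


pattern = wildcard_pattern("Capital Cost|Electricity|Coal|*")

variables = [
    "Capital Cost|Electricity|Coal|1",
    "Capital Cost|Electricity|Coal|Lignite",
    "Capital Cost|Electricity|Coal|Super-Critical",
    "Capital Cost|Electricity|Coal|really dirty stuff",
    "Capital Cost|Electricity|Gas|CCGT",  # counter-example, not covered by the code
]

# re.match anchors at the start, the trailing "$" anchors at the end.
accepted = [v for v in variables if pattern.match(v)]
rejected = [v for v in variables if not pattern.match(v)]
```

Under this rule the four coal variables end up in `accepted`, while the gas variable is rejected because the literal part of the code name does not match.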
danielhuppmann added the enhancement label Nov 26, 2024

@danielhuppmann
Member Author

Maybe this was already implemented in #397, please double-check.

@dc-almeida
Collaborator

Indeed I was checking that today with some tests, will confirm tomorrow

@danielhuppmann
Member Author

danielhuppmann commented Nov 28, 2024

Follow-up, because I did some tests myself: the wildcard * in variable names works, but the units are not checked. There may also be some difficulties here, because multiple VariableCode items could match the same variable, e.g.,

- Capital Cost|Hydrogen|*:
    description: ...
    unit: USD_2010/kW
- Capital Cost|Hydrogen|Fossil*:
    description: ...
    unit: EUR_2020/kW

Not saying that this makes sense, but if there is now a variable "Capital Cost|Hydrogen|Fossil|Coal" in an IamDataFrame, it's not clear which unit should apply...
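
The ambiguity can be reproduced with a small sketch. The `codes` dictionary is a hypothetical stand-in for a VariableCodeList (code name mapped to unit), and `matching_codes` is an illustrative helper, not an existing nomenclature function.

```python
import re

# Hypothetical stand-in for a VariableCodeList: code name -> unit.
codes = {
    "Capital Cost|Hydrogen|*": "USD_2010/kW",
    "Capital Cost|Hydrogen|Fossil*": "EUR_2020/kW",
}


def matching_codes(variable: str, codes: dict) -> dict:
    """Return every code (and its unit) whose wildcard pattern matches the variable."""
    matches = {}
    for name, unit in codes.items():
        # Escape the name, then let "*" stand for any string.
        regex = re.escape(name).replace(r"\*", ".*") + "$"
        if re.match(regex, variable):
            matches[name] = unit
    return matches


candidates = matching_codes("Capital Cost|Hydrogen|Fossil|Coal", codes)
units = set(candidates.values())
```

Both codes match the variable, so `units` contains two different entries and a unit check has no single expected value to compare against.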

@phackstock
Contributor

@danielhuppmann, thanks for checking. I have now also looked at the code in detail, and I think there are a couple of different ways we could address the unit ambiguity that you mentioned.

  1. Skip the unit check for any wildcard variable (simple but potentially dangerous and confusing later down the line)
  2. Add additional units to cover all options
  3. As an additional check on the VariableCodeList itself (without any input data), make sure that no wildcard pattern matches any other entry in the code list. This way the matches are unambiguous and we could enforce specific units after all.

In the interest of keeping patterns as simple as possible and avoiding ambiguity as much as possible, I'd suggest option 3.

@danielhuppmann
Member Author

3 is a nice idea, but probably takes a bit more time to implement.

So I suggest implementing a simple rule: if the variable to be validated matches a wildcard code, the unit must match (this might cause issues in corner cases, but that is probably not relevant in practice anyway).
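
A rough sketch of that simple rule, under the assumption that the first matching code decides the expected unit. The `codes` dictionary and the `check_unit` helper are hypothetical and only illustrate the idea; the "first match wins" behaviour is exactly where the corner cases would come from.

```python
import re
from typing import Optional

# Hypothetical stand-in for a VariableCodeList: code name -> unit.
codes = {"Capital Cost|Electricity|Coal|*": "USD_2010/kW"}


def check_unit(variable: str, unit: str, codes: dict) -> Optional[bool]:
    """Return True/False if a code matches the variable, None if no code matches."""
    for name, expected in codes.items():
        regex = re.escape(name).replace(r"\*", ".*") + "$"
        if re.match(regex, variable):
            # Assumption: the first matching code determines the expected unit.
            return unit == expected
    return None


ok = check_unit("Capital Cost|Electricity|Coal|Lignite", "USD_2010/kW", codes)
wrong_unit = check_unit("Capital Cost|Electricity|Coal|Lignite", "EUR_2020/kW", codes)
unknown = check_unit("Capital Cost|Electricity|Gas|CCGT", "USD_2010/kW", codes)
```

Here `ok` is `True`, `wrong_unit` is `False`, and `unknown` is `None` because no code covers the gas variable.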

Then add a sanity-check, to be called during validate-project, that wildcard-codes do not have well-defined duplicates.

@phackstock
Contributor

Then add a sanity-check, to be called during validate-project, that wildcard-codes do not have well-defined duplicates.

I have read your suggestion a couple of times now, but I fail to understand how it differs from my point 3. The additional check I was describing is what I believe you are calling a "sanity-check". I'd implement it as a (surprise, surprise) pydantic validator for VariableCodeList. You'd check if any wildcard variable matches any other wildcard variable.
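
A minimal sketch of such a check on plain code names, independent of pydantic. The function `overlapping_wildcards` is hypothetical, and the approach is a heuristic: it treats every other code name as a literal string and reports it if a wildcard code matches it.

```python
import re
from itertools import permutations


def overlapping_wildcards(names: list) -> list:
    """Return (wildcard_code, other_code) pairs where the wildcard also matches the other name."""
    clashes = []
    for pattern_name, other in permutations(names, 2):
        if "*" not in pattern_name:
            continue  # only wildcard codes can shadow other entries
        regex = re.escape(pattern_name).replace(r"\*", ".*") + "$"
        if re.match(regex, other):
            clashes.append((pattern_name, other))
    return clashes


names = [
    "Capital Cost|Hydrogen|*",
    "Capital Cost|Hydrogen|Fossil*",
    "Capital Cost|Electricity|Coal|*",
]
clashes = overlapping_wildcards(names)
```

For this list, the only reported pair is `Capital Cost|Hydrogen|*` shadowing `Capital Cost|Hydrogen|Fossil*`. Note that this only catches the case where one code name is covered by another pattern; two patterns can still both match a concrete variable without either matching the other's name, so a stricter check would need a proper pattern-intersection test.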

@danielhuppmann
Member Author

Sorry for not being clear. Parsing a DataStructureDefinition for large projects is already taking quite some time, so adding yet another pydantic-validator (executed every time) might not be the smartest move.

Hence my suggestion to implement that as a validation-method that is not executed when initializing the DataStructureDefinition but only as part of the validate-project CLI (so for example as part of GitHub Actions in a workflow repository).

@phackstock
Contributor

phackstock commented Dec 2, 2024

Parsing a DataStructureDefinition for large projects is already taking quite some time, so adding yet another pydantic-validator (executed every time) might not be the smartest move.

Without having run any benchmarks on that, doesn't reading in data, which we usually do when using nomenclature, typically take order(s) of magnitude longer?
Where is the performance of the validators an issue currently? Do you mean in the scenario processing, in the testing of PRs, running locally, ...?
