Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

define_unit override argument #239

Open
wants to merge 2 commits into
base: main
Choose a base branch
from
Open

Conversation

rjfarber
Copy link

@rjfarber rjfarber commented Jul 8, 2022

I'm not sure this is fully desirable (or maybe this capability exists and I'm just missing it) but I'm currently in the situation where I messed up a custom defined unit and wanted to simply override the definition (similar to how derived_fields can be redefined).

So I added an optional argument in def define_unit(...,override=False) and don't trigger the raise if override is True. I have to run to a meeting but thought I might as well kick off this discussion rather than spend a lot of time on making this a complete PR right now in case this is not desirable.

Anyway happy Friday!

…y. Not nearly a complete PR since I'm not fully sure this is desirable beyond me (and possibly I should be doing this another way...)
Copy link
Member

@neutrinoceros neutrinoceros left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rjfarber
This looks like a reasonable request to me and your implementation is sound. I'll take a deeper look tomorrow to make sure this is the simplest way to do it (but it seems likely).
For now I just have a couple minor suggestions:

  • force_override or allow_override would make slightly more sense to me as an argument name, because override=True reads as something that's never a no-op.
  • I'd like that any new arguments was keyword only, because it's much easier to maintain for us in the future.

@neutrinoceros
Copy link
Member

So I don't think there's any documented/supported way to override pre-defined units at the moment.
Within my admittedly limited understanding of the code involved, I also cannot foresee any fundamental problem with this approach.
Here's what I could grasp: under the hood, this function updates a UnitRegistry instance, which is a relatively flat data structure: I don't think that it preserves a hierarchy of dependencies between units. My conclusion is that there should be no risk of getting stuck with a corrupt state with cyclic references and whatnot.

If you want to pursue this, feel free to. Most importantly we would need some documentation and tests.

@jzuhone
Copy link
Contributor

jzuhone commented Jul 15, 2022

So the UnitRegistry class has a modify method:

unyt/unyt/unit_registry.py

Lines 199 to 229 in 10ab887

def modify(self, symbol, base_value):
"""
Change the base value of a unit symbol. Useful for adjusting code
units after parsing parameters.
Parameters
----------
symbol : str
The name of the symbol to modify
base_value : float
The new base_value for the symbol.
"""
self._unit_system_id = None
if symbol not in self.lut:
raise SymbolNotFoundError(
"Tried to modify the symbol '%s', but it does not exist "
"in this registry." % symbol
)
if hasattr(base_value, "in_base"):
new_dimensions = base_value.units.dimensions
base_value = base_value.in_base("mks")
base_value = base_value.value
else:
new_dimensions = self.lut[symbol][1]
self.lut[symbol] = (float(base_value), new_dimensions) + self.lut[symbol][2:]

I'm not sure if this does what you want it to or not.

@neutrinoceros
Copy link
Member

Oh I missed that, thanks for pointing it out !
So UnitRegistry.modify only works for previously defined units, while define_unit currently only works for adding new units. I think making define_unit the go-to function to do both is a good idea and solidifies the API.

unyt.define_unit is a top level user-facing function which encapsulates lower level concepts such as UnitRegistry management, it may make sense to plug a call to UnitRegistry.modify in the override implementation.

@jzuhone
Copy link
Contributor

jzuhone commented Jul 16, 2022

I guess I have a question about the use case--@rjfarber, if you messed up the unit, is there a reason we don't just fix the unit in the script and then re-run? Or is there a reason we should start with an incorrect value and then change it?

I'm only a bit hesitant to introduce this functionality because in normal cases we actually don't want to redefine units--a kpc is going to be a kpc, and if the standard changes then we change the hard-coded value instead. UnitRegistry.modify is there mainly for code units. I worry making it too easy to change units could lead to unpredictable behavior.

However, maybe I am not understanding the use case well enough.

@rjfarber
Copy link
Author

rjfarber commented Jul 16, 2022 via email

@neutrinoceros
Copy link
Member

neutrinoceros commented Jul 16, 2022

Furthermore, giving that power to users is perfectly in line with how yt.derived_field works. One could argue that yt and unyt don't have to follow the exact same design principles but they do share a significant fraction of their users ...

@jzuhone
Copy link
Contributor

jzuhone commented Jul 19, 2022

@rjfarber @neutrinoceros ok, so I am now convinced of the utility of this (and thanks for the specific Jupyter example, I totally get it).

so I actually think what would be preferable in this situation is a new function, modify_unit, that wraps UnitRegistry.modify and does whatever else needs to be done. It's an extra function, but the name also makes clear what's happening and we can limit what's going on inside of it to unit modification.

What do you guys think? One issue that might be tricky to solve (as I discovered recently myself) is making sure that when you modify a unit it propagates to the units that you can get from the top level, i.e. from unyt import Zsun.

I'm happy to take a crack at this if you're interested, or you can feel free to do so!

@neutrinoceros
Copy link
Member

so I actually think what would be preferable in this situation is a new function, modify_unit, that wraps UnitRegistry.modify and does whatever else needs to be done. It's an extra function, but the name also makes clear what's happening and we can limit what's going on inside of it to unit modification.
What do you guys think?

TBH I'm not a fan, it'd be less discoverable and even if you know about it, you'd need to change your code to run it again.
In the context of a Jupyter notebook, you want to be able to write code that runs without interrupting your workflow, no matter how many times you (re)run the cell (at least, ideally). I think that's only achieved with a allow_override-like argument.

What do you guys think? One issue that might be tricky to solve (as I discovered recently myself) is making sure that when you modify a unit it propagates to the units that you can get from the top level, i.e. from unyt import Zsun

Indeed this sounds like a major challenge, regardless the approach. Unit objects are intentionally immutable so I guess you need to replace any existing one with it ?

@jzuhone
Copy link
Contributor

jzuhone commented Sep 7, 2022

@rjfarber I haven't forgotten about this, sorry for the delay. I think in a couple of weeks we should try getting it merged in

@rjfarber
Copy link
Author

rjfarber commented Sep 7, 2022 via email

@neutrinoceros
Copy link
Member

We haven't discussed it in other channels. I'm still inclined to think an override-like argument to an existing function is the best option.

@jzuhone
Copy link
Contributor

jzuhone commented Sep 14, 2022

@rjfarber so I've come around to this idea--but I would like to see a couple of things (potentially) added here:

  1. It would be nice if we could indicate that a unit is being overridden somehow. But we don't have a logger, and I am hesitant to add one, especially since we don't have more use cases. And we shouldn't use a print statement or issue a warning either. So not sure if there is a good way to log this, open to suggestions.
  2. We just need a short bit in the docs about this, one or two sentences is probably fine. If you need help with that let me know.

@neutrinoceros
Copy link
Member

we don't have a logger, and I am hesitant to add one, especially since we don't have more use cases. And we shouldn't use a print statement or issue a warning either. So not sure if there is a good way to log this, open to suggestions.

Agreed that adding a logger for this seems like a long shot, and a plain print statement would be terrible style, but what's wrong with a warning ? If you fear this would be too noisy, I can see two ways we could add a warning and keep it "quiet" in most cases:

  • we could subclass (or use) builtin ResourceWarning, which aren't showed by default but will be treated as errors in CI here and downstream, greatly reducing the risk that this ends up in production code by accident https://docs.python.org/3/library/exceptions.html#ResourceWarning
  • or we could add another option to silence the warning. I must say I'm not a fan of this option.

@rjfarber
Copy link
Author

rjfarber commented Sep 15, 2022 via email

@neutrinoceros
Copy link
Member

I guess you both will be horrified
to hear this, but one my favorite stackoverflow answers is:

I don't condone it, but you could just suppress all warnings with this:

First of all: 😱
Second of all: we're well aware that's a thing, and we'd prefer to not put our users in a position where they would need or want this.

…the official test_define_unit.py one. Modified docstring and removed an if in define_unit. Commiting to push and then reinstall to test.
@rjfarber
Copy link
Author

rjfarber commented Sep 16, 2022 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants