Accumulating errors for groups, fields and attributes #175
Conversation
@domna here is the primary approach. Did you mean something like this?
This works nicely for me already. However, I would just use Python's logging to report the missing fields and raise a general warning (e.g., Missing required fields), maybe with a custom error for this (so we can specifically catch it). That way we can also suppress the warnings externally if we want (e.g., when we want to write a @partial appdef).
I think it would also be useful to have a flag in the pynxtools config to switch failing on or off here, in case we just want to write a @partial appdef or we just catch a custom error. When we add support for writing partials, we could add a CLI argument for this.
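The suppressible warning plus catchable custom error suggested above could be sketched as follows. The class and function names here are hypothetical illustrations, not actual pynxtools API:

```python
import logging

logger = logging.getLogger("pynxtools")


class MissingRequiredFieldsError(Exception):
    """Hypothetical custom error so callers can catch (or ignore)
    missing-required-field problems specifically."""


def report_missing(paths, fail=False):
    """Hypothetical helper: warn about each missing required field.

    With fail=False the converter keeps going and a host application
    can suppress the warnings through the logging machinery; with
    fail=True it raises the custom error.
    """
    for path in paths:
        logger.warning("Missing required field: %s", path)
    if fail and paths:
        raise MissingRequiredFieldsError(
            f"{len(paths)} required field(s) are missing"
        )
```

A CLI flag for writing @partial appdefs could then simply pass fail=False.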
We could also add the @partial resolving here, but I think it is more involved, because we actually have to check which concepts we supply, and it might be easier to request this via the CLI (i.e., we don't say --nxdl NXmy_nxdl but rather --nxdl NXmy_nxdl/NXinstrument,NXmy_nxdl/NXsample).
I think @sherjeelshabih might also have an opinion on this.
pynxtools/dataconverter/helpers.py
Outdated
@@ -456,15 +459,28 @@ def ensure_all_required_fields_exist(template, data, nxdl_root):
     opt_parent = check_for_optional_parent(path, nxdl_root)
     if opt_parent != "<<NOT_FOUND>>":
         if does_group_exist(opt_parent, data) and not does_group_exist(renamed_path, data):
-            raise ValueError(f"The required group, {path}, hasn't been supplied"
-                             f" while its optional parent, {path}, is supplied.")
+            missing_groups_and_fields['groups'].append(path)
I would just use logging here and do a logger.warning.
So, you do not want the converter blocked even when the required fields are not provided?
This is what I had in mind. No need to collect all these errors like this. We can just output the warnings.
pynxtools/dataconverter/helpers.py
Outdated
if entry_name.startswith('@'):
    missing_groups_and_fields['attributes'].append(path)
else:
    missing_groups_and_fields['fields'].append(path)
This can also just be a logger.warning.
Agree!
pynxtools/dataconverter/helpers.py
Outdated
if path_list:
    for path in path_list:
        error = error + f" \n\u2022{path}\n"
if error:
Suggested change:
-if error:
+if missing_groups_and_fields:
It does not work, because missing_groups_and_fields is not an empty dict:

In [1]: dic = {'xx': []}

In [2]: if dic:
   ...:     print('something')
   ...:
something
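A check that behaves as the suggestion intended would therefore have to test the values rather than the dict itself; a minimal sketch:

```python
# A dict with keys is truthy even when every value is an empty list,
# so the emptiness check has to look at the values instead.
missing_groups_and_fields = {'groups': [], 'fields': [], 'attributes': []}

assert bool(missing_groups_and_fields) is True       # the dict itself is truthy
assert not any(missing_groups_and_fields.values())   # but nothing is missing yet

missing_groups_and_fields['fields'].append('/ENTRY[entry]/title')
assert any(missing_groups_and_fields.values())       # now something is missing
```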
I see that there are two points:
I would say once we have the warning messages, let's not even throw an error. We want to be as permissive as possible now. And in another PR we will automatically add the @partial attribute when we detect something missing. So with those two, I guess we won't need to raise anything.
@RubelMozumder we should definitely check how to get the logger right from Nomad. I thought this was already being done. It would be best not to raise any errors and just let the Nomad GUI display warnings for missing fields. The workflow shouldn't be stopped like that.
Did not fully read all the comments yet, but just wanted to leave my two cents on this: this is why I suggested using a specific error message for it. That way we could also catch it in Nomad and just ignore it, but we could still fail in pynxtools if you want to fail on this (like a non-partial mode, e.g., I want to reliably write a full appdef). But I would be fine with not failing by default (also, that way we can directly throw it into Nomad without any modification).
pynxtools/dataconverter/convert.py
Outdated
@@ -72,8 +72,12 @@ def convert(input_file: Tuple[str],
                 generate_template: bool = False,
                 fair: bool = False,
                 undocumented: bool = False,
+                logger_: logging.Logger = None,
The best practice for logging is different. The docs recommend using a module-level logger with a name, e.g., creating a logger variable at the start of the file (see https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial):

import logging

logger = logging.getLogger(__file__)

Then you can just use logger.warning("My warning") inside the file. Also use the %s syntax when formatting the string, e.g., logger.warning("Invalid value: %s", my_var) when logging my_var, because logging does not format the string if the logging level is disabled.
This allows another tool using pynxtools to hook into the logger by our name and display the emitted logs. I don't know if we have to follow naming conventions for nomad (I had this for uvicorn, where I needed to create a name like logging.getLogger(f'uvicorn.{__file__}') and then everything was displayed automagically).
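Putting the advice above together, a minimal sketch of the module-level pattern; the check_value function is a made-up example, not pynxtools code:

```python
import logging

# Module-level logger, created once at the top of the file; a host
# application can attach handlers to this name to collect the logs.
logger = logging.getLogger(__name__)


def check_value(value):
    """Made-up example function that logs with lazy %s formatting."""
    if value is None:
        # The format string is only interpolated if WARNING is enabled.
        logger.warning("Invalid value: %s", value)
        return False
    return True
```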
I will try to follow it.
Yes, when you mentioned the logger warning, I was also thinking of something like this. Thanks for laying it out here so nicely; I see the beauty of the logger.
One thing: logger.warning("Invalid value: %s", my_var) does not replace %s in the nomad logger. But the same syntax works in the console when calling the reader.
The module-level logger in pynxtools is also working nicely.
> One thing: logger.warning("Invalid value: %s", my_var) does not replace %s in the nomad logger. But the same syntax works in the console when calling the reader.

Ok, this is odd. Normally this is the recommended way (see https://docs.python.org/3/howto/logging.html#optimization). Maybe this behaviour is deactivated for some reason in nomad, because there I just see f-strings being used (and the pylint errors are also deactivated). So we can just follow the nomad way: use f-strings and add logging-not-lazy, logging-format-interpolation, logging-fstring-interpolation to .pylintrc. But I would be interested in why it is like this in nomad.
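The .pylintrc change described above would look roughly like this; the section name is assumed to be the standard pylint [MESSAGES CONTROL] disable list:

```ini
[MESSAGES CONTROL]
disable=logging-not-lazy,
        logging-format-interpolation,
        logging-fstring-interpolation
```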
Force-pushed from 6987406 to 2e4c667.
🎉 Yes, Python's logging just works like magic 🔮
from Sherjeel (Co-authored-by: Sherjeel Shabih <[email protected]>)
Force-pushed from ce5b4af to ead1b4b.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR contains a lot of additional features. Please clean up this PR so that it only adds the features relevant for proper logging of the writing errors.
@@ -0,0 +1,40 @@
# XRD reader
Why is this part of this PR? Please remove the XRD part and make it a separate PR if necessary.
@@ -510,6 +510,7 @@ typing-extensions==4.3.0
     # astroid
     # mypy
     # numcodecs
+    # pylint
These changes can be reverted, because they do not change anything.
logger = logging.getLogger(__name__)  # pylint: disable=C0103
UNDOCUMENTED = 9
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))
Why has the UNDOCUMENTED level been removed? I introduced it to be able to separate info and undocumented log messages.
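For context, a custom level like UNDOCUMENTED (value 9, as in the snippet above) can be registered with the standard logging API. A sketch, where the list-collecting handler is only for illustration:

```python
import logging

UNDOCUMENTED = 9  # below DEBUG (10), matching the pynxtools snippet above

# Register a display name so records show "UNDOCUMENTED" instead of "Level 9".
logging.addLevelName(UNDOCUMENTED, "UNDOCUMENTED")

logger = logging.getLogger("pynxtools.demo")
logger.setLevel(UNDOCUMENTED)

# Illustrative handler that just collects the level names in a list.
records = []
handler = logging.Handler()
handler.emit = lambda record: records.append(record.levelname)
logger.addHandler(handler)

logger.log(UNDOCUMENTED, "field not documented in the appdef")
logger.info("regular info message")
```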
import logging

logger = logging.getLogger("pynxtools")
Is there any benefit of having a central logger instance?
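One benefit worth noting: Python loggers form a dot-separated hierarchy, so records emitted by any pynxtools.* logger propagate up to the central "pynxtools" logger, and a host application only needs to attach one handler there. A small sketch, with an illustrative list-collecting handler:

```python
import logging

central = logging.getLogger("pynxtools")
module_logger = logging.getLogger("pynxtools.dataconverter")

# Illustrative handler on the central logger that collects messages.
messages = []
handler = logging.Handler()
handler.emit = lambda record: messages.append(record.getMessage())
central.addHandler(handler)

# Emitted on the module logger, but handled by the central one
# because the record propagates up the name hierarchy.
module_logger.warning("missing required field: /ENTRY[entry]/title")
```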
@@ -32,7 +33,7 @@
 from pynxtools.nexus import nexus
 from pynxtools.nexus.nexus import NxdlAttributeError

-logger = logging.getLogger(__name__)
+pynx_logger = logging.getLogger(__name__)
This is inconsistent. In other places you use the central logger and here you use a local logger.
That came from the rebase. I will fix it.
-def validate_data_dict(template, data, nxdl_root: ET.Element):
+def validate_data_dict(template, data,
+                       nxdl_root: ET.Element,
+                       logger=pynx_logger):
Passing the logger shouldn't be required here either.
@@ -589,7 +594,7 @@ def convert_to_hill(atoms_typ):
     return atom_list + list(atoms_typ)

-def add_default_root_attributes(data, filename):
+def add_default_root_attributes(data, filename, logger=pynx_logger):
Also not required to be passed here.
try:
    if not self.is_hdf5_file_obj:
        self.in_file = h5py.File(
            self.input_file_name[0]
            if isinstance(self.input_file_name, list)
            else self.input_file_name, 'r'
        )
    else:
        self.in_file = self.input_file_name

    self.full_visit(self.in_file, self.in_file, '', self.visit_node)

    if self.d_inq_nd is None and self.c_inq_nd is None:
        get_default_plotable(self.in_file, self.logger)
    # Log the provided concept and the concepts found
    if self.c_inq_nd is not None:
        for hdf_path in self.hdf_path_list_for_c_inq_nd:
            self.logger.info(hdf_path)
finally:
    # To test whether hdf_file is open: print(self.in_file.id.valid)
    self.in_file.close()
This also seems unrelated to the logging changes.
def get(self, key, return_value=None):
    """Implement the get method for the template.

    Parameters
    ----------
    key : str
        Template key.
    return_value : Any
        Value to return if the key is not found in any group.
    """
    val = self.optional.get(key, None)
    if val is None:
        val = self.recommended.get(key, None)
    if val is None:
        val = self.required.get(key, None)
    if val is None:
        val = self.undocumented.get(key, None)
    if val is None:
        return return_value
    return val
This is also not related to the logging parts
def transform_to_intended_dt(str_value: Any) -> Optional[Any]:
    """Transform a string to the intended data type; otherwise return str_value.

    E.g. '2.5E-2' will be transformed into 2.5E-2.
    Tested with: '2.4E-23', '28', '45.98', 'test', ['59', '3.00005', '498E-34'],
    '23 34 444 5000', None
    with results: 2.4e-23, 28, 45.98, test, [5.90000e+01 3.00005e+00 4.98000e-32],
    np.array([23 34 444 5000]), None.
    NOTE: add another arg to this func for giving a 'hint' about the kind of
    data, like numpy array or list.

    Parameters
    ----------
    str_value : str
        Data from another format that comes as a string, e.g. a string of a list.

    Returns
    -------
    Union[str, int, float, np.ndarray]
        Converted data type
    """
    symbol_list_for_data_seperation = [';', ' ']
    transformed: Any = None

    if isinstance(str_value, list):
        try:
            transformed = np.array(str_value, dtype=np.float64)
            return transformed
        except ValueError:
            pass
    elif isinstance(str_value, np.ndarray):
        return str_value
    elif isinstance(str_value, str):
        try:
            transformed = int(str_value)
        except ValueError:
            try:
                transformed = float(str_value)
            except ValueError:
                if '[' in str_value and ']' in str_value:
                    transformed = json.loads(str_value)
        if transformed is not None:
            return transformed
        for sym in symbol_list_for_data_seperation:
            if sym in str_value:
                parts = str_value.split(sym)
                modified_parts = []
                for part in parts:
                    modified_parts.append(transform_to_intended_dt(part))
                return transform_to_intended_dt(modified_parts)

    return str_value


def nested_dict_to_slash_separated_path(nested_dict: dict,
                                        flattened_dict: dict,
                                        parent_path=''):
    """Convert a nested dict into slash-separated paths up to a certain level."""
    sep = '/'

    for key, val in nested_dict.items():
        path = parent_path + sep + key
        if isinstance(val, dict):
            nested_dict_to_slash_separated_path(val, flattened_dict, path)
        else:
            flattened_dict[path] = val
This seems unrelated to the logging feature.
Force-pushed from 2331052 to 49daf29.
This PR is replaced by PR #191.
Fixes #167