
Accumulating errors for groups, fields and attributes. #175

Closed
wants to merge 35 commits

Conversation

RubelMozumder
Collaborator

@RubelMozumder RubelMozumder commented Oct 31, 2023

  • Accumulating all the errors for required fields, groups, and attributes, then raising one error at the end.

Fixes #167

@RubelMozumder
Collaborator Author

@domna here is the primary approach. Did you mean something like this?
With this change, the converter will not stop at each individual error but will instead collect all required fields, groups, and attributes that have not been supplied, and finally raise one error.

Collaborator

@domna domna left a comment


This works nicely for me already. However, I would just use Python's logging to report on the fields which are missing and just raise a general warning (e.g., Missing required fields), maybe with a custom Error for this (so we can specifically catch this). That way we can also suppress the warnings externally if we want (e.g., when we want to write @partial).

I think it would also be useful to have a flag in pynxtools config to switch failing on or off here in case we just want to write a @partial appdef or we just catch a custom Error. When we add support for writing partials we could add a cli argument for this.

We could also add the @partial resolving here, but I think it is more involved, because we actually have to check which concepts we supply and it might be easier to request this via cli (i.e., we don't say --nxdl NXmy_nxdl but rather --nxdl NXmy_nxdl/NXinstrument,NXmy_nxdl/NXsample)

I think @sherjeelshabih might also have an opinion on this.
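A minimal sketch of the combination suggested here (warnings via logging, a custom error type, and an opt-in failure switch); the names MissingRequiredFieldsError, report_missing, and fail_on_missing are illustrative, not actual pynxtools API:

import logging

logger = logging.getLogger(__name__)


class MissingRequiredFieldsError(Exception):
    """Hypothetical custom error so callers can catch exactly this case."""


def report_missing(missing_paths, fail_on_missing=False):
    """Warn about every missing required concept; optionally fail at the end."""
    for path in missing_paths:
        logger.warning("Missing required field: %s", path)
    if fail_on_missing and missing_paths:
        raise MissingRequiredFieldsError(
            f"{len(missing_paths)} required concepts are missing"
        )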

@@ -456,15 +459,28 @@ def ensure_all_required_fields_exist(template, data, nxdl_root):
         opt_parent = check_for_optional_parent(path, nxdl_root)
         if opt_parent != "<<NOT_FOUND>>":
             if does_group_exist(opt_parent, data) and not does_group_exist(renamed_path, data):
-                raise ValueError(f"The required group, {path}, hasn't been supplied"
-                                 f" while its optional parent, {path}, is supplied.")
+                missing_groups_and_fields['groups'].append(path)
Collaborator

I would just use logging here and do a logger.warning.

Collaborator Author

So, you do not want the converter to be blocked even when the required fields are not provided?

Collaborator

This is what I had in mind. No need to collect all these errors like this. We can just output the warnings.

Comment on lines 468 to 471
if entry_name.startswith('@'):
    missing_groups_and_fields['attributes'].append(path)
else:
    missing_groups_and_fields['fields'].append(path)
Collaborator

This can also just be a logger.warning.

Collaborator Author

Agree!

if path_list:
    for path in path_list:
        error = error + f" \n\u2022{path}\n"
if error:
Collaborator

Suggested change
if error:
if missing_groups_and_fields:

Collaborator Author

It does not work, because missing_groups_and_fields is never an empty dict: its keys are always present even when the value lists are empty, so the dict itself is truthy.

In [1]: dic = {'xx':[]}

In [2]: if dic:
   ...:     print('something')
   ...: 
something
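For reference, a check that only triggers when something is actually missing tests the collected lists rather than the dict itself (plain Python, not the PR's code):

missing_groups_and_fields = {'groups': [], 'fields': [], 'attributes': []}

bool(missing_groups_and_fields)           # True: the dict has keys, so it is truthy
any(missing_groups_and_fields.values())   # False: no category contains a missing path yet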

@RubelMozumder
Collaborator Author

> This works nicely for me already. However, I would just use Python's logging to report on the fields which are missing and just raise a general warning (e.g., Missing required fields), maybe with a custom Error for this (so we can specifically catch this). That way we can also suppress the warnings externally if we want (e.g., when we want to write @partial).
>
> I think it would also be useful to have a flag in pynxtools config to switch failing on or off here in case we just want to write a @partial appdef or we just catch a custom Error. When we add support for writing partials we could add a cli argument for this.
>
> We could also add the @partial resolving here, but I think it is more involved, because we actually have to check which concepts we supply and it might be easier to request this via cli (i.e., we don't say --nxdl NXmy_nxdl but rather --nxdl NXmy_nxdl/NXinstrument,NXmy_nxdl/NXsample)
>
> I think @sherjeelshabih might also have an opinion on this.

I see that there are two points:

  1. Emit logger warnings and finally raise a general error (without naming any missing fields) so that the reader can catch it.
    I still see a pitfall with a general error message: the NOMAD logger does not pick up the log output from pynxtools, so a purely general error message would not convey any valuable information. What we can do now is emit the logger messages as you mentioned and raise an error message that includes the list of missing required fields, groups, and attributes, so the error message is also meaningful in NOMAD. @sherjeelshabih may have something to add here.

    Another thing I want to mention: we need to create a general logger object that can wrap another logger object and forward the log messages from pynxtools. But that should come in a separate development and PR. @sherjeelshabih what do you think?

  2. The @partial resolution needs more discussion and a refactor of a lot of code and files.
    From the template generation point of view it is fine, but conflicts appear once the template is manipulated inside the reader, because the reader is meant to work with an application definition, not with a base definition. So the reader will block the process before any log message or error is emitted. This kind of modification therefore needs a different PR with lots of changes. @sherjeelshabih what is your opinion?

@sherjeelshabih
Collaborator

I would say once we have the warning messages, let's not even throw an error. We want to be as permissive as possible now. And in another PR we will automatically add the @partial attribute when we detect something missing. So with those two, I guess we won't need to raise anything.

@RubelMozumder we should definitely check how to get the logger right from Nomad. I thought this was already being done. It will be best to not raise any errors and just let the Nomad GUI display warnings for missing fields. The workflow shouldn't be stopped like that.

@domna
Collaborator

domna commented Nov 1, 2023

> I would say once we have the warning messages, let's not even throw an error. We want to be as permissive as possible now. And in another PR we will automatically add the @partial attribute when we detect something missing. So with those two, I guess we won't need to raise anything.

Did not fully read all the comments yet but just wanted to leave my two cents on this: This is why I suggested using a specific error message for this. That way we could also catch it in nomad and just ignore it, but we could also fail in pynxtools if you want to fail on this (like a non-partial mode, e.g., I want to reliably write a full appdef). But I would be fine with non-failing by default (also that way we can directly throw it into nomad without any modification).
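As a hedged sketch of that idea, reusing the hypothetical MissingRequiredFieldsError and report_missing names from the sketch above: a caller such as the NOMAD parser could catch the specific error and continue, while pynxtools could still fail in a strict, non-partial mode.

try:
    report_missing(missing_paths, fail_on_missing=True)
except MissingRequiredFieldsError as err:
    # tolerate incomplete data, e.g. when writing a partial appdef is acceptable
    logger.warning("Continuing despite missing required concepts: %s", err)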

@@ -72,8 +72,12 @@ def convert(input_file: Tuple[str],
             generate_template: bool = False,
             fair: bool = False,
             undocumented: bool = False,
+            logger_: logging.Logger = None,
Collaborator

The best practice for logging is different. The docs recommend using a module-level logger with a name, e.g., creating a logger variable at the start of the file (see https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial):

import logging

logger = logging.getLogger(__file__)

Then you can just use logger.warning("My warning") inside the file. Also use %s syntax when formatting the string, e.g., logger.warning("Invalid value: %s", my_var) when logging my_var, because logging does not evaluate my_var in case the logging level is disabled.

This allows another tool using pynxtools to hook into the logger with our name and display the emitted logs. I don't know if we have to follow naming conventions for nomad (I had this for uvicorn where I needed to create a name like logging.getLogger(f'uvicorn.{__file__}') and then everything was displayed automagically).
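Put together, the pattern looks roughly like this (a sketch: validate_value is an illustrative function, and the parent name "pynxtools" assumes the module loggers are created with __name__ inside the pynxtools package):

# inside a pynxtools module
import logging

logger = logging.getLogger(__name__)  # resolves to e.g. "pynxtools.dataconverter.convert"


def validate_value(my_var):
    # the %s interpolation is deferred; it is skipped if the WARNING level is disabled
    logger.warning("Invalid value: %s", my_var)


# inside a consuming tool: attach a handler to the parent logger once,
# and messages from all pynxtools.* loggers propagate to it
logging.getLogger("pynxtools").addHandler(logging.StreamHandler())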

Collaborator Author

I will try to follow it.

Collaborator Author

@RubelMozumder RubelMozumder Nov 1, 2023

Yeah, when you mentioned the logger warning earlier, I also thought of having something like what you describe so nicely here, thanks. I see the beauty of the logger.

Collaborator Author

One thing: logger.warning("Invalid value: %s", my_var) does not replace %s when used with the nomad logger, but the same syntax works in the console when calling the reader.

Collaborator Author

The module-level logger in pynxtools is also working nicely.

Collaborator

> One thing: logger.warning("Invalid value: %s", my_var) does not replace %s when used with the nomad logger, but the same syntax works in the console when calling the reader.

Ok, this is odd. Normally, this is recommended (see https://docs.python.org/3/howto/logging.html#optimization). Maybe this behaviour is deactivated for some reason in nomad, because there I just see f-strings being used (and the pylint errors are also deactivated). So we can just follow the nomad way: use f-strings and add logging-not-lazy, logging-format-interpolation, logging-fstring-interpolation to .pylintrc. But I would be interested to know why it is like this in nomad.
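In that style, a logging call would look like the following, with the corresponding pylint message silenced on the same line; alternatively the three message names above go into the disable list of .pylintrc (logger and my_var are just placeholders):

import logging

logger = logging.getLogger(__name__)
my_var = "not-a-number"

# f-string formatting, nomad style, with the pylint check silenced inline
logger.warning(f"Invalid value: {my_var}")  # pylint: disable=logging-fstring-interpolation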

@RubelMozumder
Collaborator Author

[Screenshot: nomad_logger]

So, there is no blocker; the pynxtools reader now reports the missing required concepts as warnings.

@domna
Collaborator

domna commented Nov 1, 2023

> So, there is no blocker; the pynxtools reader now reports the missing required concepts as warnings.

🎉 Yes, Python's logging just works like magic 🔮

@RubelMozumder RubelMozumder marked this pull request as ready for review November 7, 2023 16:49
@domna domna mentioned this pull request Nov 21, 2023
Collaborator

@domna domna left a comment

This PR contains a lot of additional features. Please clean up this PR so that it only adds the features relevant to proper logging of the writing errors.

@@ -0,0 +1,40 @@
# XRD reader
Collaborator

Why is this part of this PR?
Please remove the XRD part and make it a separate PR if necessary.

@@ -510,6 +510,7 @@ typing-extensions==4.3.0
# astroid
# mypy
# numcodecs
# pylint
Collaborator

This change can be reverted, because it does not change anything.

Comment on lines -43 to -46
logger = logging.getLogger(__name__) # pylint: disable=C0103
UNDOCUMENTED = 9
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))
Collaborator

Why has the UNDOCUMENTED level been removed?
I introduced it to be able to separate info messages from undocumented-concept log messages.
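For reference, the removed snippet amounts to registering a custom level; reconstructing it could look like this (logging.addLevelName is standard library; the example message is made up):

import logging
import sys

UNDOCUMENTED = 9  # custom numeric level just below DEBUG (10), as in the removed lines
logging.addLevelName(UNDOCUMENTED, "UNDOCUMENTED")

logger = logging.getLogger(__name__)
# the removed code used logging.INFO; the level must be <= 9 for UNDOCUMENTED records to appear
logger.setLevel(UNDOCUMENTED)
logger.addHandler(logging.StreamHandler(sys.stdout))

logger.log(UNDOCUMENTED, "some path is present in the file but not documented in the appdef")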


import logging

logger = logging.getLogger("pynxtools")
Collaborator

Is there any benefit of having a central logger instance?
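For comparison, a sketch of the alternative: per-module loggers created with __name__ still propagate to a package-level "pynxtools" logger, so an external tool can configure everything in one place without a shared central instance (the module name below is only an example):

import logging

# what logging.getLogger(__name__) would return inside a module of the pynxtools package
module_logger = logging.getLogger("pynxtools.dataconverter.convert")

# an external tool configures the parent logger once
logging.getLogger("pynxtools").setLevel(logging.INFO)
logging.getLogger("pynxtools").addHandler(logging.StreamHandler())

# this record propagates up and is emitted by the parent logger's handler
module_logger.info("messages from child loggers reach the 'pynxtools' handler")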

@@ -32,7 +33,7 @@
from pynxtools.nexus import nexus
from pynxtools.nexus.nexus import NxdlAttributeError

-logger = logging.getLogger(__name__)
+pynx_logger = logging.getLogger(__name__)
Collaborator

This is inconsistent. In other places you use the central logger and here you use a local logger.

Collaborator Author

That came from the rebase. I will fix it.

-def validate_data_dict(template, data, nxdl_root: ET.Element):
+def validate_data_dict(template, data,
+                       nxdl_root: ET.Element,
+                       logger=pynx_logger):
Collaborator

Passing the logger shouldn't be required here either.

@@ -589,7 +594,7 @@ def convert_to_hill(atoms_typ):
return atom_list + list(atoms_typ)


-def add_default_root_attributes(data, filename):
+def add_default_root_attributes(data, filename, logger=pynx_logger):
Collaborator

Also not required to be passed here.

Comment on lines +646 to +667
try:
    if not self.is_hdf5_file_obj:
        self.in_file = h5py.File(
            self.input_file_name[0]
            if isinstance(self.input_file_name, list)
            else self.input_file_name, 'r'
        )
    else:
        self.in_file = self.input_file_name

    self.full_visit(self.in_file, self.in_file, '', self.visit_node)

    if self.d_inq_nd is None and self.c_inq_nd is None:
        get_default_plotable(self.in_file, self.logger)
    # To log the provided concept and concepts founded
    if self.c_inq_nd is not None:
        for hdf_path in self.hdf_path_list_for_c_inq_nd:
            self.logger.info(hdf_path)
finally:
    # To test if hdf_file is open print(self.in_file.id.valid)
    self.in_file.close()
    # To test if hdf_file is open print(self.in_file.id.valid)
Collaborator

This also seems to be unrelated to the logging.

Comment on lines +128 to +147
def get(self, key, return_value=None):
    """Implementing get method for template.
    Parameters
    ----------
    key : str
        Template key
    return_value : Any
    return :
        The value comes with return_value
    """
    val = self.optional.get(key, None)
    if val is None:
        val = self.recommended.get(key, None)
    if val is None:
        val = self.required.get(key, None)
    if val is None:
        val = self.undocumented.get(key, None)
    if val is None:
        return return_value
    return val
Collaborator

This is also not related to the logging parts.

Comment on lines +660 to +726
def transform_to_intended_dt(str_value: Any) -> Optional[Any]:
    """Transform string to the intended data type, if not then return str_value.

    E.g '2.5E-2' will be transfor into 2.5E-2
    tested with: '2.4E-23', '28', '45.98', 'test', ['59', '3.00005', '498E-34'],
    '23 34 444 5000', None
    with result: 2.4e-23, 28, 45.98, test, [5.90000e+01 3.00005e+00 4.98000e-32],
    np.array([23 34 444 5000]), None
    NOTE: add another arg in this func for giving 'hint' what kind of data like
    numpy array or list
    Parameters
    ----------
    str_value : str
        Data from other format that comes as string e.g. string of list.

    Returns
    -------
    Union[str, int, float, np.ndarray]
        Converted data type
    """

    symbol_list_for_data_seperation = [';', ' ']
    transformed: Any = None

    if isinstance(str_value, list):
        try:
            transformed = np.array(str_value, dtype=np.float64)
            return transformed
        except ValueError:
            pass

    elif isinstance(str_value, np.ndarray):
        return str_value
    elif isinstance(str_value, str):
        try:
            transformed = int(str_value)
        except ValueError:
            try:
                transformed = float(str_value)
            except ValueError:
                if '[' in str_value and ']' in str_value:
                    transformed = json.loads(str_value)
        if transformed is not None:
            return transformed
        for sym in symbol_list_for_data_seperation:
            if sym in str_value:
                parts = str_value.split(sym)
                modified_parts = []
                for part in parts:
                    modified_parts.append(transform_to_intended_dt(part))
                return transform_to_intended_dt(modified_parts)

    return str_value


def nested_dict_to_slash_separated_path(nested_dict: dict,
                                        flattened_dict: dict,
                                        parent_path=''):
    """Convert nested dict into slash separeted path upto certain level."""
    sep = '/'

    for key, val in nested_dict.items():
        path = parent_path + sep + key
        if isinstance(val, dict):
            nested_dict_to_slash_separated_path(val, flattened_dict, path)
        else:
            flattened_dict[path] = val
Collaborator

This seems to be not related to the logging feature.

@domna domna changed the base branch from master to XRD_reader_integretion December 1, 2023 10:34
@RubelMozumder RubelMozumder force-pushed the XRD_reader_integretion branch from 2331052 to 49daf29 Compare December 1, 2023 15:22
@RubelMozumder RubelMozumder changed the base branch from XRD_reader_integretion to master December 3, 2023 11:05
@RubelMozumder
Collaborator Author

This PR is replaced by PR #191.

@RubelMozumder RubelMozumder deleted the mpesworkshop_167 branch June 4, 2024 09:43