
Accumulating errors for groups, fields and attributes. #175

Closed
wants to merge 35 commits

Conversation

RubelMozumder
Collaborator

@RubelMozumder RubelMozumder commented Oct 31, 2023

  • Accumulating all the errors for required fields, groups, and attributes, then raising one error at the end.

Fixes #167

@RubelMozumder
Collaborator Author

@domna here is the primary approach. Did you mean something like this?
With this change, the converter will not stop at each individual error but will instead collect all required fields, groups, and attributes that have not been supplied, and finally raise one error.

Collaborator

@domna domna left a comment


This works nicely for me already. However, I would just use Python's logging to report on the fields which are missing and just raise a general warning (e.g., Missing required fields), maybe with a custom Error for this (so we can specifically catch this). That way we can also suppress the warnings externally if we want (e.g., when we want to write @partial).

I think it would also be useful to have a flag in pynxtools config to switch failing on or off here in case we just want to write a @partial appdef or we just catch a custom Error. When we add support for writing partials we could add a cli argument for this.

We could also add the @partial resolving here, but I think it is more involved, because we actually have to check which concepts we supply and it might be easier to request this via cli (i.e., we don't say --nxdl NXmy_nxdl but rather --nxdl NXmy_nxdl/NXinstrument,NXmy_nxdl/NXsample)

I think @sherjeelshabih might also have an opinion on this.
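A minimal sketch of the combination suggested here (warnings via logging, a custom error type, and an opt-in failure switch); the names MissingRequiredFieldsError, report_missing, and fail_on_missing are illustrative, not actual pynxtools API:

import logging

logger = logging.getLogger(__name__)


class MissingRequiredFieldsError(Exception):
    """Hypothetical custom error so callers can catch exactly this case."""


def report_missing(missing_paths, fail_on_missing=False):
    """Warn about every missing required concept; optionally fail at the end."""
    for path in missing_paths:
        logger.warning("Missing required field: %s", path)
    if fail_on_missing and missing_paths:
        raise MissingRequiredFieldsError(
            f"{len(missing_paths)} required concepts are missing"
        )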

@@ -456,15 +459,28 @@ def ensure_all_required_fields_exist(template, data, nxdl_root):
         opt_parent = check_for_optional_parent(path, nxdl_root)
         if opt_parent != "<<NOT_FOUND>>":
             if does_group_exist(opt_parent, data) and not does_group_exist(renamed_path, data):
-                raise ValueError(f"The required group, {path}, hasn't been supplied"
-                                 f" while its optional parent, {path}, is supplied.")
+                missing_groups_and_fields['groups'].append(path)
Collaborator

I would just use logging here and do a logger.warning.

Collaborator Author

So, you do not want the converter to be blocked even when the required fields are not provided?

Collaborator

This is what I had in mind. No need to collect all these errors like this. We can just output the warnings.

Comment on lines 468 to 471
if entry_name.startswith('@'):
    missing_groups_and_fields['attributes'].append(path)
else:
    missing_groups_and_fields['fields'].append(path)
Collaborator

This can also just be a logger.warning.

Collaborator Author

Agree!

if path_list:
    for path in path_list:
        error = error + f" \n\u2022{path}\n"
if error:
Collaborator

Suggested change
if error:
if missing_groups_and_fields:

Collaborator Author

It does not work, because missing_groups_and_fields is never an empty dict: its keys are always present even when the value lists are empty, so the dict itself is truthy.

In [1]: dic = {'xx':[]}

In [2]: if dic:
   ...:     print('something')
   ...: 
something
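For reference, a check that only triggers when something is actually missing tests the collected lists rather than the dict itself (plain Python, not the PR's code):

missing_groups_and_fields = {'groups': [], 'fields': [], 'attributes': []}

bool(missing_groups_and_fields)           # True: the dict has keys, so it is truthy
any(missing_groups_and_fields.values())   # False: no category contains a missing path yet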

@RubelMozumder
Collaborator Author

> This works nicely for me already. However, I would just use Python's logging to report on the fields which are missing and just raise a general warning (e.g., Missing required fields), maybe with a custom Error for this (so we can specifically catch this). That way we can also suppress the warnings externally if we want (e.g., when we want to write @partial).
>
> I think it would also be useful to have a flag in pynxtools config to switch failing on or off here in case we just want to write a @partial appdef or we just catch a custom Error. When we add support for writing partials we could add a cli argument for this.
>
> We could also add the @partial resolving here, but I think it is more involved, because we actually have to check which concepts we supply and it might be easier to request this via cli (i.e., we don't say --nxdl NXmy_nxdl but rather --nxdl NXmy_nxdl/NXinstrument,NXmy_nxdl/NXsample)
>
> I think @sherjeelshabih might also have an opinion on this.

I see that there are two points:

  1. Emit logger warnings and finally raise a general error (without naming any missing fields) so that the reader can catch it.
    I still see a pitfall with a general error message: the NOMAD logger does not pick up the log output from pynxtools, so a purely general error message would not convey any valuable information. What we can do now is emit the logger messages as you mentioned and raise an error message that includes the list of missing required fields, groups, and attributes, so the error message is also meaningful in NOMAD. @sherjeelshabih may have something to add here.

    Another thing I want to mention: we need to create a general logger object that can wrap another logger object and forward the log messages from pynxtools. But that should come in a separate development and PR. @sherjeelshabih what do you think?

  2. The @partial resolution needs more discussion and a refactor of a lot of code and files.
    From the template generation point of view it is fine, but conflicts appear once the template is manipulated inside the reader, because the reader is meant to work with an application definition, not with a base definition. So the reader will block the process before any log message or error is emitted. This kind of modification therefore needs a different PR with lots of changes. @sherjeelshabih what is your opinion?

@sherjeelshabih
Collaborator

I would say once we have the warning messages, let's not even throw an error. We want to be as permissive as possible now. And in another PR we will automatically add the @partial attribute when we detect something missing. So with those two, I guess we won't need to raise anything.

@RubelMozumder we should definitely check how to get the logger right from Nomad. I thought this was already being done. It will be best to not raise any errors and just let the Nomad GUI display warnings for missing fields. The workflow shouldn't be stopped like that.

@domna
Collaborator

domna commented Nov 1, 2023

> I would say once we have the warning messages, let's not even throw an error. We want to be as permissive as possible now. And in another PR we will automatically add the @partial attribute when we detect something missing. So with those two, I guess we won't need to raise anything.

Did not fully read all the comments yet but just wanted to leave my two cents on this: This is why I suggested using a specific error message for this. That way we could also catch it in nomad and just ignore it, but we could also fail in pynxtools if you want to fail on this (like a non-partial mode, e.g., I want to reliably write a full appdef). But I would be fine with non-failing by default (also that way we can directly throw it into nomad without any modification).
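As a hedged sketch of that idea, reusing the hypothetical MissingRequiredFieldsError and report_missing names from the sketch above: a caller such as the NOMAD parser could catch the specific error and continue, while pynxtools could still fail in a strict, non-partial mode.

try:
    report_missing(missing_paths, fail_on_missing=True)
except MissingRequiredFieldsError as err:
    # tolerate incomplete data, e.g. when writing a partial appdef is acceptable
    logger.warning("Continuing despite missing required concepts: %s", err)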

@@ -72,8 +72,12 @@ def convert(input_file: Tuple[str],
             generate_template: bool = False,
             fair: bool = False,
             undocumented: bool = False,
+            logger_: logging.Logger = None,
Collaborator

The best practice for logging is different. The docs recommend using a module-level logger with a name, e.g., creating a logger variable at the start of the file (see https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial):

import logging

logger = logging.getLogger(__file__)

Then you can just use logger.warning("My warning") inside the file. Also use %s syntax when formatting the string, e.g., logger.warning("Invalid value: %s", my_var) when logging my_var, because logging does not evaluate my_var in case the logging level is disabled.

This allows another tool using pynxtools to hook into the logger with our name and display the emitted logs. I don't know if we have to follow naming conventions for nomad (I had this for uvicorn where I needed to create a name like logging.getLogger(f'uvicorn.{__file__}') and then everything was displayed automagically).
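Put together, the pattern looks roughly like this (a sketch: validate_value is an illustrative function, and the parent name "pynxtools" assumes the module loggers are created with __name__ inside the pynxtools package):

# inside a pynxtools module
import logging

logger = logging.getLogger(__name__)  # resolves to e.g. "pynxtools.dataconverter.convert"


def validate_value(my_var):
    # the %s interpolation is deferred; it is skipped if the WARNING level is disabled
    logger.warning("Invalid value: %s", my_var)


# inside a consuming tool: attach a handler to the parent logger once,
# and messages from all pynxtools.* loggers propagate to it
logging.getLogger("pynxtools").addHandler(logging.StreamHandler())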

Collaborator Author

I will try to follow it.

Collaborator Author

@RubelMozumder RubelMozumder Nov 1, 2023

Yeah, when you mentioned the logger warning earlier, I also thought of having something like what you describe so nicely here, thanks. I see the beauty of the logger.

Collaborator Author

One thing: logger.warning("Invalid value: %s", my_var) does not replace %s when used with the nomad logger, but the same syntax works in the console when calling the reader.

Collaborator Author

The module-level logger in pynxtools is also working nicely.

Collaborator

> One thing: logger.warning("Invalid value: %s", my_var) does not replace %s when used with the nomad logger, but the same syntax works in the console when calling the reader.

Ok, this is odd. Normally, this is recommended (see https://docs.python.org/3/howto/logging.html#optimization). Maybe this behaviour is deactivated for some reason in nomad, because there I just see f-strings being used (and the pylint errors are also deactivated). So we can just follow the nomad way: use f-strings and add logging-not-lazy, logging-format-interpolation, logging-fstring-interpolation to .pylintrc. But I would be interested to know why it is like this in nomad.
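In that style, a logging call would look like the following, with the corresponding pylint message silenced on the same line; alternatively the three message names above go into the disable list of .pylintrc (logger and my_var are just placeholders):

import logging

logger = logging.getLogger(__name__)
my_var = "not-a-number"

# f-string formatting, nomad style, with the pylint check silenced inline
logger.warning(f"Invalid value: {my_var}")  # pylint: disable=logging-fstring-interpolation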

@RubelMozumder
Collaborator Author

[Screenshot: nomad_logger]

So, there is no blocker; the pynxtools reader now reports the missing required concepts as warnings.

@domna
Collaborator

domna commented Nov 1, 2023

> So, there is no blocker; the pynxtools reader now reports the missing required concepts as warnings.

🎉 Yes, Python's logging just works like magic 🔮

@RubelMozumder RubelMozumder marked this pull request as ready for review November 7, 2023 16:49
@domna domna mentioned this pull request Nov 21, 2023
Collaborator

@domna domna left a comment

This PR contains a lot of additional features. Please clean up this PR so that it only adds the features relevant to proper logging of the writing errors.

@@ -0,0 +1,40 @@
# XRD reader
Collaborator

Why is this part of this PR?
Please remove the XRD part and make it a separate PR if necessary.

@@ -510,6 +510,7 @@ typing-extensions==4.3.0
# astroid
# mypy
# numcodecs
# pylint
Collaborator

This change can be reverted, because it does not change anything.

Comment on lines -43 to -46
logger = logging.getLogger(__name__) # pylint: disable=C0103
UNDOCUMENTED = 9
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(sys.stdout))
Collaborator

Why has the UNDOCUMENTED level been removed?
I introduced it to be able to separate info messages from undocumented-concept log messages.
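For reference, the removed snippet amounts to registering a custom level; reconstructing it could look like this (logging.addLevelName is standard library; the example message is made up):

import logging
import sys

UNDOCUMENTED = 9  # custom numeric level just below DEBUG (10), as in the removed lines
logging.addLevelName(UNDOCUMENTED, "UNDOCUMENTED")

logger = logging.getLogger(__name__)
# the removed code used logging.INFO; the level must be <= 9 for UNDOCUMENTED records to appear
logger.setLevel(UNDOCUMENTED)
logger.addHandler(logging.StreamHandler(sys.stdout))

logger.log(UNDOCUMENTED, "some path is present in the file but not documented in the appdef")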


import logging

logger = logging.getLogger("pynxtools")
Collaborator

Is there any benefit of having a central logger instance?
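For comparison, a sketch of the alternative: per-module loggers created with __name__ still propagate to a package-level "pynxtools" logger, so an external tool can configure everything in one place without a shared central instance (the module name below is only an example):

import logging

# what logging.getLogger(__name__) would return inside a module of the pynxtools package
module_logger = logging.getLogger("pynxtools.dataconverter.convert")

# an external tool configures the parent logger once
logging.getLogger("pynxtools").setLevel(logging.INFO)
logging.getLogger("pynxtools").addHandler(logging.StreamHandler())

# this record propagates up and is emitted by the parent logger's handler
module_logger.info("messages from child loggers reach the 'pynxtools' handler")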

@@ -32,7 +33,7 @@
from pynxtools.nexus import nexus
from pynxtools.nexus.nexus import NxdlAttributeError

-logger = logging.getLogger(__name__)
+pynx_logger = logging.getLogger(__name__)
Collaborator

This is inconsistent. In other places you use the central logger and here you use a local logger.

Collaborator Author

That came from the rebase. I will fix it.

-def validate_data_dict(template, data, nxdl_root: ET.Element):
+def validate_data_dict(template, data,
+                       nxdl_root: ET.Element,
+                       logger=pynx_logger):
Collaborator

Passing the logger shouldn't be required here either.

@@ -589,7 +594,7 @@ def convert_to_hill(atoms_typ):
return atom_list + list(atoms_typ)


-def add_default_root_attributes(data, filename):
+def add_default_root_attributes(data, filename, logger=pynx_logger):
Collaborator

Also not required to be passed here.

Comment on lines +646 to +667
try:
    if not self.is_hdf5_file_obj:
        self.in_file = h5py.File(
            self.input_file_name[0]
            if isinstance(self.input_file_name, list)
            else self.input_file_name, 'r'
        )
    else:
        self.in_file = self.input_file_name

    self.full_visit(self.in_file, self.in_file, '', self.visit_node)

    if self.d_inq_nd is None and self.c_inq_nd is None:
        get_default_plotable(self.in_file, self.logger)
    # To log the provided concept and concepts founded
    if self.c_inq_nd is not None:
        for hdf_path in self.hdf_path_list_for_c_inq_nd:
            self.logger.info(hdf_path)
finally:
    # To test if hdf_file is open print(self.in_file.id.valid)
    self.in_file.close()
    # To test if hdf_file is open print(self.in_file.id.valid)
Collaborator

This also seems to be unrelated to the logging.

Comment on lines +128 to +147
def get(self, key, return_value=None):
    """Implementing get method for template.
    Parameters
    ----------
    key : str
        Template key
    return_value : Any
    return :
        The value comes with return_value
    """
    val = self.optional.get(key, None)
    if val is None:
        val = self.recommended.get(key, None)
    if val is None:
        val = self.required.get(key, None)
    if val is None:
        val = self.undocumented.get(key, None)
    if val is None:
        return return_value
    return val
Collaborator

This is also not related to the logging parts.

Comment on lines +660 to +726
def transform_to_intended_dt(str_value: Any) -> Optional[Any]:
    """Transform string to the intended data type, if not then return str_value.

    E.g '2.5E-2' will be transfor into 2.5E-2
    tested with: '2.4E-23', '28', '45.98', 'test', ['59', '3.00005', '498E-34'],
    '23 34 444 5000', None
    with result: 2.4e-23, 28, 45.98, test, [5.90000e+01 3.00005e+00 4.98000e-32],
    np.array([23 34 444 5000]), None
    NOTE: add another arg in this func for giving 'hint' what kind of data like
    numpy array or list
    Parameters
    ----------
    str_value : str
        Data from other format that comes as string e.g. string of list.

    Returns
    -------
    Union[str, int, float, np.ndarray]
        Converted data type
    """

    symbol_list_for_data_seperation = [';', ' ']
    transformed: Any = None

    if isinstance(str_value, list):
        try:
            transformed = np.array(str_value, dtype=np.float64)
            return transformed
        except ValueError:
            pass

    elif isinstance(str_value, np.ndarray):
        return str_value
    elif isinstance(str_value, str):
        try:
            transformed = int(str_value)
        except ValueError:
            try:
                transformed = float(str_value)
            except ValueError:
                if '[' in str_value and ']' in str_value:
                    transformed = json.loads(str_value)
        if transformed is not None:
            return transformed
        for sym in symbol_list_for_data_seperation:
            if sym in str_value:
                parts = str_value.split(sym)
                modified_parts = []
                for part in parts:
                    modified_parts.append(transform_to_intended_dt(part))
                return transform_to_intended_dt(modified_parts)

    return str_value


def nested_dict_to_slash_separated_path(nested_dict: dict,
                                        flattened_dict: dict,
                                        parent_path=''):
    """Convert nested dict into slash separeted path upto certain level."""
    sep = '/'

    for key, val in nested_dict.items():
        path = parent_path + sep + key
        if isinstance(val, dict):
            nested_dict_to_slash_separated_path(val, flattened_dict, path)
        else:
            flattened_dict[path] = val
Collaborator

This seems to be not related to the logging feature.

@domna domna changed the base branch from master to XRD_reader_integretion December 1, 2023 10:34
@RubelMozumder RubelMozumder force-pushed the XRD_reader_integretion branch from 2331052 to 49daf29 Compare December 1, 2023 15:22
@RubelMozumder RubelMozumder changed the base branch from XRD_reader_integretion to master December 3, 2023 11:05
@RubelMozumder
Collaborator Author

This PR is replaced by PR #191.

@RubelMozumder RubelMozumder deleted the mpesworkshop_167 branch June 4, 2024 09:43