Skip to content
This repository has been archived by the owner on Nov 7, 2024. It is now read-only.

ECDC-3518-only-partial-depends-on-support #1036

Merged
merged 33 commits into from
Sep 18, 2023

Conversation

ggoneiESS
Copy link
Member

Issue

Closes ECDC-3518 https://jira.esss.lu.se/browse/ECDC-3518

Description of work

Added logic to manage relevant requests. After discussing with Greg. who raised the ticket, about whether this could be done via looking for 'NXTransformations' it was checked and confirmed that this is what is done in the relevant class. It was tested manually against the specific cases which created the ticket initially.

Automated testing was created in test_load_from_json.py. Initially this was done against JSON supplied in this file, but it was invalid for other reasons. After discovering this and confirming it wasn't used elsewehre in the code this JSON section was deleted.

Various implicit casts were made strict, without adding a str method to the classes, in advance of other work on code cleanup

Acceptance Criteria

Enabled relative paths for 'depends_on'

UI tests

N/A

Nominate for Group Code Review

  • Nominate for code review

@ggoneiESS ggoneiESS self-assigned this Aug 14, 2023
@ggoneiESS ggoneiESS requested a review from danesss August 17, 2023 11:38
nexus_constructor/model/module.py Outdated Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Outdated Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Outdated Show resolved Hide resolved
Comment on lines 388 to 389
if len(depends_on.split("/")) == 2:
depends_on = depends_on.split('/')[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that we make special treatment of strings with a single slash.

The standard is a bit ambiguous but I read it as if relative paths could be either:

  • name
  • dir/name or dir/dir/name or dir/dir/dir/name

So I have two questions:

  1. Why do we need take the second element depends_on = depends_on.split('/')[1] instead of simply appending current_path + relative_path
  2. Why can we rely on a single slash being present, instead of an arbitrarily large relative path?

nexus_constructor/json/transformation_reader.py Outdated Show resolved Hide resolved
nexus_constructor/model/component.py Outdated Show resolved Hide resolved
tests/json/test_load_from_json.py Outdated Show resolved Hide resolved
tests/json/test_load_from_json.py Outdated Show resolved Hide resolved
tests/json/test_load_from_json.py Show resolved Hide resolved
@ggoneiESS ggoneiESS marked this pull request as draft August 30, 2023 13:49
@ggoneiESS ggoneiESS requested a review from danesss September 12, 2023 15:49
Copy link
Contributor

@danesss danesss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may have a couple of issues with the current approach.

First, it may not be generic enough, as it makes assumptions on the length and naming of NeXus paths.

Consider an example:
/entry/instrument/detector depends_on ./mytrans/t9
/entry/instrument/detector/mytrans/t9 depends_on ./t8
/entry/instrument/detector/mytrans/t8 depends_on ./t7
...
/entry/instrument/detector/mytrans/t1 depends_on /entry/instrument/stage4

The example shows how a relative depends_on can point to a path at the same level, not under transformations.
It also shows that /transformations is not part of the NeXus standard, so assuming that datasets are named that way is not ideal.

A different example could be found to show a dependency that is nested deeper in the hierarchy, say ./my_stage/something_else/transformations/t1. I believe this is not supported by the current solution.

Another issue is the lack of tests.
I can think of a number of test scenarios not covered. We should cover most of the following:

  • Test depending on the type of group.
    • Some depends_on are datasets (e.g. NXdetector, NXsource, etc)
    • Some depends_on are attributes (e.g. the datasets inside NXtransformations)
  • Define and test the behaviour on both success and fail scenarios. Does the target exists or not, etc.
  • Test variations based on the value inside depends_on
    • Relative paths: at the same level, arbitrarily nested, etc...
    • Absolute paths
    • End of chain (.)
    • Empty of malformed value. What is the desired behaviour?

nexus_constructor/json/load_from_json.py Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Outdated Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Outdated Show resolved Hide resolved
nexus_constructor/json/load_from_json.py Outdated Show resolved Hide resolved
nexus_constructor/json/transformation_reader.py Outdated Show resolved Hide resolved
tests/json/test_transformation_reader.py Show resolved Hide resolved
tests/json/test_load_from_json.py Outdated Show resolved Hide resolved
tests/json/test_load_from_json.py Show resolved Hide resolved
@ggoneiESS
Copy link
Member Author

So, depends_on should always be in an NXTransformations group, but not always - see e.g. https://manual.nexusformat.org/classes/base_classes/NXmirror.html (or any other object that uses depends_on). This also seems to suggest that there is no 'relativeness' allowed; it's either absolute, or from that path, i.e. no .. . In one example from the person who raised the ticket, https://jira.esss.lu.se/secure/attachment/182581/one_slit_relative.json you can see both examples that you have for t7, t8, t9, t1. T1 is covered without further work being absolute. t7 is covered by this PR (since we only care for the constructor what the component is that it depends on (dependency_transformation_name, [-1]) and the component itself (dependency_component_name, [-3]) and that's what we fix here.

Looking now it seems that t7 and t8 don't work here; the check for == 2 should actually be <=2, so we always have a 'fake' element in [-2] in the path.

It is true that I always a transformations group - this is valid for the Constructor, as the get_component_and_transform_name method states this (at the end). The original bug was raised because this method assumed a path that has at least three parts and we only care about the antepenultimate and the final parts.

Deeper nested transformations are a bit harder to include, since it would require a new treatment of the transformation reader, which requires other work from other tickets. It could be rewritten to do what you suggest here, but it is outside of the scope of this request, and it's non-trivial. You could have a nested depends_on in the very first group of the file that points to the very last group in the file. The tests should indeed process the file and then check that the depends_on path in the JSON exists in the JSON structure that has been read in.

Right now the Constructor is not in a state to support non-Constructor JSON. This could be an idea later, but would need a large rewrite of Constructor logic.

def get_component_and_transform_name(depends_on_string: str):
    # The following extraction of the component name and transformation name makes the assumption
    # that the transformation lives in a component and nowhere else in the file, this is safe assuming
    # the JSON was created by the NeXus Constructor
    depends_on_path = depends_on_string.split("/")
    dependency_component_name = depends_on_path[-3]
    # [-2] is the NXtransformations group (which we don't need)
    dependency_transformation_name = depends_on_path[-1]
    return dependency_component_name, dependency_transformation_name

@danesss
Copy link
Contributor

danesss commented Sep 13, 2023

I think its a little complicated to discuss these many things at the same time 😅 so let me list requirements for the change first:

  1. I'm not sure if NeXus supports depends_on with targets that are not transformations but other type of things. Maybe we don't need to support that for now.
  2. However, depends_on should NOT always be in an NXTransformations group. Components like detectors or sources have depends_on datasets. Transformations inside NXtransformations also have depends_on attributes. AFAIK both of these are used extensively by us and therefore required.
    (Maybe transformations are always inside NXtransformation like get_component_and_transform_name says, but where depends_on lives is not equivalent to that.)
  3. We currently support absolute paths. We now want to add relative paths. We must be able to support paths like some_other_transform and transformations/the_last_transform_of_this_group, because those are the typical depends_on we use all the time.
  4. I hear that arbitrarily deep relative paths may require further changes, so perhaps we don't support them for now. The constructor should probably detect this condition and throw an error to the user when the file is loaded or the too-deep path is entered in the UI.
  5. We do not need to support relative paths like ../ because this is not specified in the NeXus standard.

I may not have the full background on this, but it seems to me that get_component_and_transform_name requires a very specific format supplied to it (i.e. /some/path/here/COMPONENT_NAME/some_ignored_name/TRANSFORM_NAME).
IMHO if that doesn't match what the depends_on we use look like, then maybe we should fix get_component_and_transform_name or not use it.

But I guess we can just use it as is if, given a relative path, we build its absolute path and pass it on. The current solution of splitting and appending is very ad-hoc and unlikely to cover the different use cases we need.

On another note, maybe we would benefit from using PurePosixPath from https://docs.python.org/3/library/pathlib.html#pure-paths instead of handling paths as strings. We can then benefit from methods like is_absolute(), is_relative_to(Path) and so on.

If I got the requirements or the code wrong just let me know.
When we agree on the requirements we should build tests for each of them and then it's easier to follow the code and discuss if it fulfills the requirements.

@ggoneiESS
Copy link
Member Author

The scope of this branch is to accept paths 'like/this' and 'this', as well as '/absolute/paths/that/are/generated/by/the/Constructor.

This branch adds the two relative path cases, and explicitly tells the user if there is another relative path (that is unsupported). A test, test_GIVEN_json_with_component_depending_on_relative_transform_WHEN_loaded_THEN_model_updated, checks that this is working by asserting a component with the relevant component and transform.

Generally the Constructor should support files created by itself, which uses absolute paths. However, there are other files which should be compatible that are created by other ESS (ECDC) tools. Right now it is out of scope, and non-trivial, to add all possible NeXus depends_on paths to the Constructor.

There should be a move away from using strings towards PurePath, but since the transformation logic will have to be rewritten anyway to implement better handling, and considering the non-urgent need for this rewrite, I think it can wait.

This branch was raised by a bug, which is fixed by the branch. I think the better handling of depends_on should be put into a new ticket as a medium-priority task which can be tackled at a later date, potentially during a larger Constructor rewrite.

@ggoneiESS ggoneiESS marked this pull request as ready for review September 15, 2023 09:33
@ggoneiESS
Copy link
Member Author

ggoneiESS commented Sep 15, 2023

Relative nested depends_on that could exist in e.g. nmx attached nmx_json.zip or @ https://project.esss.dk/owncloud/index.php/apps/files/?dir=/NeXus%20baseline%20templates&fileid=16400156 -> nmx_v3.1_with_mock_detectors.json fails, because the paths are too long (>2 parts). An attempt to only use 2 or 1 parts of the file path fails to reconstruct the path as it uses the parent component, which isn't correct. However, this attempt is caught in a KeyError exception and handled smoothly by the program which attempts to create a model anyway.

It is known that some relative depends_on paths are not allowed, and there are now sufficient checks in the program, and known issues. The specific bug is fixed and allows for more types of paths to be used. Other bugs are of higher priority and should be prioritised but the handling of more general paths, in line with the NeXus standard, in the JSON files in general should be approached in another ticket.

I do welcome any further thoughts on the code, warnings, or tests though!

Copy link
Contributor

@danesss danesss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are a couple of improvements that we could implement in the tests. See the three comments below.

tests/json/test_load_from_json.py Show resolved Hide resolved
tests/json/test_load_from_json.py Show resolved Hide resolved
tests/json/test_load_from_json.py Show resolved Hide resolved
@ggoneiESS ggoneiESS merged commit dd9be10 into main Sep 18, 2023
6 checks passed
@ggoneiESS ggoneiESS deleted the ECDC-3518-only-partial-depends-on-support branch September 18, 2023 11:00
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants