Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New tooling for comparing the keyword dictionary and data model schemas #337

Merged
merged 29 commits into from
Oct 31, 2024

Conversation

braingram
Copy link
Collaborator

@braingram braingram commented Oct 17, 2024

https://jira.stsci.edu/browse/JP-3783

As it currently stands this PR:

  • adds a new (private) submodule stdatamodels.jwst._kwtool
  • with a rudimentary CLI python -m stdatamodels.jwst._kwtool <keyword dictionary path>
  • that generates a report html file (example attached to https://jira.stsci.edu/browse/JP-3783)

Link to added docs: https://stdatamodels--337.org.readthedocs.build/en/337/jwst/kwtool/index.html

It would be helpful to have feedback on:

  • What comparisons are useful?
  • What format(s) are most helpful?

Some known issues are:

  • "enum" comparison will need to account for the datamodel schemas allowing now-unused values. These are currently reported as differences but being able to specify an allowed difference (which still checking the other values in the enum) would be useful to de-clutter the report.
  • "path" differences are reported for keywords that are re-used in multi-model models. For example EXP_TYPE currently reports a "path" difference with "meta.exposure.type" in "kwd" and multiple paths in "dwd" including the "meta.exposure.type" but also "trace.items.meta.exposure.type" (as one example). This is coming from referenced here (nested in meta.items). At first glance I would say this is a bug as I don't see how having every item in the list overwrite the PRIMARY EXP_TYPE is useful. However this might be a "bug" in the comparison where we don't care about this difference and we'll want some way to ignore it or not flag it as an issue in the first place. @tapastro does this usage look like a schema bug? If so I can open a separate issue.

Tasks

  • update or add relevant tests
  • update relevant docstrings and / or docs/ page
  • Does this PR change any API used downstream? (if not, label with no-changelog-entry-needed)
    • write news fragment(s) in changes/: echo "changed something" > changes/<PR#>.<changetype>.rst (see below for change types)
    • run jwst regression tests with this branch installed ("git+https://github.com/<fork>/stdatamodels@<branch>")
news fragment change types...
  • changes/<PR#>.feature.rst: new feature
  • changes/<PR#>.bugfix.rst: fixes an issue
  • changes/<PR#>.doc.rst: documentation change
  • changes/<PR#>.removal.rst: deprecation or removal of public API
  • changes/<PR#>.misc.rst: infrastructure or miscellaneous change

Copy link

codecov bot commented Oct 17, 2024

Codecov Report

Attention: Patch coverage is 97.57174% with 11 lines in your changes missing coverage. Please review.

Project coverage is 68.93%. Comparing base (fd6be8d) to head (616a09b).
Report is 8 commits behind head on main.

Files with missing lines Patch % Lines
src/stdatamodels/jwst/_kwtool/compare.py 97.10% 4 Missing ⚠️
src/stdatamodels/jwst/_kwtool/kwd.py 93.61% 3 Missing ⚠️
src/stdatamodels/jwst/_kwtool/__main__.py 0.00% 2 Missing ⚠️
...atamodels/jwst/_kwtool/_tests/test_against_mast.py 97.36% 1 Missing ⚠️
src/stdatamodels/jwst/_kwtool/_tests/test_dmd.py 93.33% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #337      +/-   ##
==========================================
+ Coverage   66.55%   68.93%   +2.38%     
==========================================
  Files         102      114      +12     
  Lines        5456     5910     +454     
==========================================
+ Hits         3631     4074     +443     
- Misses       1825     1836      +11     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.


The "keyword dictionary" is a collection
of json files that describe FITS keywords used in JWST files.
Although the format is similar to jsonschema is is not compatible

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"is is" should be "it is"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! I applied that fix in b94d57d

@braingram
Copy link
Collaborator Author

braingram commented Oct 29, 2024

Based on discussion with @tapastro there are a few remaining issues to sort out with this PR prior to approval (based on discussion of the currently generated report).

  • ignore "path" differences for datamodel schema keyword definitions nested under "items" arrays (like in "HDU: EXTRACT1D KEYWORD: EXTR_X"). These don't provide a mapping that would be useful for triggering reprocessing so even if they are listed in the archive they don't allow a one-to-one mapping of value-to-file. The tool will now ignore path differences if "items" is in the path. Fixed in 66092eb
  • allow some enum differences (so datamodels can read old files that use outdated values) 09ef375
  • don't require datamodel keyword definitions to define T, F enum for bools. Fixed in b99a215
  • (I added this one) sort printed sets to ensure consistent ordering 04dcbbc

@braingram
Copy link
Collaborator Author

braingram commented Oct 30, 2024

@tapastro I think this is ready for final review. See the comment above #337 (comment) for a checklist and linked commits for the items we discussed.

I also added a test that fetches the latest "published" keyword dictionary from MAST (using an "unofficial" service since there is no official one) and parses it generating a report as a test of the new tool. I think this is worth expanding by adding a CI job that:

  • checks out stdatamodels main, generates a report
  • checks out the PR branch, generates a report
  • runs diff on the report (and updates both as artifacts)

This way we will know when PRs impact keyword dictionary differences (relative the published version, not the latest dev version which is not public). However it makes sense to me to split this work into a separate PR as I'd like to get this one in to also start working on PRs to address the many differences.

Copy link
Collaborator

@tapastro tapastro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉

docs/source/jwst/kwtool/keyword_dictionary.rst Outdated Show resolved Hide resolved
docs/source/jwst/kwtool/keyword_dictionary.rst Outdated Show resolved Hide resolved
@braingram braingram enabled auto-merge (squash) October 31, 2024 17:54
@braingram braingram merged commit 75df5f4 into spacetelescope:main Oct 31, 2024
21 checks passed
@braingram braingram deleted the kwtool branch October 31, 2024 18:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants