Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-46599: Stop using Butler regular expression searches #239

Merged
merged 1 commit into from
Oct 15, 2024

Conversation

dhirving
Copy link
Contributor

@dhirving dhirving commented Oct 14, 2024

Using regular expressions to search the Butler for collection names is deprecated in RFC-1040.


  • Do unit tests pass (scons and/or stack-os-matrix)?
  • Did you run ap_verify.py on at least one of the standard datasets?
    For changes to metrics, the print_metricvalues script from lsst.verify will be useful.
  • Is the Sphinx documentation up-to-date?

@dhirving dhirving force-pushed the tickets/DM-46599 branch 2 times, most recently from 0a4f449 to 247b2e0 Compare October 15, 2024 00:04
Using regular expressions to search the Butler for collection names is
deprecated in RFC-1040.
@dhirving dhirving marked this pull request as ready for review October 15, 2024 18:37
@dhirving
Copy link
Contributor Author

This passed jenkins but I'm not set up to run ap_verify.py myself -- would you mind running it for me?

@dhirving dhirving requested a review from kfindeisen October 15, 2024 18:44
@@ -277,7 +277,10 @@ def _getCollectionArguments(workspace, reuse):
# skip-existing-in would work around that, but would lead to a worse bug in
# the case that the user is alternating runs with and without --clean-run.
# registry.refresh()
oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z")))
collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z")
oldRuns = list(registry.queryCollections(workspace.outputName + "/*"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comment but does it make things faster or slower if we use a more constrained glob such as /????????T*Z or /*T*Z?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all happening in a small, temporary repo, so I don't think performance is an issue.

Copy link
Member

@kfindeisen kfindeisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, just one policy question.

oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z")))
collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z")
oldRuns = list(registry.queryCollections(workspace.outputName + "/*"))
oldRuns = [run for run in oldRuns if collectionPattern.fullmatch(run)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure I understand this -- regexes are being phased out, but globs are not?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct -- globs will continue to work.

@@ -219,10 +218,31 @@ def _ensureCollection(self, registry, name, collectionType):
The type of collection to add. This field is ignored when
testing if a collection exists.
"""
matchingCollections = list(registry.queryCollections(re.compile(name)))
if not matchingCollections:
if not self._doesCollectionExist(registry, name):
registry.registerCollection(name, type=collectionType)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that registerCollection is now(?) idempotent, I wonder if this code could be removed entirely...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this one can. The one below is less clear since it avoids modifying the chain if the collection already existed.

I'm gonna leave it alone to avoid needing to re-test, though.

@kfindeisen
Copy link
Member

I've run ap_verify on ap_verify_ci_dc2, no problems. I ran it twice on the same workspace to ensure this code actually gets exercised.

For future reference, stack-os-matrix now has a ci_ap target, though I'm not sure exactly which pipelines it runs.

@dhirving dhirving merged commit 15331d9 into main Oct 15, 2024
2 checks passed
@dhirving dhirving deleted the tickets/DM-46599 branch October 15, 2024 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants