-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DM-46599: Stop using Butler regular expression searches #239
Conversation
0a4f449
to
247b2e0
Compare
Using regular expressions to search the Butler for collection names is deprecated in RFC-1040.
247b2e0
to
4f84c7d
Compare
This passed jenkins but I'm not set up to run |
@@ -277,7 +277,10 @@ def _getCollectionArguments(workspace, reuse): | |||
# skip-existing-in would work around that, but would lead to a worse bug in | |||
# the case that the user is alternating runs with and without --clean-run. | |||
# registry.refresh() | |||
oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z"))) | |||
collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z") | |||
oldRuns = list(registry.queryCollections(workspace.outputName + "/*")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comment but does it make things faster or slower if we use a more constrained glob such as /????????T*Z
or /*T*Z
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is all happening in a small, temporary repo, so I don't think performance is an issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one policy question.
oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z"))) | ||
collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z") | ||
oldRuns = list(registry.queryCollections(workspace.outputName + "/*")) | ||
oldRuns = [run for run in oldRuns if collectionPattern.fullmatch(run)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to be sure I understand this -- regexes are being phased out, but globs are not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct -- globs will continue to work.
@@ -219,10 +218,31 @@ def _ensureCollection(self, registry, name, collectionType): | |||
The type of collection to add. This field is ignored when | |||
testing if a collection exists. | |||
""" | |||
matchingCollections = list(registry.queryCollections(re.compile(name))) | |||
if not matchingCollections: | |||
if not self._doesCollectionExist(registry, name): | |||
registry.registerCollection(name, type=collectionType) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that registerCollection
is now(?) idempotent, I wonder if this code could be removed entirely...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this one can. The one below is less clear since it avoids modifying the chain if the collection already existed.
I'm gonna leave it alone to avoid needing to re-test, though.
I've run For future reference, |
Using regular expressions to search the Butler for collection names is deprecated in RFC-1040.
scons
and/orstack-os-matrix
)?ap_verify.py
on at least one of the standard datasets?For changes to metrics, the
print_metricvalues
script fromlsst.verify
will be useful.