DM-46599: Stop using Butler regular expression searches #239

dhirving · 2024-10-14T20:33:11Z

Using regular expressions to search the Butler for collection names is deprecated in RFC-1040.

Do unit tests pass (scons and/or stack-os-matrix)?
Did you run ap_verify.py on at least one of the standard datasets?
For changes to metrics, the print_metricvalues script from lsst.verify will be useful.
Is the Sphinx documentation up-to-date?


dhirving · 2024-10-15T18:39:13Z

This passed Jenkins, but I'm not set up to run ap_verify.py myself -- would you mind running it for me?

timj · 2024-10-15T18:51:20Z

python/lsst/ap/verify/pipeline_driver.py

@@ -277,7 +277,10 @@ def _getCollectionArguments(workspace, reuse):
    # skip-existing-in would work around that, but would lead to a worse bug in
    # the case that the user is alternating runs with and without --clean-run.
    # registry.refresh()
-    oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z")))
+    collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z")
+    oldRuns = list(registry.queryCollections(workspace.outputName + "/*"))


Minor comment but does it make things faster or slower if we use a more constrained glob such as /????????T*Z or /*T*Z?

kfindeisen · 2024-10-15

This is all happening in a small, temporary repo, so I don't think performance is an issue.
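To make the trade-off between the three globs concrete, here is a minimal sketch that counts how many names each one lets through. It assumes Python's `fnmatch` behaves closely enough to Butler's glob matching for illustration, which the thread does not confirm; the collection names are invented.

```python
from fnmatch import fnmatchcase

# Hypothetical collection names, as might exist in a small test repository.
names = [
    "ap_verify-output/20241015T183913Z",
    "ap_verify-output/20241015T190538Z",
    "ap_verify-output/notes",
    "ap_verify-output/draftTfooZ",
    "ap_verify-output",
]

prefix = "ap_verify-output"
counts = {}
for suffix in ("/*", "/*T*Z", "/????????T*Z"):
    # fnmatchcase is a stand-in for the registry's glob search.
    matched = [n for n in names if fnmatchcase(n, prefix + suffix)]
    counts[suffix] = len(matched)
    print(f"{suffix!r}: {len(matched)} candidate(s)")
```

A tighter glob returns fewer candidates, but none of the three can require digits-only fields, so a regular-expression filter on the client side is still needed to enforce the exact timestamp shape.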

kfindeisen

Looks good, just one policy question.

kfindeisen · 2024-10-15T18:46:07Z

python/lsst/ap/verify/pipeline_driver.py

-    oldRuns = list(registry.queryCollections(re.compile(workspace.outputName + r"/\d+T\d+Z")))
+    collectionPattern = re.compile(workspace.outputName + r"/\d+T\d+Z")
+    oldRuns = list(registry.queryCollections(workspace.outputName + "/*"))
+    oldRuns = [run for run in oldRuns if collectionPattern.fullmatch(run)]


Just to be sure I understand this -- regexes are being phased out, but globs are not?

dhirving · 2024-10-15

Correct -- globs will continue to work.
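The two-step approach in the diff (broad glob on the server, exact regex on the client) can be sketched without a Butler repository. In this sketch `find_old_runs` is a hypothetical helper, `fnmatchcase` stands in for `registry.queryCollections(... + "/*")`, and the `re.escape` call is an extra precaution that the PR itself does not include.

```python
import re
from fnmatch import fnmatchcase


def find_old_runs(all_collections, output_name):
    """Return names shaped like ``<output_name>/<digits>T<digits>Z``."""
    # Step 1: broad glob search (stand-in for the registry query).
    candidates = [c for c in all_collections if fnmatchcase(c, output_name + "/*")]
    # Step 2: exact filtering with a compiled regex, as in the diff.
    # re.escape is an addition in this sketch, not part of the PR.
    pattern = re.compile(re.escape(output_name) + r"/\d+T\d+Z")
    return [c for c in candidates if pattern.fullmatch(c)]


# Invented names for illustration.
collections = [
    "ap_verify-output/20241015T183913Z",
    "ap_verify-output/notes",
    "ap_verify-output/20241015T183913Z/extra",
    "other/20241015T183913Z",
]
old_runs = find_old_runs(collections, "ap_verify-output")
print(old_runs)
```

Using `fullmatch` rather than `match` matters here: it rejects names that merely start with a timestamp, such as the `/extra` entry above.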

kfindeisen · 2024-10-15T18:48:50Z

python/lsst/ap/verify/workspace.py

@@ -219,10 +218,31 @@ def _ensureCollection(self, registry, name, collectionType):
            The type of collection to add. This field is ignored when
            testing if a collection exists.
        """
-        matchingCollections = list(registry.queryCollections(re.compile(name)))
-        if not matchingCollections:
+        if not self._doesCollectionExist(registry, name):
            registry.registerCollection(name, type=collectionType)


Given that registerCollection is now(?) idempotent, I wonder if this code could be removed entirely...

dhirving · 2024-10-15

I think this one can. The one below is less clear since it avoids modifying the chain if the collection already existed.

I'm gonna leave it alone to avoid needing to re-test, though.
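The distinction between the two call sites can be sketched with an in-memory stand-in. Everything here is hypothetical: `FakeRegistry`, its `chains` attribute, and both helper functions are invented for illustration, and the boolean return value of `registerCollection` is an assumption of this sketch rather than something the thread states about the real Butler API.

```python
class FakeRegistry:
    """In-memory stand-in for a Butler registry (hypothetical)."""

    def __init__(self):
        self.collections = {}  # name -> collection type
        self.chains = {}       # chain name -> ordered list of child names

    def registerCollection(self, name, type="RUN"):
        # Assumed idempotent: a repeat call is a no-op. The return value
        # reports whether this call created the collection.
        if name in self.collections:
            return False
        self.collections[name] = type
        return True


def ensure_collection(registry, name, collection_type):
    # With idempotent registration, no existence check is needed first.
    registry.registerCollection(name, type=collection_type)


def ensure_in_chain(registry, chain, name, collection_type):
    # Only touch the chain when the collection is new, so the order of an
    # existing chain is left alone on repeat runs.
    if registry.registerCollection(name, type=collection_type):
        registry.chains.setdefault(chain, []).insert(0, name)


registry = FakeRegistry()
for _ in range(2):  # run twice, as when reusing a workspace
    ensure_collection(registry, "refcats", "RUN")
    ensure_in_chain(registry, "outputs", "outputs/run1", "RUN")
print(registry.chains)
```

The first helper needs no guard at all, while the second still depends on knowing whether the collection already existed, which is why removing the check there is less clear-cut.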

kfindeisen · 2024-10-15T19:05:38Z

I've run ap_verify on ap_verify_ci_dc2, no problems. I ran it twice on the same workspace to ensure this code actually gets exercised.

For future reference, stack-os-matrix now has a ci_ap target, though I'm not sure exactly which pipelines it runs.

dhirving force-pushed the tickets/DM-46599 branch 2 times, most recently from 0a4f449 to 247b2e0 on October 15, 2024 00:04

Stop using Butler regular expression searches

4f84c7d

Using regular expressions to search the Butler for collection names is deprecated in RFC-1040.

dhirving force-pushed the tickets/DM-46599 branch from 247b2e0 to 4f84c7d on October 15, 2024 00:08

dhirving marked this pull request as ready for review October 15, 2024 18:37

dhirving requested a review from kfindeisen October 15, 2024 18:44

timj reviewed Oct 15, 2024

kfindeisen approved these changes Oct 15, 2024

dhirving merged commit 15331d9 into main Oct 15, 2024
2 checks passed

dhirving deleted the tickets/DM-46599 branch October 15, 2024 22:06
