Investigate strange blobby source distribution in some HAP point catalogs #1773
Comments
Comment by Rick White on JIRA: An example image:
[image: hst_13057_12_as_wfc_jc3f12-point.png]
Comment by Rick White on JIRA: I have identified multiple additional datasets with this problem. So far I have only seen this issue in ACS/WFC datasets. We do not use the other ACS detectors in the Hubble Source Catalog, but if the problem occurred for WFC3/UVIS or WFC3/IR, it would likely have been discovered, so this is probably an ACS-only problem. (No WFPC2 catalogs have been examined.) I estimate there are at least 125 such datasets in the sample we are using for the HSC. There are about 16,400 ACS/WFC visits in the sample in total, so approximately 0.8% of the HAP ACS/WFC catalogs probably have the issue. The table lists 16 ACS visits that are known to have bad, blobby point catalogs. It includes the number of good sources (according to the flags) in the HAP source lists and a link to the HLA interactive display that overlays the point catalog.
||datasetname||segmentCatNobj||pointCatNobj||display||
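As a quick check of the estimate above, the affected fraction follows directly from the two counts quoted in the comment. Since 125 is a lower bound, the resulting fraction is a lower bound too.

```python
# Counts quoted in the comment: estimated blobby visits vs. all ACS/WFC visits in the HSC sample
affected_visits = 125
total_visits = 16_400

fraction = affected_visits / total_visits
print(f"affected fraction: {fraction:.2%}")  # prints "affected fraction: 0.76%"
```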
Comment by Michele De La Pena on JIRA: Confirmed that this issue still exists even with all the improvements and changes implemented so far (3.7.1rc5). As stated in this ticket, this appears to be a Point catalog problem, as it is not observed in the Segment catalog. However, both algorithms use the same background, though the Point algorithm may alter the background or the threshold above which signal is considered a source.
Comment by Rick White on JIRA: Michele De La Pena Thanks for the update! I suspect that the segment catalog is not affected by the bad background because it uses the If I'm right, you will see that the segment catalog for the bad field uses the RW2D kernel instead of the
Comment by Michele De La Pena on JIRA: Rick White This dataset, hst_13057_12, uses a background image determined by the photutils.background.Background2D algorithm. The median background value is ~1.5 and the median RMS background is ~10.43. This background image is used for both the point and segmentation algorithms. It has been observed that the background image has structure that maps to the blobby regions of source over-detection in the point catalog. The background image is scaled here to emphasize the different regions:
[image: background_image.png]
Why is the blobby source detection seen in the point catalog but not in the segmentation catalog, when both algorithms use a background-subtracted image in the detection process? The point catalog used the "psf" starfinder algorithm, which employs theoretical PSFs to identify point sources. This algorithm is used for all detectors, though two other options can be set in the configuration file: "dao" and "iraf". In the "psf" case:
[image: background-subtractedAndScaled.png]
[image: ImageUsedForPointSourceDetection.png]
In contrast, the segmentation catalog used the Gaussian filter kernel to detect its segments/sources. In this case:
[image: SegmentationConvolvedImage.png]
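A mesh-based 2D background is built from statistics computed in boxes (a coarse mesh) that are then expanded back to full resolution, which is one way local features in the data can imprint structure on the background. The sketch below is a deliberately simplified, numpy-only stand-in for that idea and is not the photutils implementation: it uses plain per-box medians and standard deviations and repeats each box value over its cell, whereas photutils Background2D by default sigma-clips each box, median-filters the mesh, and interpolates smoothly. The function name `mesh_background` and the box size are illustrative assumptions.

```python
import numpy as np

def mesh_background(image, box_size):
    """Return (background, rms) images built from per-box statistics.

    Simplified stand-in for a mesh-based 2D background: no sigma clipping,
    no mesh filtering, and piecewise-constant (not interpolated) expansion.
    """
    ny, nx = image.shape
    if ny % box_size or nx % box_size:
        raise ValueError("image dimensions must be multiples of box_size in this sketch")
    # View the image as a grid of box_size x box_size cells:
    # cells[i, :, j, :] is the cell in mesh row i, mesh column j
    cells = image.reshape(ny // box_size, box_size, nx // box_size, box_size)
    mesh_bkg = np.median(cells, axis=(1, 3))  # one background value per cell
    mesh_rms = np.std(cells, axis=(1, 3))     # one noise estimate per cell

    def expand(mesh):
        # Repeat each mesh value over its cell to return to full resolution
        return np.repeat(np.repeat(mesh, box_size, axis=0), box_size, axis=1)

    return expand(mesh_bkg), expand(mesh_rms)

# Usage: a flat 1.5-count sky with unit noise, plus one brighter quadrant
rng = np.random.default_rng(0)
sky = 1.5 + rng.normal(0.0, 1.0, size=(64, 64))
sky[:32, :32] += 5.0
bkg, rms = mesh_background(sky, box_size=16)
print(bkg[0, 0], bkg[-1, -1], rms[-1, -1])  # values near 6.5, 1.5 and 1.0
```

The brighter quadrant shows up as a step in the background image; with real data, anything that biases individual boxes (crowding, extended emission, detector features) is carried into the surface in the same way.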
Comment by Rick White on JIRA: Thanks for all the info, Michele De La Pena! I see that I was wrong about the It seems like regardless of the difference between the I'm looking at the code where I see from the Possibly the
Comment by Michele De La Pena on JIRA: Rick White I agree that the actual problem is in the background determination. We have seen these background rosettes previously and did not care for them very much! In some ways a good background determination seems to be the hardest part of this problem.
Comment by Michele De La Pena on JIRA: For this type of image, the simple statistics background determination works better. The background2D technique is meant to accommodate images with varying levels of background signal across the image. When the simple statistics background is forced via a configuration variable, the sources are well distributed. I have put the point and segmentation ecsv files in my home directory for you on the Linux systems: /home/mdelapena/ForRick. The problem seems to come down to choosing the appropriate background determination technique and/or, when background2D is in use, tuning its parameters appropriately so that we do not get rosettes. This is easily said, harder to do.
Comment by Rick White on JIRA: Michele De La Pena OK, that is interesting! I looked at these catalogs and don't see any sign of the blobby distribution. Both the Here are the full-frame images. I used the old version of the So at least for this image, the results look very good now.
Comment by Michele De La Pena on JIRA: @rlw To clarify, once I changed the background algorithm, I got the nice, new, well-distributed source candidate results. I think we agree the results are very good. This was an experiment. The issue comes down to what I did not state clearly in my comment on 02 July 2024 at 4:13 pm. There are essentially two background determination techniques available: (1) sigma-clipped statistics, where a single value is determined and used to represent the entire 2D background, or (2) the background2D algorithm from Photutils, which produces an empirically determined 2D surface. Two problems should be addressed: (a) improve the criteria the code uses to decide which background technique to apply, and (b) better "tune" the Photutils background2D so that rosettes are not produced.
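Technique (1) can be sketched in a few lines. This is a minimal, numpy-only illustration of iterative sigma clipping, similar in spirit to astropy.stats.sigma_clipped_stats (which defaults to sigma=3 and maxiters=5); the function name and defaults here are assumptions, not the drizzlepac code. Because it returns one background value and one RMS for the whole image, the result cannot have spatial structure such as rosettes.

```python
import numpy as np

def sigma_clipped_background(image, sigma=3.0, maxiters=5):
    """Return (median, std) of the pixels surviving iterative sigma clipping.

    The median serves as a single, constant background for the whole image
    and the std as its RMS.
    """
    data = np.asarray(image, dtype=float).ravel()
    data = data[np.isfinite(data)]          # ignore NaN/inf pixels
    for _ in range(maxiters):
        med, std = np.median(data), np.std(data)
        keep = np.abs(data - med) <= sigma * std
        if keep.all():                      # converged: nothing left to clip
            break
        data = data[keep]
    return float(np.median(data)), float(np.std(data))

# Usage: flat sky (1.5 counts, unit noise) with one bright source to be clipped
rng = np.random.default_rng(1)
img = 1.5 + rng.normal(0.0, 1.0, size=(200, 200))
img[50:55, 50:55] += 500.0
bkg, rms = sigma_clipped_background(img)
print(f"background={bkg:.2f}, rms={rms:.2f}")
```

Clipping rejects the bright source on the first pass, so the returned median stays near the true sky of 1.5; the clipped RMS comes out slightly below the true noise of 1.0 because the Gaussian tails are trimmed as well.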
Comment by Rick White on JIRA: Michele De La Pena Do you still have the versions of the catalogs that used the 2D backgrounds? I'd like to see those too, to check whether the segment catalog also shows blobs.
Comment by Michele De La Pena on JIRA: Rick White I have put the catalogs generated with the latest version of Drizzlepac (v3.7.1rc5) in /home/mdelapena/ForRick/BLOBBY. In these, the blobby nature of the point catalog is apparent, but the segmentation catalog does not show any blobs to my eyes! As a reminder, the "statistics" catalogs are still in /home/mdelapena/ForRick.
Comment by Rick White on JIRA: Michele De La Pena Thank you. I confirm that the Can I get the filter ecsv catalogs for both versions too? I'd like to look at the backgrounds and other properties for individual sources, but the But here's something notable: the This makes me think about the impact of I wonder whether the problem might not be the background itself, but could instead be due to the background RMS estimate? Maybe comparing the noise for individual sources will clarify what is going on.
Comment by Michele De La Pena on JIRA: Rick White I have put StatsCat.tar in /home/mdelapena/ForRick. These are the catalogs based on the sigma-clipped statistics. I have put BlobbyCat.tar in /home/mdelapena/ForRick/BLOBBY. These are the catalogs associated with the Photutils background2D, where the sources in the point catalog are spatially bunched up.
Comment by Rick White on JIRA: Michele De La Pena I matched the objects in the two For the I looked at all the parameters in the catalogs. It looks like the biggest change is in the This is important because one of the criteria for detecting sources in the Here is a plot showing the areas: The right panel shows the difference in the areas versus the I don't have an explanation for this behavior! It seems like small changes in the sky estimate can create large differences in the apparent areas. Note that these area changes are found across the entire image, not just in the "blobby" regions or the "empty" regions where the background estimate is lower or higher. If we can understand why so few pixels are found above the threshold, maybe we can make some progress toward understanding the effect of
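One way to see how a small shift in the sky estimate can produce a large change in a faint source's area is a noiseless toy model. If the area is taken to be the number of pixels above a threshold of nsigma times the RMS, a source only a few sigma above the sky has just a handful of pixels near that threshold, so the count jumps in discrete steps. The sketch assumes a circular Gaussian source, a fixed RMS of 1, and a 3-sigma threshold; the function `area_above_threshold` is hypothetical and is not the catalog code.

```python
import numpy as np

def area_above_threshold(peak, fwhm, sky_error, rms, nsigma=3.0, size=41):
    """Count pixels of a noiseless Gaussian source above nsigma * rms.

    sky_error is the amount by which the background is over-estimated, so the
    background-subtracted source is (source - sky_error). Toy model only.
    """
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # convert FWHM to Gaussian sigma
    y, x = np.mgrid[:size, :size] - size // 2           # pixel offsets from the center
    source = peak * np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    residual = source - sky_error
    return int(np.count_nonzero(residual > nsigma * rms))

# A faint (5-sigma peak) source with a 3-pixel FWHM, for several sky errors
for sky_error in (-0.5, 0.0, 0.5, 1.0):
    print(sky_error, area_above_threshold(5.0, 3.0, sky_error, 1.0))
# prints areas 9, 5, 5 and 1 respectively
```

In this toy case, underestimating the sky by half the noise nearly doubles the area of a 5-sigma source, and overestimating it by one sigma shrinks it to a single pixel.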
Comment by Rick White on JIRA: Looking at the current ops
Then it runs the
That seems like a tremendously large change in both the median background (from 0.36 to 1.37) and the median RMS background (from 0.06 to 9.5!). The huge RMS value is probably the reason for the very high detection threshold used in this image. I wonder whether we have any images where the
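To put these numbers in perspective, a detection threshold of the generic form background + nsigma times RMS can be evaluated for both sets of statistics. The multiplier nsigma = 3 is an assumption for illustration only; the value actually used by the pipeline is not given in this thread.

```python
# Median background and RMS quoted above: sigma-clipped statistics vs. Background2D
stats_bkg, stats_rms = 0.36, 0.06
bkg2d_bkg, bkg2d_rms = 1.37, 9.5
nsigma = 3.0  # hypothetical multiplier, for illustration only

def threshold(bkg, rms, nsigma):
    """Detection threshold in image units: background + nsigma * rms."""
    return bkg + nsigma * rms

t_stats = threshold(stats_bkg, stats_rms, nsigma)  # 0.36 + 0.18
t_2d = threshold(bkg2d_bkg, bkg2d_rms, nsigma)     # 1.37 + 28.5
rms_ratio = bkg2d_rms / stats_rms

print(f"thresholds: {t_stats:.2f} vs {t_2d:.2f} (x{t_2d / t_stats:.0f}), RMS ratio {rms_ratio:.0f}")
# prints "thresholds: 0.54 vs 29.87 (x55), RMS ratio 158"
```

Under this assumption, the threshold is dominated almost entirely by the RMS term in the Background2D case.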
Comment by Rick White on JIRA: I have done more analysis on the differences between the I created a git repository with my code. Briefly, the approach is:
Here is a plot from the notebook that summarizes the results: Most points have similar median sky values (top row), but there are some visits that are outliers. In the bottom row, the rms values are usually smaller for All the bad blobby catalogs in my sample list are in those bad orange points. I have looked at a bunch of images (all the way down to ratios of 1 between the rms values). The large rms values for My proposed change is to reject the The number of visits that will be changed by this is fairly small. Here are the counts for various instruments. The last column || inst || ratio<=1 || ratio>1 || total ||
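The exact rule proposed in this comment is cut off in this copy of the thread. Based only on the table headings (ratio<=1, ratio>1) and the observation that the bad catalogs have unusually large RMS values, a minimal sketch of such a check might compare the two RMS estimates and fall back to the sigma-clipped background when the ratio exceeds 1. The function name, the direction of the ratio (Background2D RMS over sigma-clipped RMS), and the cut value are all assumptions, not the implemented fix.

```python
def choose_background(stats_rms, bkg2d_rms_median, max_ratio=1.0):
    """Hypothetical rule: choose 'sigma_clipped' when the Background2D median RMS
    is more than max_ratio times the sigma-clipped RMS, else 'background2d'.

    Returns the chosen technique and the RMS ratio that drove the decision.
    """
    if stats_rms <= 0:
        raise ValueError("sigma-clipped RMS must be positive")
    ratio = bkg2d_rms_median / stats_rms
    choice = "sigma_clipped" if ratio > max_ratio else "background2d"
    return choice, ratio

# hst_13057_12 values quoted earlier in the thread: RMS 0.06 (sigma-clipped) vs 9.5 (Background2D)
print(choose_background(0.06, 9.5))   # chooses 'sigma_clipped' (ratio about 158)
print(choose_background(1.00, 0.95))  # ('background2d', 0.95)
```

A single scalar ratio keeps the decision cheap and easy to log per visit, which would also make it simple to reproduce per-instrument counts like those in the table.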
Comment by Michele De La Pena on JIRA: @rlw Thank you for all your hard work. I will check what you have done now that I am back to work on this ticket; I had to step away for a bit to work on other issues. Just so you know, I am probably going to close this ticket and open a new one to implement a solution to this problem. I had already changed this ticket to be an "investigation". Of course, the new ticket will reference this ticket.
Comment by Michele De La Pena on JIRA: Closing this ticket as Done, as it was meant to be a "discovery/investigation". The follow-up tickets HLA-1284, HLA-1286, and HLA-1287 will implement solutions, or at least mitigating algorithms, for this issue.
Issue HLA-1240 was created on JIRA by Rick White:
There is a fairly rare failure mode where the HAP point catalog source detection goes crazy and generates big blobs with thousands of spurious sources scattered around the image. I don't know what triggers this, but my guess is that something has gone wrong with the background estimation. That could lead to this kind of structure.
Not a high priority to fix since it is rare, but if the problem were identified and fixed, it would probably lead to quality improvements in other catalogs that are not so obviously bad.
A sample image is appended below in the attachments and comments. An additional comment lists 16 datasets that are known to have the problem. The problem has only been found in ACS/WFC catalogs. It is likely that at least 125 ACS/WFC visits have the problem (about 0.8% of the ACS/WFC HAP-SVM visits).
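The guess that a faulty background estimate could produce this kind of structure can be illustrated with a toy simulation: pure noise with no real sources, and a hypothetical background model that is correct everywhere except in one circular patch where it is underestimated by one noise sigma. Counting pixels above a 3-sigma threshold (a crude proxy for detections, not the actual star finder) shows the false positives concentrating in that patch. The patch location, size, and depth are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)
ny = nx = 200
true_sky, noise = 1.5, 1.0
image = true_sky + rng.normal(0.0, noise, size=(ny, nx))  # pure sky, no real sources

# Hypothetical faulty background: correct everywhere except a circular patch
# where it is underestimated by one noise sigma
y, x = np.mgrid[:ny, :nx]
dip = (x - 150) ** 2 + (y - 150) ** 2 < 30 ** 2
background = np.full((ny, nx), true_sky)
background[dip] -= 1.0 * noise

# Pixels exceeding a 3-sigma threshold after subtracting the faulty background
hits = (image - background) > 3.0 * noise
rate_in_dip = hits[dip].mean()
rate_outside = hits[~dip].mean()
print(f"false-positive rate inside patch: {rate_in_dip:.4f}, outside: {rate_outside:.4f}")
# expected to be near 0.023 inside (effectively a 2-sigma cut) and 0.0013 outside
```

Inside the patch the effective cut drops from 3 sigma to 2 sigma, so spurious hits there are roughly 17 times more frequent, producing a localized clump of false detections much like the blobs described above.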