Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Photutils tutorial notebook #2: source detection #3

Closed

Conversation

laurenmarietta
Copy link
Contributor

The second notebook tutorial for photutils, demonstrating methods for source detection.

Some preliminary concerns I have about this notebook:

  • I'm not sure that my rationalization for why IRAFStarFinder and DAOStarFinder have such different results (because IRAF doesn't allow for elliptical Gaussians) is correct, since I have only the most superficial understanding of how those algorithms work. I would appreciate some feedback from someone who understands them well.
  • Is it too long? If so, I feel as though the image segmentation notebook could potentially be a separate notebook completely.

@eteq I know you mentioned that Tom might be reviewing these notebooks now - if you know his GitHub account and want to tag/assign him, please do!

@laurenmarietta laurenmarietta changed the title Photutils tutorial notebook: source detection Photutils tutorial notebook #2: source detection Sep 19, 2018
@eteq
Copy link
Owner

eteq commented Sep 19, 2018

Thanks @laurenmarietta - Tom is @Onoddil, so definitely makes sense for him to have a look at this as a "user". I'll also plan to take a look at it from my perspective in more detail later.

On the specific concerns:

I'm not sure that my rationalization for why IRAFStarFinder and DAOStarFinder have such different results

I agree this is rather odd. I would have expected <5% difference, not ~90%! I'll try to have a look/ask around some other experts and see if we're missing something.

Is it too long?

I think the length is fine. I agree the segmentation seems a bit out-of-context, but I think it might be even worse to have a separate one for segmentation... An interesting experiment to tie it together might be to do a comparison of the segmentation and other methods? My prediction is that it'll work better on galaxies and worse for stars... but you never know!

@eteq
Copy link
Owner

eteq commented Sep 19, 2018

@laurenmarietta - Ah, I think I have part of the answer. If you look at the IRAFStarFinder and DAOStarFinder you'll see their default settings for the various statistics like sharplo/sharphi and roundlo/roundhi are different. I tried setting them to the same, and then they're a lot closer (although still not as much the same as I would have thought... but within ~30%). So maybe try that as a starting point?

Two other things to just mention that I noticed on the way:

  1. Your current FWHM is wide enough that it's actually missing stars, I think. If you zoom in on a star you'll see that it's only maybe ~3 pixels wide. Since these algorithms are optimized for stars anyway you might lower the FWHM (and the threshold) to get more stars and fewer galaxies? That might also resolve the IRAF/DAO difference since stars are more circular than galaxies anyway

  2. You might find it useful to show cutouts of individual objects - the following worked well for me so feel free to incorporate it if you think it's useful:

#DAO
fig, axs = plt.subplots(3,3)

cutout_sz = 20

srcs = np.random.permutation(sources_dao)[:axs.size]
for ax, src in zip(axs.ravel(), srcs):
    slc = (slice(int(src['ycentroid']-cutout_sz), int(src['ycentroid']+cutout_sz)),
           slice(int(src['xcentroid']-cutout_sz), int(src['xcentroid']+cutout_sz)))
    ax.imshow(xdf_image[slc], norm=norm_image)
    ax.text(2, 2, str(src['id']), color='w', va='top')
    ax.set_xticks([])
    ax.set_yticks([])

@Onoddil
Copy link

Onoddil commented Sep 20, 2018

So I don't think the issue can be the elliptical nature of the Gaussian as, without specifying, DAOStarFinder has ratio=1.0 as its default value, assuming circular Gaussians. If I run IRAFFinder with

iraffind = IRAFStarFinder(fwhm=5.0, threshold=20.*std, minsep_fwhm=0.0, sharplo=0.2,
sharphi=1.0, roundlo=-1.0, roundhi=1.0, sky=0.0)
sources_iraf = iraffind(xdf_image * ~xdf_image.mask)
print(sources_iraf)

I get:

id xcentroid ... flux mag
---- ------------------ ... ------------------- ---------------------
1 2509.734474225375 ... 0.15605459964717738 2.0168085653827426
... ... ... ... ...
1415 2514.8275136598045 ... 0.17862686840817332 1.870133038817817
Length = 1415 rows

bringing me into ~96% agreement with the DAOStarFinder. From a quick look at the reason seems to be minsep_fwhm, as if I remove that and run

iraffind = IRAFStarFinder(fwhm=5.0, threshold=20.*std, sharplo=0.2,
sharphi=1.0, roundlo=-1.0, roundhi=1.0, sky=0.0)
sources_iraf = iraffind(xdf_image * ~xdf_image.mask)
print(sources_iraf)

I get

id xcentroid ... flux mag
---- ------------------ ... ------------------- ---------------------
1 2509.734474225375 ... 0.15605459964717738 2.0168085653827426
... ... ... ... ...
1027 2514.8275136598045 ... 0.17862686840817332 1.870133038817817
Length = 1027 rows

which seems to agree with @eteq on the 30% difference. I therefore think the issue the default requirement that objects have to be ~13 pixels separated, which is just not going to be the case in the Hubble EXtremely Deep Field!

Re: @eteq 's 3 vs 5 pixel FWHM: if you decrease the FWHM IRAFStarFinder gains sources (likely missed stars), but DAOStarFinder loses sources (likely dropped galaxies larger than the new, smaller FWHM). However, for this test case I'm not sure that matters; we should simply go with the FWHM that gives a good selection of all sources, and perhaps briefly mention the tuning of the parameters to the science being done if that isn't already mentioned.

Just to throw a slight spanner in the self-similarity works if you run

IRAFStarFinder(fwhm=5.0, minsep_fwhm=0.0, threshold=20.*std)

(i.e., the IRAF default sharpness etc., just without the minimum distance requirement) you only get 280 sources, so at that point the large differences between default roundness and sharpness are likely dropping galaxies, as daofind allows roundness to be in the domain [-1, 1] but iraffind limits roundness to [0, 0.2] and iraffind requiring a lower sharpness of 0.5 to daofind's 0.2.

In summary, for this comment is a bit disjointed by this point: for a self-similar comparison you can set IRAFStarFinder to the same settings as DAOStarFinder, but require minsep_fwhm = 0.0, or you can justify the differences with a combination of the 13 pixel separation requirement and stricter limits on sharpness and roundness (where roundness is likely the galaxy/star divide issue), and consider a FWHM that gives the best balance between stars and galaxies for both finders.

@laurenmarietta
Copy link
Contributor Author

After what feels like a year, I have finally had the time to take another look at this notebook.

I've made just a couple changes - thanks, Erik, for that clever code to show cutouts of the sources. And thanks to both of you for your sleuthing regarding the differences between IRAFStarFinder and DAOStarFinder! I added a section in the notebook to address that specifically, which I am hopeful seems instructive and not just nit-picky.

If either of you spot anything else that could use clarification or improvement, please let me know.

@Onoddil
Copy link

Onoddil commented Dec 29, 2018

Managed to have a read of this notebook, but haven't got iPython to play nicely so I've not managed to verify the plot outputs (I'll have another look at these notebooks once I'm back in the US). The flow and words etc. look good though; I really just have one minor comment.

I'm not sure if it was deliberate or a slight formatting issue, but any of the "inline" comments (e.g., https://github.com/eteq/csi-stsci-notebooks/pull/3/files#diff-fbc70e2f8789c7a80f84a34d56d11e00R39 or any of the Exercises boxes) expose some formatting, such as asterisks for bolding, ` marks around `astropy`, URLs, etc. I assume these were supposed to format within their inline boxes, but haven't quite come out right. If it was a deliberate choice, however, then feel free to ignore this, but it looks slightly "incomplete" to me.

Copy link
Owner

@eteq eteq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@laurenmarietta - Overall this looks really good! I saw several minor things inline, but actually I think this is almost there.

"cell_type": "markdown",
"metadata": {},
"source": [
"Let's also define some `matplotlib` parameters, such as title font size and the dpi, to make sure our plots look nice. To make it quick, we'll do this by loading a [shared style file](photutils_notebook_style.mplstyle) into `pyplot`. We will use this style file for all the notebook tutorials. (See [here](https://matplotlib.org/users/customizing.html) to learn more about customizing `matplotlib`.)"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe:

Suggested change
"Let's also define some `matplotlib` parameters, such as title font size and the dpi, to make sure our plots look nice. To make it quick, we'll do this by loading a [shared style file](photutils_notebook_style.mplstyle) into `pyplot`. We will use this style file for all the notebook tutorials. (See [here](https://matplotlib.org/users/customizing.html) to learn more about customizing `matplotlib`.)"
"Let's also define some `matplotlib` parameters, such as title font size and the dpi, to make sure our plots look nice. To make it quick, we'll do this by loading a [style file shared with the other photutils tutorials](photutils_notebook_style.mplstyle) into `pyplot`. We will use this style file for all the notebook tutorials. (See [here](https://matplotlib.org/users/customizing.html) to learn more about customizing `matplotlib`.)"

"source": [
"As described in the introduction, we will be using Hubble eXtreme Deep Field (XDF) data. Since this file is too large to store on GitHub, we will just use `astropy` to directly download the file from the STScI archive: https://archive.stsci.edu/prepds/xdf/ \n",
"\n",
"(Generally, the best package for web queries of astronomical data is `astroquery`; however, the dataset we are using is a High Level Science Product (HLSP) and thus is not located within a catalog that could be queried with `astroquery`.)"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps also have the "astroquery" text be a link?

"metadata": {},
"outputs": [],
"source": [
"unit = u.ct / u.s\n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"unit = u.ct / u.s\n",
"unit = u.electron / u.s\n",

It might be clearer to leave it as counts, but since electron is a unit here, you might want to do that?

(There's a subtlty here: "ADUs" and "electrons" are generally related via the gain, but "counts" I think are not actually clearly defined to be one or the other. So it's probably better to specify electrons specifically, but maybe I'm mis-understanding what "counts" means here?)

"cbar.ax.set_yticklabels(labels)\n",
"\n",
"# Define labels\n",
"cbar.set_label(r'Flux Count Rate ($e^{-1}/s$)', rotation=270, labelpad=30)\n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be neat to use the unit itself to render the latex. That is, you could do something like:

Suggested change
"cbar.set_label(r'Flux Count Rate ($e^{-1}/s$)', rotation=270, labelpad=30)\n",
"cbar.set_label(r'Flux Count Rate (${}$)'.format(xdf_image.unit.to_string('latex')), rotation=270, labelpad=30)\n",

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(same comment applies below in several places)

"cell_type": "markdown",
"metadata": {},
"source": [
"Take this example as reminder to be critical when selecting a source detection algorithm, and when defining algorithm parameters!"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this wording. Is it "critical" as in "pay attention to the details"? If so, maybe a better option would be "reminder to be careful" or even "examine your results and play with the parameters" or something like that?

"cell_type": "markdown",
"metadata": {},
"source": [
"For more general source detection cases that do not require rigorous comparison with models, `photutils` offers the `find_peaks` function. \n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(minor):

Suggested change
"For more general source detection cases that do not require rigorous comparison with models, `photutils` offers the `find_peaks` function. \n",
"For more general source detection cases that do not require comparison with models, `photutils` offers the `find_peaks` function. \n",

"cell_type": "markdown",
"metadata": {},
"source": [
"Next, how many of these sources match? We can answer this question by using [sets](https://docs.python.org/2/library/sets.html) to compare the centroids of the different sources (rounding to the first decimal place)."
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems to link to the python 2 version of the docs. I think you want just the version-less one?: https://docs.python.org/library/stdtypes.html#set-types-set-frozenset

"cell_type": "markdown",
"metadata": {},
"source": [
"*Remember that you can double-click on the plot to zoom in and look around!*"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused by this. Is that actually true in general for most browsers? (I haven't had a chance to check this in the live notebook, but will check later to be sure)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works for me. You double click to "zoom in", the page doesn't increase in width, but the figure becomes larger within its box (although the height increases a bit), and then you can scroll sideways (OS X + magic mouse and trackpad at least; not sure how this interacts with a Windows desktop though) to see the full figure.

"source": [
"## Conclusions\n",
"\n",
"The `photutils` package provides users with a variety of methods for detecting sources in their data, from familar algorithms such as `DAOFind` and `starfind`, to more complex and customizable image segmentation algorithms. These methods allow for easy creation of a diverse array of apertures that can be used for photometric analysis.\n",
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there should also be a "and if you want to write your own algorithm, you can do that too!" at the end here? One or both of these could be mentioned:

Copy link

@Onoddil Onoddil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This notebook looks great overall, @laurenmarietta. As with @eteq mostly minor comments, primarily focussed on slightly weird formatting errors I get on my machine or deprecation warnings and code changes to make them go away, but a few small improvements I think will help to make the notebook even better.

"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">**Important:** Before proceeding, please be sure to update your versions of `astropy`, `matplotlib`, and `photutils`, or this notebook may not work properly. Or, if you don't want to handle packages individually, you can always use (and keep updated!) the [AstroConda](https://astroconda.readthedocs.io) distribution.</div>\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On my Jupyter notebook, the inline formatting didn't show (I saw the full URL link, and `Astropy`, etc.) without new lines and empty lines before and after the div breaks. However if this is just me and my slightly odd notebook setup then please feel free to ignore this issue (but let me know so I can see about fixing it)

Suggested change
"<div class=\"alert alert-block alert-warning\">**Important:** Before proceeding, please be sure to update your versions of `astropy`, `matplotlib`, and `photutils`, or this notebook may not work properly. Or, if you don't want to handle packages individually, you can always use (and keep updated!) the [AstroConda](https://astroconda.readthedocs.io) distribution.</div>\n",
"<div class=\"alert alert-warning\">\n",
" \n",
"**Important:** Before proceeding, please be sure to update your versions of `astropy`, `matplotlib`, and `photutils`, or this notebook may not work properly. Or, if you don't want to handle packages individually, you can always use (and keep updated!) the [AstroConda](https://astroconda.readthedocs.io) distribution.\n",
"\n",
"</div>"

"cell_type": "markdown",
"metadata": {},
"source": [
"As explained in the [previous notebook](01_photutils_background_estimation.ipynb) on background estimation, it is important to **mask** these data, as a large portion of the values are equal to zero. We will mask out the non-data, so all of those pixels that have a value of zero don't interfere with our statistics and analyses of the data. "
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think maybe this could just do with a slight clarification to repeat why we're masking the dataset, highlighting that it's part of the image that we don't care about that are gone, instead of just data that happen to have a value of zero:

Suggested change
"As explained in the [previous notebook](01_photutils_background_estimation.ipynb) on background estimation, it is important to **mask** these data, as a large portion of the values are equal to zero. We will mask out the non-data, so all of those pixels that have a value of zero don't interfere with our statistics and analyses of the data. "
"As explained in the [previous notebook](01_photutils_background_estimation.ipynb) on background estimation, it is important to **mask** these data, as a large portion of the values are equal to zero. We will mask out the non-data portions of the image, so all of those pixels that have a value of zero don't interfere with our statistics and analyses of the data. "

"metadata": {},
"outputs": [],
"source": [
"mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, iters=5, mask=xdf_image.mask)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get an AstropyDeprecationWarning now, as iters is being removed in the future and maxiters should be used instead.

Suggested change
"mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, iters=5, mask=xdf_image.mask)"
"mean, median, std = sigma_clipped_stats(xdf_image.data, sigma=3.0, maxiters=5, mask=xdf_image.mask)"

"# Plot the data\n",
"norm_image = ImageNormalize(vmin=1e-4, vmax=5e-2, stretch=LogStretch())\n",
"xdf_image_clipped = np.clip(xdf_image, 1e-4, None) # clip to plot with logarithmic stretch\n",
"fitsplot = ax1.imshow(np.ma.masked_where(xdf_image.mask, xdf_image_clipped), norm=norm_image)\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not particularly relevant to the quality of the notebook but I was a little bit put off by the large swathes of yellow the figure produces by default, but I couldn't find a nice way to set cmap = plt.get_cmap('viridis'); cmap.set_bad(color='k') (i.e., use the masked data from xdf_image to set the plot to black) that interacted with ImageNormalize.

It would be great if there was a way to get the masked data set to black while still keeping the rest of the figure the same, however; but this is a really minor thing so don't worry too much if you also can't figure out how to make that happen. This comment applies anywhere an imshow exists, as I suspect it happens a lot and it would be nice if the plots could be a little bit less "in your face" with the colour scheme...

"cell_type": "markdown",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, I ran into some issues with this block where the lack of line breaks revealed the underlying formatting, instead of correctly displaying the formatted data (sorry about the odd formatting of this comment, it seems you can't suggest changes across multiple lines...). Note that I also am unable to get the ** bolding to work within a <h3> block, but if that's just me again, please ignore this:

"<div class=\"alert alert-block alert-info\">\n",
"<h3>**Exercises:**</h3><br>\n",
"Re-run the `DAOStarFinder` algorithm with a smaller threshold (like 5&sigma;), and plot the sources that it finds. Do the same, but with a larger threshold (like 100&sigma;). How did changing the threshold affect the results?\n",
"</div>"

Suggested change
"<div class=\"alert alert-block alert-info\">\n",
"<div class=\"alert alert-block alert-info\">\n",
"\n",
"<h3>Exercises:</h3>\n",
"\n",
"Re-run the `DAOStarFinder` algorithm with a smaller threshold (like 5&sigma;), and plot the sources that it finds. Do the same, but with a larger threshold (like 100&sigma;). How did changing the threshold affect the results?\n",
"\n",
"</div>"

"# Create a segmentation image\n",
"segm = detect_sources(xdf_image.data, threshold, npixels)\n",
"\n",
"print('Found {} sources'.format(segm.max))"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got an AstropyDeprecationWarning again:

Suggested change
"print('Found {} sources'.format(segm.max))"
"print('Found {} sources'.format(segm.max_label))"

"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<h3>**Exercises:**</h3><br>\n",
"Recompute the `SegmentationImage`, but alter the threshold and the minimum number of pixels in a source. How does changing the threshold affect the results? What about changing the number of pixels?\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, for this to render properly for me the ** and the <br> need removing, and line returns need adding after each line

Suggested change
"Recompute the `SegmentationImage`, but alter the threshold and the minimum number of pixels in a source. How does changing the threshold affect the results? What about changing the number of pixels?\n",
"<div class=\"alert alert-block alert-info\">\n",
" \n",
"<h3>Exercises:</h3>\n",
"\n",
"Recompute the `SegmentationImage`, but alter the threshold and the minimum number of pixels in a source. How does changing the threshold affect the results? What about changing the number of pixels?\n",
"\n",
"</div>"

"catalog = source_properties(xdf_image.data, segm)\n",
"table = catalog.to_table()\n",
"print(table)\n",
"print(table.colnames)"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this section could do with a little bit of context about what we've calculated and some small examples of what the user can do with them? It feels a little bit "floaty" currently, just a "hey neat we can do this thing" side note but doesn't give the bigger picture context that makes the rest of the notebook work really well.

"metadata": {},
"source": [
"<div class=\"alert alert-block alert-info\">\n",
"<h3>**Exercises:**</h3><br>\n",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Final instance of "I can't render this as currently formatted" I promise! Please add line breaks, remove the ** and the <br>, etc.

"The `photutils` package provides users with a variety of methods for detecting sources in their data, from familar algorithms such as `DAOFind` and `starfind`, to more complex and customizable image segmentation algorithms. These methods allow for easy creation of a diverse array of apertures that can be used for photometric analysis.\n",
"\n",
"\n",
"** To continue with this `photutils` tutorial, go on to the [aperture photometry](03_photutils_aperture_photometry.ipynb) or [PSF photometry notebook](04_photutils_psf_photometry.ipynb).**"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The space between the ** and To here causes this to fail to render as bold for me. If the choice was to show ** at each end of the sentence then ignore me, however.

@laurenmarietta
Copy link
Contributor Author

Thanks to you both for these wonderful reviews :)

First things first, per the new plan for finishing up these notebooks, I am actually going to close this PR - any future discussion will take place on the spacetelescope/notebooks repo, on the new PR thread: spacetelescope/notebooks#101

I appreciated the very helpful feedback - also so many fun tips and shortcuts, I learn every time I get another review. I did implement all your suggestions, with the following comments:

  • Thanks @Onoddil for bringing the colormap problem to my attention - I think I must have been using an old version of matplotlib or something, so I wasn't seeing that awful yellow background. I was ultimately able to find an easy fix (by setting clip=False within the ImageNormalize instantiation!)
  • You also bring up a good point when it comes to rounding the centroids to evaluate source matching. I had picked rounding to the tenths place as a completely arbitrary choice. I am happy to just increase the rounding from the tenths place to the ones place if that’s okay with you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants