From 9595ccc6ae71f6b0640801c45689782f66c83a02 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Thu, 7 Dec 2023 13:06:35 -0800 Subject: [PATCH 1/5] CI: using new notebooks names in README for rendering --- index.md | 24 ++++++++++++------------ 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/index.md b/index.md index e796c45..8c7d8ca 100644 --- a/index.md +++ b/index.md @@ -7,18 +7,18 @@ maxdepth: 2 --- -CS_Catalog_Queries -CS_Image_Access -CS_Spectral_Access -CS_UCDs -CS_VO_Tables -Exercise_I -Exercise_II -Exercise_III -QuickReference -UseCase_I -UseCase_II -UseCase_III +content/reference_notebooks/catalog_queries +content/reference_notebooks/image_access +content/reference_notebooks/spectral_access +content/reference_notebooks/ucds_unified_content_descriptors +content/reference_notebooks/votables +content/use_case_notebooks/candidate_list_exercise +content/use_case_notebooks/proposal_prep_exercise +content/use_case_notebooks/hr_diagram_exercise +content/reference_notebooks/basic_reference +content/use_case_notebooks/candidate_list_solution +content/use_case_notebooks/proposal_prep_solution +content/use_case_notebooks/hr_diagram_solution ``` From 83f50e2776abfc602c69004230a2cef31e1c070a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Thu, 7 Dec 2023 13:27:15 -0800 Subject: [PATCH 2/5] CI: dealing with md files being moved to non-flat directory structure --- tox.ini | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/tox.ini b/tox.ini index a342b6d..9679fc0 100644 --- a/tox.ini +++ b/tox.ini @@ -31,7 +31,8 @@ deps = commands = pip freeze - !buildhtml: jupytext --from myst --to notebook *.md + !buildhtml: bash -c 'find content -name "*.md" | grep -vf ignore_testing | xargs jupytext --to notebook ' + !buildhtml: pytest --nbval buildhtml: sphinx-build -b html . 
_build/html -D nb_execution_mode=auto -nWT --keep-going From a2436a04036781e4191bab5ea203992bf32f5cc0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Thu, 7 Dec 2023 13:56:30 -0800 Subject: [PATCH 3/5] DOC: fix internal reference to other notebook --- .../reference_notebooks/catalog_queries.md | 96 +++++++++---------- 1 file changed, 48 insertions(+), 48 deletions(-) diff --git a/content/reference_notebooks/catalog_queries.md b/content/reference_notebooks/catalog_queries.md index b27f5d6..c1ab71a 100644 --- a/content/reference_notebooks/catalog_queries.md +++ b/content/reference_notebooks/catalog_queries.md @@ -46,9 +46,9 @@ widgets: There are two ways to access astronomical data catalogs that are provided as table data with a VO API. -First, there is a __[Simple Cone Search (SCS) protocol](http://www.ivoa.net/documents/latest/ConeSearch.html)__ used to search a given table with a given position and radius, getting back a table of results. The interface requires only a position and search radius. +First, there is a __[Simple Cone Search (SCS) protocol](http://www.ivoa.net/documents/latest/ConeSearch.html)__ used to search a given table with a given position and radius, getting back a table of results. The interface requires only a position and search radius. -For more complicated searches, the __[Table Access Protocol](http://www.ivoa.net/documents/TAP/)__ (TAP) protocol is a powerful tool to search any VO table. Here, we expand on its usage and that of the __[Astronomical Data Query Language](http://www.ivoa.net/documents/latest/ADQL.html)__ (ADQL) that it uses. +For more complicated searches, the __[Table Access Protocol](http://www.ivoa.net/documents/TAP/)__ (TAP) protocol is a powerful tool to search any VO table. Here, we expand on its usage and that of the __[Astronomical Data Query Language](http://www.ivoa.net/documents/latest/ADQL.html)__ (ADQL) that it uses. 
- [Accessing astronomical catalogs](#accessing-astronomical-catalogs) - [1. Simple cone search](#1-simple-cone-search) @@ -94,9 +94,9 @@ coord = SkyCoord.from_name("m51") print(coord) ``` -Below, we go through the exercise of how we can figure out the most relevant table. But for now, let's assume that we know that we want the CFA redshift catalog refered to as 'zcat'. VO services are listed in a central Registry that can be searched through a [web interface](http://vao.stsci.edu/keyword-search/) or using PyVO's `regsearch`. We use the registry to find the corresponding cone service and then submit our cone search. +Below, we go through the exercise of how we can figure out the most relevant table. But for now, let's assume that we know that we want the CFA redshift catalog refered to as 'zcat'. VO services are listed in a central Registry that can be searched through a [web interface](http://vao.stsci.edu/keyword-search/) or using PyVO's `regsearch`. We use the registry to find the corresponding cone service and then submit our cone search. -Registry services are of the following type: +Registry services are of the following type: * simple cone search: "scs" * table access protocol: "tap" or "table" * simple image search: "sia" or "image" @@ -112,11 +112,11 @@ services.to_table()['ivoid', 'short_name', 'res_title'] Supposing that we want the table with the short_name CFAZ, and we want to retrieve the data for all sources within an arcminute of our specified location: ```{code-cell} ipython3 -## Use the one that's CFAZ. +## Use the one that's CFAZ. ## Use list comprehension to check each service's short_name attribute and use the first. -cfaz_cone_service = [s for s in services if 'CFAZ' in s.short_name][0] +cfaz_cone_service = [s for s in services if 'CFAZ' in s.short_name][0] -## We are searching for sources within 10 arcminutes of M51. +## We are searching for sources within 10 arcminutes of M51. 
results = cfaz_cone_service.search(pos=coord, radius=10*u.arcmin) results.to_table() ``` @@ -133,13 +133,13 @@ A TAP query is the most powerful way to search a catalog. A Simple Cone Search o ### 2.1 TAP services -Many services list a single TAP service in the Registry that can access many catalogs, boosting your efficiency, and letting you add constraints based on any column. This is the power of the TAP! +Many services list a single TAP service in the Registry that can access many catalogs, boosting your efficiency, and letting you add constraints based on any column. This is the power of the TAP! Suppose for our example, we want to select bright galaxy candidates but don't know the coordinates. Therefore, we start from figuring out the best table to query. +++ -As before, we use the `vo.regsearch()` for a servicetype 'tap'. There are a lot of TAP services in the registry, but they are listed slightly differently than cone services. The metadata on each catalog is usually published in the registry with its cone service, and then the full TAP service is listed as an "auxiliary" service. So to find a TAP service for a given catalog, we need to add the option *includeaux=True*. Alternatively, you can start with a single TAP service and then ask it specifically which tables it serves, but for this use case, that is less efficient. +As before, we use the `vo.regsearch()` for a servicetype 'tap'. There are a lot of TAP services in the registry, but they are listed slightly differently than cone services. The metadata on each catalog is usually published in the registry with its cone service, and then the full TAP service is listed as an "auxiliary" service. So to find a TAP service for a given catalog, we need to add the option *includeaux=True*. Alternatively, you can start with a single TAP service and then ask it specifically which tables it serves, but for this use case, that is less efficient. 
We'll first do a registry search for all auxiliary TAP services related to the "CfA" and "redshift". @@ -156,7 +156,7 @@ for t in tap_services: print(f"{t.ivoid}: {t.res_description}\n") ``` -From the above information, you can choose the table you want and then use the specified TAP service to query it as described below. +From the above information, you can choose the table you want and then use the specified TAP service to query it as described below. But first we'll look at the other way of finding tables to query with TAP: by starting with the TAP services listed individually in the Registry. We see above that the HEASARC has the ZCAT, so what else does it have? @@ -168,20 +168,20 @@ You can find out which tables a TAP serves and then look at the tables descripti # Here, we're looking for a specific service, and we don't need the includeaux option: tap_services = vo.regsearch(servicetype='tap',keywords=['heasarc']) heasarc = tap_services[0] -heasarc_tables=heasarc.service.tables +heasarc_tables=heasarc.service.tables ``` Then let's look for tables matching the terms we're interested in as above. ```{code-cell} ipython3 for tablename in heasarc_tables.keys(): - if "redshift" in heasarc_tables[tablename].description.lower(): + if "redshift" in heasarc_tables[tablename].description.lower(): heasarc_tables[tablename].describe() print("Columns={}".format(sorted([k.name for k in heasarc_tables[tablename].columns ]))) print("----") ``` -There are a number of tables that appear to be useful table for our goal, including the ZCAT, which contains columns with the information that we need to select a sample of the brightest nearby spiral galaxy candidates. +There are a number of tables that appear to be useful table for our goal, including the ZCAT, which contains columns with the information that we need to select a sample of the brightest nearby spiral galaxy candidates. 
Now that we know all the possible column information in the zcat catalog, we can do more than query on position (as in a cone search) but also on any other column (e.g., redshift, bmag, morph_type). The query has to be expressed in a language called __[ADQL](http://www.ivoa.net/documents/latest/ADQL.html)__. @@ -193,62 +193,62 @@ Now that we know all the possible column information in the zcat catalog, we can The basics of ADQL: -* *SELECT * FROM my.interesting.catalog as cat...* +* *SELECT * FROM my.interesting.catalog as cat...* -says you want all ("*") columns from the catalog called "my.interesting.catalog", which you will refer to in the rest of the query by the more compact name of "cat". +says you want all ("*") columns from the catalog called "my.interesting.catalog", which you will refer to in the rest of the query by the more compact name of "cat". -Instead of returning all columns, you can +Instead of returning all columns, you can -* *SELECT cat.RA, cat.DEC, cat.bmag from catalog as cat...* +* *SELECT cat.RA, cat.DEC, cat.bmag from catalog as cat...* to only return the columns you're interested in. To use multiple catalogs, your query could start, e.g., -* *SELECT c1.RA,c1.DEC,c2.BMAG FROM catalog1 as c1 natural join catalog2 as c2...* +* *SELECT c1.RA,c1.DEC,c2.BMAG FROM catalog1 as c1 natural join catalog2 as c2...* says that you want to query two catalogs zipped together the "natural" way, i.e., by looking for a common column. -To select only some rows of the catalog based on the value in a column, you can add: +To select only some rows of the catalog based on the value in a column, you can add: -* *WHERE cat.bmag < 14* +* *WHERE cat.bmag < 14* says that you want to retrieve only those entries in the catalog whose bmag column has a value less than 14. -You can also append +You can also append -* *ORDER by cat.bmag* +* *ORDER by cat.bmag* -to return the result sorted ascending by one of the columns, adding *DESC* to the end for descending. 
+to return the result sorted ascending by one of the columns, adding *DESC* to the end for descending. A few special functions in the ADQL allow you to query regions: * *WHERE contains( point('ICRS', cat.ra, cat.dec), circle('ICRS', 210.5, -6.5, 0.5))=1* -is how you would ask for any catalog entries whose RA,DEC lie within a circular region defined by RA,DEC 210.5,-6.5 and a radius of 0.5 (all in degrees). The 'ICRS' specifies the coordinate system. +is how you would ask for any catalog entries whose RA,DEC lie within a circular region defined by RA,DEC 210.5,-6.5 and a radius of 0.5 (all in degrees). The 'ICRS' specifies the coordinate system. See the ADQL documentation for more. +++ -### 2.3 A use case +### 2.3 A use case Here is a simple ADQL query where we print out the relevant columns for the bright (Bmag <14) sources found within 1 degree of M51 (we will discuss how to define the table and column names below): ```{code-cell} ipython3 ## Inside the format call, the {} are replaced by the given variables in order. -## So this asks for -## rows of public.zcat where that row's ra and dec (cat.ra and cat.dec from the catalog) -## are within radius 1deg of the given RA and DEC we got above for M51 -## (coord.ra.deg and coord.dec.deg from our variables defined above), and where -## the bmag column is less than 14. -query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where +## So this asks for +## rows of public.zcat where that row's ra and dec (cat.ra and cat.dec from the catalog) +## are within radius 1deg of the given RA and DEC we got above for M51 +## (coord.ra.deg and coord.dec.deg from our variables defined above), and where +## the bmag column is less than 14. 
+query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},1.0))=1 and cat.bmag < 14 - order by cat.radial_velocity_error + order by cat.radial_velocity_error """.format(coord.ra.deg, coord.dec.deg) ``` ```{code-cell} ipython3 -results=heasarc.service.run_async(query) +results=heasarc.service.run_async(query) #results = heasarc.search(query) results.to_table() ``` @@ -260,9 +260,9 @@ See the __[information on the zcat](https://heasarc.gsfc.nasa.gov/W3Browse/galax Therefore, we can generalize the query above to complete our exercise and select the brightest (bmag < 14), nearby (radial velocity < 3000), spiral ( morph_type = 1 - 9) galaxies as follows: ```{code-cell} ipython3 -query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where - cat.bmag < 14 and cat.morph_type between 1 and 9 and cat.Radial_Velocity < 3000 - order by cat.Radial_velocity +query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where + cat.bmag < 14 and cat.morph_type between 1 and 9 and cat.Radial_Velocity < 3000 + order by cat.Radial_velocity """.format(coord.ra.deg, coord.dec.deg) ``` @@ -297,9 +297,9 @@ result.to_table() ### 3.1 Cross-correlating to combine catalogs -TAP can also be a powerful way to collect a lot of useful information from existing catalogs in one quick step. For this exercise, we will start with a list of sources, uploaded from our own table, and do a 'cross-correlation' with the *zcat* table. +TAP can also be a powerful way to collect a lot of useful information from existing catalogs in one quick step. For this exercise, we will start with a list of sources, uploaded from our own table, and do a 'cross-correlation' with the *zcat* table. -For more on creating and working with VO tables, see that [notebook](CS_VO_Tables.md). 
Here, we just read one in that's already prepared: +For more on creating and working with VO tables, see that [notebook](votables.md). Here, we just read one in that's already prepared: First, check that this service can handle uploaded tables. Not all do. @@ -312,7 +312,7 @@ The inline method is what PyVO will use. These take a while, i.e. half a minute ```{code-cell} ipython3 query=""" SELECT cat.ra, cat.dec, Radial_Velocity, bmag, morph_type - FROM zcat cat, tap_upload.mysources mt + FROM zcat cat, tap_upload.mysources mt WHERE contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,0.01))=1 and Radial_Velocity > 0 @@ -329,7 +329,7 @@ Therefore we now have the Bmag, morphological type and radial velocities for all ### 3.2 Cross-correlating with user-defined columns -Our input list of sources contains galaxy pair candidates that may be interacting with each other. Therefore it would be interesting to know what the morphological type and the Bmagnitude are for the potential companions. +Our input list of sources contains galaxy pair candidates that may be interacting with each other. Therefore it would be interesting to know what the morphological type and the Bmagnitude are for the potential companions. In this advanced example, we want our search to be physically motivated since the criterion for galaxy interaction depends on the physical separation of the galaxies. Unlike the previous case, the search radius is not a constant, but varies for each candidate by the distance to the source. Specifically, we want to search for companions that are within 50 kpc of the candidate and therefore first need to find the angular diameter distance that corresponds to galaxy's distance (in our case the radial velocity). 
@@ -338,8 +338,8 @@ Therefore, we begin by taking our table of objects and adding an angDdeg column: ```{code-cell} ipython3 ## The column 'radial_velocity' is c*z but doesn't include the unit; it is km/s ## Get the speed of light from astropy.constants and express in km/s -c = const.c.to(u.km/u.s).value -redshifts = mytable['radial_velocity']/c +c = const.c.to(u.km/u.s).value +redshifts = mytable['radial_velocity']/c mytable['redshift'] = redshifts physdist = 0.05*u.Mpc # 50 kpc physical distance @@ -354,13 +354,13 @@ Now we construct and run a query that uses the new angDdeg column in every row s This time, rather than write the table to disk, we'll keep it in memory and give Tap.query() a "file-like" object using io.BytesIO(). This can take half a minute: ```{code-cell} ipython3 -## In memory only, use an IO stream. +## In memory only, use an IO stream. vot_obj=io.BytesIO() apvot.writeto(apvot.from_table(mytable),vot_obj) ## (Reset the "file-like" object to the beginning.) vot_obj.seek(0) -query="""SELECT mt.ra, mt.dec, cat.ra, cat.dec, cat.Radial_Velocity, cat.morph_type, cat.bmag - FROM zcat cat, tap_upload.mytable mt +query="""SELECT mt.ra, mt.dec, cat.ra, cat.dec, cat.Radial_Velocity, cat.morph_type, cat.bmag + FROM zcat cat, tap_upload.mytable mt WHERE contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,mt.angDdeg))=1 and cat.Radial_Velocity > 0 and cat.radial_velocity != mt.radial_velocity @@ -378,11 +378,11 @@ Therefore, by adding new information to our original data table, we could cross- ## 4. Synchronous versus asynchronous queries -There is one technical detail about TAP queries that you will need to know. In the code cells above, there are two commands for sending the query, one of which is commented out. This is because, with the TAP, there are two ways to send such queries. The default when you use the `search()` method is to us a synchronous query, which means that the query is sent and the client waits for the response. 
For large and complicated queries, this may time out, or you may want to run several in parallel. So there are other options. +There is one technical detail about TAP queries that you will need to know. In the code cells above, there are two commands for sending the query, one of which is commented out. This is because, with the TAP, there are two ways to send such queries. The default when you use the `search()` method is to us a synchronous query, which means that the query is sent and the client waits for the response. For large and complicated queries, this may time out, or you may want to run several in parallel. So there are other options. -The method `service.run_async()` uses an asynchronous query, which means that the query is sent, and then (under the hood without you needing to do anything), the method checks for a response. From your point of view, these methods look the same; PyVO is doing different things under the hood, but the method will not return until it has your result. +The method `service.run_async()` uses an asynchronous query, which means that the query is sent, and then (under the hood without you needing to do anything), the method checks for a response. From your point of view, these methods look the same; PyVO is doing different things under the hood, but the method will not return until it has your result. -You need to know about these two methods for a couple of reasons. First, some services will limit synchronous queries, i.e. they will not necessarily return *all* the results if there are too many of them. An asynchronous query should have no such restrictions. In the case of the HEASARC service that we use above, it does not matter, but you should be aware of this and be in the habit of using the asynchronous queries for complete results after an initial interactive exploration. +You need to know about these two methods for a couple of reasons. First, some services will limit synchronous queries, i.e. 
they will not necessarily return *all* the results if there are too many of them. An asynchronous query should have no such restrictions. In the case of the HEASARC service that we use above, it does not matter, but you should be aware of this and be in the habit of using the asynchronous queries for complete results after an initial interactive exploration. The second reason to be aware of this is that asynchronous queries may be queued by the service, and they can take a lot longer if the service is very busy or the job is very large. (The synchronous option in this case may either time out, or it may return quickly but with incomplete results.) From d9adbcd78ca1280615d6853f7a9c1870f31aa904 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Fri, 8 Dec 2023 19:38:10 -0800 Subject: [PATCH 4/5] CI: fixing tox commands --- tox.ini | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/tox.ini b/tox.ini index 9679fc0..76523a5 100644 --- a/tox.ini +++ b/tox.ini @@ -29,11 +29,13 @@ deps = # Temporary fix for lexer errors ipython!=8.7.0 +allowlist_externals = bash + commands = pip freeze - !buildhtml: bash -c 'find content -name "*.md" | grep -vf ignore_testing | xargs jupytext --to notebook ' + !buildhtml: bash -c 'find content -name "*.md" | xargs jupytext --to notebook ' - !buildhtml: pytest --nbval + !buildhtml: pytest --nbval content/ buildhtml: sphinx-build -b html . 
_build/html -D nb_execution_mode=auto -nWT --keep-going pip_pre = From ab4cc5cf9e25ad44d76f52b41e7ba5a9de529e3d Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Brigitta=20Sip=C5=91cz?= Date: Fri, 8 Dec 2023 19:39:12 -0800 Subject: [PATCH 5/5] MAINT: don't add generated ipynb files to repo --- .gitignore | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/.gitignore b/.gitignore index 6a0c347..ed0b518 100644 --- a/.gitignore +++ b/.gitignore @@ -72,6 +72,9 @@ target/ # Jupyter Notebook .ipynb_checkpoints +# Content generated during notebook execution +*ipynb + # pyenv .python-version @@ -105,4 +108,3 @@ venv.bak/ # nbcollection output _build/ -
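The `!buildhtml` tox commands in patches 2 and 4 have two steps. First, they convert every MyST Markdown source under `content/` to a notebook: patch 2 filters the file list through `grep -vf ignore_testing`, and patch 4 drops that filter. Then they execute the notebooks with `pytest --nbval content/`.

Below is a minimal Python sketch of the discovery and conversion step. It may help when reproducing the CI behaviour locally. The helper names `find_markdown_sources` and `convert_to_notebooks` are hypothetical and are not part of this repository. The ignore-file handling is only an approximation of grep's pattern semantics: each line is treated as a Python regular expression.

```python
import re
import subprocess
from pathlib import Path


def find_markdown_sources(root, ignore_file=None):
    """List Markdown files under ``root``, approximating
    ``find <root> -name "*.md" | grep -vf <ignore_file>``.

    Each non-blank line of ``ignore_file`` is treated as a Python regular
    expression (an approximation of grep's basic regex); any path matching
    one of them is dropped. Blank lines are skipped because an empty grep
    pattern would match, and therefore exclude, every path.
    """
    paths = sorted(str(p) for p in Path(root).rglob("*.md") if p.is_file())
    if ignore_file is None:
        return paths
    patterns = [
        line.strip()
        for line in Path(ignore_file).read_text().splitlines()
        if line.strip()
    ]
    return [p for p in paths if not any(re.search(pat, p) for pat in patterns)]


def convert_to_notebooks(paths):
    """Write an .ipynb next to each Markdown file via the jupytext CLI,
    as ``xargs jupytext --to notebook`` does in tox.ini."""
    if paths:  # unlike GNU xargs without -r, skip the call on empty input
        subprocess.run(["jupytext", "--to", "notebook", *paths], check=True)
```

The call `convert_to_notebooks(find_markdown_sources("content"))` mirrors the patch 4 command. After that, `pytest --nbval content/` runs the generated notebooks. Those generated `*.ipynb` files are the ones that patch 5 keeps out of the repository through `.gitignore`.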