Blog Plan

Below is the list of all hacks that were worked on during Astro Hack Week 2014, exported from the hackpad. Each one will become a blog post, somehow. If your hack is missing, or the link to your hackpad is broken, please fix it! The preferred MO is to fork this repo, check in the edit, and submit a pull request. And check out the "Home Runs" section for the blog posts that we have already published!

On Deck

Next Blogger, date: title (linked to hackpad etc)
Next Blogger, date: title (linked to hackpad etc)
Next Blogger, date: title (linked to hackpad etc)

Home Runs

Jake Vanderplas, 2014-09-20: Multi-output Random Forests, source code (ipynb) here

Hacks Still To Be Blogged

An Astrophysicist's Interactive Resume (Caroline Sofiatti) - Create an interactive resume (something like this) that tells a kid-friendly, high-level story about my research and my lab by using images and data visualization. I think this is a good step towards getting kids, and in particular girls, more excited about science.
Cosmology Upgrade (Beth Reid) - Add standard cosmology analyses of N-body data to an existing package, either yt or PyNbody, with the latter having some UW contributors (?). I have in mind power spectrum, correlation function of halo mass bins or HOD galaxy catalogs. I may also abandon this idea and work on someone else's project :)
A Gaussian Process Toolbox for Black Hole Systems (Daniela Huppenkothen) - Black hole systems show many types of temporal variability, from intrinsic stochastic variability to periodicities to deterministic processes like flares and dips. I want to build a new toolbox based on Gaussian Processes to model these various phenomena. The main focus is X-ray observations of Ultra-Luminous X-ray Sources (ULXs), but this should be a starting point and be extensible to other relevant source classes such as black-hole binaries and AGN.
Side science with HETDEX (Michael Gully-Santiago) - Data mining HETDEX for side-science will be great, once it comes online in ~1 year. A synthetic data set exists, which could be adapted to gauge possibilities, influence data collection strategies, and lay the groundwork for a proposal.
Brown Dwarf Classification (Michael Gully-Santiago) - Head-to-head comparison of machine learning and conventional classification for brown dwarfs. I'm curious to see how machine learning compares against reasonable color selection criteria in a ~12 x 54,000 catalog. The targets are candidate young brown dwarfs, with a classified subsample of ~400 targets.
Replot the Textbook (M. Gully-Santiago) - Hack the Statistics, Data Mining, and Machine Learning in Astronomy textbook figures. Every figure in the textbook is a downloadable Python file. Digging into the source code is a fantastic way to hone one's skills. The end product could be a website with modified/enhanced figures to serve as tutorials for the community. This is also a great way to practice with MPLD3 and D3.js.
Merger Tree Shapes as a function of mass and environment (Lauren Anderson) - In collaboration with the CS department we created MyMergerTree, a service that builds and visualizes merger histories of simulations using a parallel database. I want to build a tool on top of this to quantify/describe merger tree shapes and differences for halos of various masses and environments. The goal being to apply it to our currently running simulation that is a 100 TB dataset.
Extreme Deconvolution Eats Bananas (Phil Marshall) - how can we infer underlying population PDFs when the individual object models are non-linear, with likelihoods characterised with MCMC samples instead of analytic Gaussians?
Representing overdensities in astro data (Karen Ng) - aka clustering / kernel density estimation. Astronomy data usually show clustering on some scale due to gravity. This project tries to put together / modify some code (maybe scikit-learn + astroML code) for identifying clusters of stars / galaxies and finding the corresponding density / luminosity peaks in real data. Furthermore, it would be helpful to put together some kernel density estimation tools for choosing bandwidths that give accurate number density contours to compare against the clustering results.
Faster Parallel Import of Python (Yu Feng) - On a distributed supercomputer, the startup time of Python can be substantial because the Python standard library and other libraries (numpy, scipy, pandas, astropy, yt) consist of a lot of small files, choking the shared file system. There are some existing attempts to get around this; I've collected some at http://github.com/rainwoodman/MPI_Import/. We can play around with more ideas. Some understanding of MPI and supercomputers is required. It is relatively simple and more on the infrastructure side, so I may abandon this to join someone else's hack (e.g., to work on Beth's Cosmology Upgrade).
Multi Pixel Hierarchical SED Modelling (Jonathan Sick) - What's the best way to model the stellar population of a galaxy if a) you have lots of pixels, each with six-band photometry, and b) photometric measurements are heavily biased by sky subtraction uncertainties? I think the answer to this is a hierarchical Bayesian model sampled with MCMC. I've done the single pixel case with a Python-based sedbot/emcee/python-fsps toolchain, so this week I'll be calling on fellow hackers to help me figure out how to write down a proper hierarchical model, and implement a parallelized likelihood function with MPI in emcee.
Combining Galaxy IFU and Multiband Photometry Data (Peter Yoachim) - Lots of people do really fancy SED fitting (like Jonathan up there^), and lots of people (like me) fit galaxy spectra. Now that we have lots of galaxy IFU surveys (VENGA, MaNGA, SAMI) and lots of multiband imaging archives (SDSS, 2MASS, GALEX) we need a way to properly combine spatially resolved spectra with photometry when fitting star formation histories.
Benchmarking Samplers (Rahul Biswas) - One of the things many of us have to do is sample a probability distribution (typically posteriors of different datasets). Several methods have been implemented and are available, but (at least I) find it hard to find a single place that enumerates the advantages and disadvantages of these methods. I would like to have a few simple distributions with different qualitative features (long and narrow, multi-modal, very noisy, etc.) coded up to study the relative benefits of such easily available methods. It could be, for example, that some methods are fast but don't work well on the tails ... so knowing in a quantified way where they work well (how well they do against the truth) and how fast they are (how many likelihood evals, ability to spread across processors, etc.) would be useful.
Checking the thermodynamic integration evidence calculation in emcee (Phil Marshall) - Evidence calculation is hard. Thermodynamic integration during parallel tempering sampling is one way to do it, and emcee has an implementation of this: how well does it work? We want to test against an analytic density (a Gaussian mixture likelihood with uniform prior) and plot the cooling schedule as we go to see how well the integral is approximated.
Inferring the true velocity distribution (Adrian Price-Whelan, Jeff Andrews [not here], Hogg) - Given observations of v*sin(i) --- the velocity times sine of some unknown inclination angle --- can we infer something about the distribution, f(v)?
Finding & Modeling Flares in Kepler Light Curves (Jim Davenport) - I have a catalog of known flares for one particularly active star in Kepler. I would like to fit these events with a flare model I have developed, and work on better flare detection from light curves.
Interactive Data Exploration Workflows (Cathy Petry) - I am designing a workflow consisting of a set of tools that enable coders and non-coders alike to explore a dataset of ~3M rows in a visual way. There is a set of "pre-prescribed" views that could be the starting point, and then from there, outliers or other features in the data could be explored by drilling down (or over) in the dataset. All data is in one MySQL or SQLite database, and data-blending of multiple sources is not a requirement. A desirable feature would be to explore more than one database (SQLite DB file) together to compare the same statistic in each DB.
Cell macros / snippets in the IPython Notebook (Adrian Price-Whelan) - In making new notebooks, I find that I often end up copying the same few imports into the top cell(s) of the file. I thought it would be awesome to have some way to automagically insert pre-defined "cell macros" into an existing notebook. This is something that might end up in the core code base eventually, but in the meantime I've made a little hack that adds some custom JavaScript to do the trick. The hack adds a new drop-down menu and button to the IPython toolbar that lists user-defined cell macros that can be inserted easily into the current notebook. https://github.com/adrn/macro-cell
Outlier stars in kinematics-predicted chemical tags (Adrian Price-Whelan) - In a few years, Gaia will release a huge catalog (100s of millions) of stars with 6D phase-space information + chemistry ([Fe/H] and [α/Fe]). Stars in the Galaxy should form fairly distinct clusters in combined action + chemistry-space ... more later....
Time Series Forecasting with Random Forest (Jim Davenport) - trying to use random forest regression to do time series forecasting... using the worst possible hacky bullshit. Win! Check out the notebook here! (github here).
Classical Statistics in Python (Daniela Huppenkothen and Fernando Perez) - we're building an interactive ipython notebook to clarify key concepts in classical statistics
Hacking XDGMM to do what I want (Karen Ng) - I want to use XD as a classifier, and have it report which object is likely to have been drawn from which underlying population component. The astroML implementation of XD doesn't offer this, while the sklearn one does - so astroML needs an upgrade!
cubehelix upgrades (Jim Davenport) - Updated some documentation for my implementation of cubehelix in python, found bug... not sure if in ipython notebook or cubehelix. Example here should look like this, and in fact it DOES work when I just run it from the command line w/ ipython... not sure where to go from here.
Spectro-Perfectionism (Nell Byler) -
Documenting MAF and learning about ipython notebook (and doing a solid for Phil at the same time). (Lynne Jones).
Hack the MAF web UI to present some plots more compactly (Lynne Jones). Messing around with web UI stuff .. basically, taking some plots on one page, making a subselection of them, and then putting them on another page in a different order.
Color-Magnitude Contour Plot using SNCosmo simulated data (Caroline Sofiatti and others) - We made a color-magnitude contour plot of what a mix of Type Ia, Type Ibc and Type II SNe should look like for a given redshift, phase and set of filters! Seaborn FTW!!!
Hacked Ethnographic Fieldnotes (Brittany Fiore-Silfvast) - Initial reflections on the community of astro data hackers. Comments or any further reflections are welcomed via Github issues!
Molecular Cloud Classification (Lori Beerman) - Classifying molecular clouds as either star-forming or non-star-forming can shed light on how much time clouds spend in the various life stages. I have high resolution data of M31's molecular cloud population, along with ancillary data sets of various star formation indicators. I would like to classify each cloud in my sample as currently star-forming or non-star-forming.
Radio Source Detection and Classification (Patti Carroll) - Unsupervised and Semi-supervised clustering for source detection, association, and classification (real or not real) in a systematic and independent way. The data is from the Murchison Widefield Array Epoch of Reionization observations.
Zeljko's XD Example (Yusra Alsayyad) - Zeljko illustrated XD in his Friday lecture. Here's a simple extension.
Detecting objects in satellite/aerial imagery (Amit Kapadia)
- First try over a pine forest: http://nbviewer.ipython.org/gist/kapadia/6fa2af1c55151c4b030c
- Second attempt with little training and RF: https://gist.github.com/kapadia/908a0dc539e1725163bf
- Third try over Aleppo using various KMeans attempts: http://nbviewer.ipython.org/gist/kapadia/76d43f29e745250c4571
K2 photometry (Daniel Foreman-Mackey) - Doing PSF photometry for the second version of the Kepler satellite. Code: https://github.com/dfm/kpsf, using the C++ optimization library Ceres
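
As a starting point for the "Inferring the true velocity distribution" hack above, here is a minimal sketch of the forward model only: draw true velocities from an assumed f(v), project them with isotropically distributed inclinations, and look at the resulting v*sin(i). The Gaussian f(v), the function name `simulate_vsini`, and the isotropy assumption are all illustrative choices, not something taken from the hack itself.

```python
import numpy as np


def simulate_vsini(v, rng):
    """Return v*sin(i) for true velocities v, assuming isotropic orientations.

    Hypothetical helper for illustration; not part of any existing package.
    """
    # For isotropically oriented axes, cos(i) is uniform on [0, 1].
    cos_i = rng.uniform(0.0, 1.0, size=v.shape)
    sin_i = np.sqrt(1.0 - cos_i**2)
    return v * sin_i


rng = np.random.default_rng(42)

# Assumed "true" f(v): a Gaussian centred on 100 km/s (illustrative only).
v_true = rng.normal(100.0, 10.0, size=200_000)
vsini = simulate_vsini(v_true, rng)

# Ratio of the mean observed v*sin(i) to the mean true velocity.
ratio = vsini.mean() / v_true.mean()
print(f"mean(v sin i) / mean(v) = {ratio:.3f}")
```

For isotropic orientations the mean of sin(i) is π/4, so the printed ratio should come out close to 0.785. Any inference of f(v) has to undo that projection.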
